Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

BRAKER manual confusion about --UTR and --addUTR #370

Open krabapple opened 3 years ago

krabapple commented 3 years ago

I'm copying the relevant manual text below for ease of reference. My comments describing my confusion are added in square brackets.

//

--UTR=on

Generate UTR training examples for AUGUSTUS from RNA-Seq coverage information, train AUGUSTUS UTR parameters and predict genes with AUGUSTUS and UTRs, including coverage information for RNA-Seq as evidence. This flag only works if --softmasking is also enabled. This is an experimental feature! [I understand this to mean that --UTR is for predicting UTRs as part of a standard BRAKER run.]

[However, this next line seems to be about adding UTRs to an existing output -- a function I thought --addUTR was for. And indeed it shows the --addUTR option] If you performed a BRAKER run without --UTR=on, you can add UTR parameter training and gene prediction with UTR parameters (and only RNA-Seq hints) with the following command:


braker.pl --genome=../genome.fa --addUTR=on --softmasking \
    --bam=../RNAseq.bam --workingdir=$wd \
    --AUGUSTUS_hints_preds=augustus.hints.gtf \
    --cores=8 --skipAllTraining --species=somespecies

[The command above is identical to the --addUTR command shown below]

Modify augustus.hints.gtf to point to the AUGUSTUS predictions with hints from previous BRAKER run; modify flaning_DNA value to the flanking region from the log file of your previous BRAKER run; modify some_new_working_directory to the location where BRAKER should store results of the additional BRAKER run; modify somespecies to the species name used in your previous BRAKER run. ['flaning_DNA' is a typo for --flanking_DNA but in any case I do not see the option included in either the command for --UTR above or --addUTR below]

--addUTR=on

Add UTRs from RNA-Seq coverage information to AUGUSTUS gene predictions using GUSHR. No training of UTR parameters and no gene prediction with UTR parameters is performed.

If you performed a BRAKER run without --addUTR=on, you can add UTRs to the results of a previous BRAKER run with the following command:


braker.pl --genome=../genome.fa --addUTR=on --softmasking \
    --bam=../RNAseq.bam --workingdir=$wd \
    --AUGUSTUS_hints_preds=augustus.hints.gtf --cores=8 \
    --skipAllTraining --species=somespecies

[this command is identical to the one given for --UTR above]

Modify augustus.hints.gtf to point to the AUGUSTUS predictions with hints from previous BRAKER run; modify some_new_workingdirectory to the location where BRAKER should store results of the additional BRAKER run; this run will not modify AUGUSTUS parameters. We recommend that you specify the original species of the original run with --species=somespecies. Otherwise, BRAKER will create an unneeded species parameters directory Sp_*. [Is it crucial to use the original species of the original run, or does it not really matter if space for the unneeded Sp_* directory is not an issue?] //

[In sum I am confused as to when to use --UTR versus --addUTR, what is the proper command option set for each, which one requires --flanking_DNA, and when the original species is required]
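The observation that the two quoted commands are the same can be checked mechanically. The following is a minimal sketch, assuming the two commands are pasted verbatim as strings; `normalise` is a hypothetical helper written for this comparison and is not part of BRAKER. It splits each command into one argument per line, sorts the arguments, and compares the results, so differences in line wrapping or argument order are ignored.

```shell
# Split a command string into one argument per line and sort them,
# so that line continuations and argument order do not matter.
normalise() {
    printf '%s\n' "$1" | tr -s ' \\\n' '\n' | sed '/^$/d' | sort
}

# Command quoted in the manual under --UTR=on (single quotes keep $wd literal).
utr_cmd='braker.pl --genome=../genome.fa --addUTR=on --softmasking \
    --bam=../RNAseq.bam --workingdir=$wd \
    --AUGUSTUS_hints_preds=augustus.hints.gtf \
    --cores=8 --skipAllTraining --species=somespecies'

# Command quoted in the manual under --addUTR=on.
addutr_cmd='braker.pl --genome=../genome.fa --addUTR=on --softmasking \
    --bam=../RNAseq.bam --workingdir=$wd \
    --AUGUSTUS_hints_preds=augustus.hints.gtf --cores=8 \
    --skipAllTraining --species=somespecies'

if [ "$(normalise "$utr_cmd")" = "$(normalise "$addutr_cmd")" ]; then
    result="identical"
else
    result="different"
fi
echo "$result"
```

The check only shows that the two documented invocations pass the same set of arguments; it says nothing about which of the two sections the command actually belongs to.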

tomasbruna commented 3 years ago

I agree that this section is a bit confusing; it seems to be a mixture of the old and new approaches to UTRs. @KatharinaHoff, can you take a look?