BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes
BRAKER manual confusion about --UTR and --addUTR #370

Open · krabapple opened this issue 3 years ago

krabapple commented 3 years ago

I'm copying the relevant manual text below for ease of reference. Comments describing my confusion are added in square brackets.

--UTR=on

Generate UTR training examples for AUGUSTUS from RNA-Seq coverage information, train AUGUSTUS UTR parameters and predict genes with AUGUSTUS and UTRs, including coverage information for RNA-Seq as evidence. This flag only works if --softmasking is also enabled. This is an experimental feature! [I understand this to mean that --UTR is for predicting UTRs as part of a standard BRAKER run.]
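The manual states that --UTR only works when --softmasking is also enabled. A minimal pre-flight check could catch a missing --softmasking before a long run starts. The function name check_utr_flags is hypothetical (it is not part of BRAKER), and the sketch only recognizes the exact spellings --UTR=on and --softmasking used in the manual text:

```bash
# Hypothetical pre-flight check, not part of BRAKER.
# Returns non-zero if --UTR=on is present without --softmasking,
# following the requirement stated in the manual.
check_utr_flags() {
    local utr=0 soft=0 arg
    for arg in "$@"; do
        case "$arg" in
            --UTR=on) utr=1 ;;           # UTR training/prediction requested
            --softmasking) soft=1 ;;     # soft-masked genome declared
        esac
    done
    if [ "$utr" -eq 1 ] && [ "$soft" -eq 0 ]; then
        echo "error: --UTR=on requires --softmasking" >&2
        return 1
    fi
    return 0
}
```

In a wrapper script, something like `check_utr_flags "$@" && braker.pl "$@"` would stop before BRAKER is called if the combination is invalid.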

[However, this next line seems to be about adding UTRs to an existing output, a function I thought --addUTR was for. And indeed, it shows the --addUTR option.] If you performed a BRAKER run without --UTR=on, you can add UTR parameter training and gene prediction with UTR parameters (and only RNA-Seq hints) with the following command:

    --genome=../genome.fa --addUTR=on --softmasking \
        --bam=../RNAseq.bam --workingdir=$wd \
        --AUGUSTUS_hints_preds=augustus.hints.gtf \
        --cores=8 --skipAllTraining --species=somespecies

[The command above is identical to the --addUTR command shown below.]

Modify augustus.hints.gtf to point to the AUGUSTUS predictions with hints from previous BRAKER run; modify flaning_DNA value to the flanking region from the log file of your previous BRAKER run; modify some_new_working_directory to the location where BRAKER should store results of the additional BRAKER run; modify somespecies to the species name used in your previous BRAKER run. ['flaning_DNA' is a typo for --flanking_DNA, but in any case I do not see that option in either the --UTR command above or the --addUTR command below.]
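The instruction above asks for the flanking region to be copied from the previous run's log. A minimal sketch of that lookup follows. It assumes that the log (taken here to be braker.log in the previous working directory) contains a line mentioning "flanking" and that the value is the last number on that line. Both points are assumptions, so check the wording in your own log by hand first. The function name flanking_from_log is hypothetical:

```bash
# Hypothetical helper: print the last integer on the last log line that
# mentions "flanking" (case-insensitive). The log line format is an
# assumption; verify it against your own log before relying on the result.
flanking_from_log() {
    local log="$1"
    grep -i 'flanking' "$log" \
        | tail -n 1 \
        | grep -oE '[0-9]+' \
        | tail -n 1
}
```

For example, `flanking_from_log previous_run/braker.log`, where previous_run is a placeholder for the earlier working directory. An empty result means that no matching line was found.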
--addUTR=on

Add UTRs from RNA-Seq coverage information to AUGUSTUS gene predictions using GUSHR. No training of UTR parameters and no gene prediction with UTR parameters is performed.

If you performed a BRAKER run without --addUTR=on, you can add UTRs results of a previous BRAKER run with the following command:

    --genome=../genome.fa --addUTR=on --softmasking \
        --bam=../RNAseq.bam --workingdir=$wd \
        --AUGUSTUS_hints_preds=augustus.hints.gtf --cores=8 \
        --skipAllTraining --species=somespecies

[This command is identical to the one given for --UTR above.]
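To double-check that the two example commands really are identical, a short sketch can normalize each option list and compare the results. Normalizing here means dropping the backslash line continuations and ignoring line wrapping and option order. normalize_cmd is a hypothetical helper, and the strings are the option lists from the two manual examples quoted above:

```bash
# Hypothetical helper: split a command line into one token per line,
# dropping spaces, backslash continuations and newlines, then sort.
normalize_cmd() {
    printf '%s\n' "$1" | tr -s ' \\\n' '\n' | grep -v '^$' | sort
}

# Option lists copied from the --UTR and --addUTR examples (single quotes
# keep $wd and the backslashes literal).
utr_cmd='--genome=../genome.fa --addUTR=on --softmasking \
    --bam=../RNAseq.bam --workingdir=$wd \
    --AUGUSTUS_hints_preds=augustus.hints.gtf \
    --cores=8 --skipAllTraining --species=somespecies'
addutr_cmd='--genome=../genome.fa --addUTR=on --softmasking \
    --bam=../RNAseq.bam --workingdir=$wd \
    --AUGUSTUS_hints_preds=augustus.hints.gtf --cores=8 \
    --skipAllTraining --species=somespecies'

if [ "$(normalize_cmd "$utr_cmd")" = "$(normalize_cmd "$addutr_cmd")" ]; then
    echo "same options"      # prints: same options
else
    echo "options differ"
fi
```

Comparing sorted tokens means that a difference in line wrapping (such as where --cores=8 falls) does not count as a difference in the command.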

Modify augustus.hints.gtf to point to the AUGUSTUS predictions with hints from previous BRAKER run; modify some_new_workingdirectory to the location where BRAKER should store results of the additional BRAKER run; this run will not modify AUGUSTUS parameters. We recommend that you specify the original species of the original run with --species=somespecies. Otherwise, BRAKER will create an unneeded species parameters directory Sp*. [Is it crucial to use the species of the original run, or does it not really matter if disk space for the 'unneeded' Sp_ directory is not an issue?]
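On the Sp* question: AUGUSTUS keeps each species' parameters in its own directory under $AUGUSTUS_CONFIG_PATH/species/. One way to see whether earlier runs left such automatically named directories behind is to list them. This is a minimal sketch that assumes the automatic names start with Sp_, which is inferred from the manual's "Sp*". list_auto_species is a hypothetical helper that only prints names and does not delete anything:

```bash
# Hypothetical helper: list species parameter directories whose names start
# with "Sp_" (assumed naming) under an AUGUSTUS config directory.
# Defaults to $AUGUSTUS_CONFIG_PATH when no argument is given.
list_auto_species() {
    local config="${1:-$AUGUSTUS_CONFIG_PATH}"
    local d
    for d in "$config"/species/Sp_*; do
        # An unmatched glob stays literal, so check it is a real directory.
        [ -d "$d" ] && basename "$d"
    done
    return 0
}
```

Review the printed names before removing anything, because parameters trained in an earlier run may live in one of these directories.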

[In sum, I am confused about when to use --UTR versus --addUTR, what the proper set of command options is for each, which one requires --flanking_DNA, and when the original species is required.]

tomasbruna commented 3 years ago

I agree that this section is a bit confusing, it seems to be a mixture of the old and new approaches to UTRs. @KatharinaHoff, can you take a look?