It remains a valid question.
In practice, we often map RNA-Seq data against the softmasked genome using aligners that ignore soft masking. This generates intron evidence in softmasked regions. Generally, this is not a problem. For example, AUGUSTUS will usually not initiate a gene structure in a fully repeat-masked region. It may, however, initiate one in a neighboring unmasked region and extend it into the masked region using the evidence, and that's OK.
This answer explicitly does not apply to the integration of long-read data or assembled short-read data with TSEBRA.
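In a softmasked genome, repeats are kept as lowercase letters rather than replaced, so reads aligned with a masking-agnostic aligner can yield intron evidence that lies partly or wholly in masked sequence. The following is a minimal Python sketch, with hypothetical function names that are not part of AUGUSTUS or BRAKER, that measures how much of an intron interval is soft-masked. It only illustrates the "fully masked" versus "neighboring unmasked" distinction described above; it does not reproduce how AUGUSTUS actually weighs masking or hints.

```python
import re
from typing import List, Tuple


def softmasked_intervals(seq: str) -> List[Tuple[int, int]]:
    """Return 0-based, half-open [start, end) runs of lowercase (soft-masked) bases."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]


def masked_fraction(seq: str, start: int, end: int) -> float:
    """Fraction of bases in seq[start:end] that are soft-masked (lowercase)."""
    region = seq[start:end]
    if not region:
        return 0.0
    return sum(c.islower() for c in region) / len(region)


def classify_intron(seq: str, start: int, end: int) -> str:
    """Hypothetical helper: label an intron interval by how much of it is soft-masked."""
    f = masked_fraction(seq, start, end)
    if f == 1.0:
        return "fully masked"
    if f == 0.0:
        return "unmasked"
    return "partially masked"
```

For example, with `seq = "ACGTACGT" + "acgtacgtacgt" + "ACGT"`, `softmasked_intervals(seq)` returns `[(8, 20)]`, an intron at `(8, 20)` is classified as fully masked, and one at `(4, 12)` as partially masked (half of its bases are lowercase).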
Dear authors,
I am trying to use BRAKER with RNA-seq data, and I have a question about your recommendation to use a softmasked genome.
In your AUGUSTUS tutorial (https://github.com/Gaius-Augustus/Augustus/blob/master/docs/tutorial2018/index.html), you mapped the RNA-seq reads with STAR against the hardmasked genome, but then used the softmasked version of the reference for AUGUSTUS and BRAKER.
Should the same be done when using BRAKER2? If so, could you please explain the advantage of, and the reason for, this approach compared with using the softmasked reference throughout (i.e. for both the STAR mapping and BRAKER)?
I am using RepeatModeler2 to identify repeats, RepeatMasker for masking, and STAR to align paired-end RNA-seq reads.
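The difference between the two references in the tutorial setup is a simple character substitution: soft masking writes repeats in lowercase, while hard masking replaces them with `N`. As a minimal sketch, assuming a softmasked FASTA is already available, a hardmasked copy could be derived as below. The function and file names are hypothetical illustrations, not part of STAR, RepeatMasker, or BRAKER.

```python
from typing import Iterable, Iterator


def hardmask_line(line: str) -> str:
    """Replace every lowercase (soft-masked) character with 'N'; uppercase bases are kept."""
    return "".join("N" if c.islower() else c for c in line)


def hardmask_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with sequence lines hardmasked; header lines pass through unchanged."""
    for line in lines:
        if line.startswith(">"):
            yield line  # headers may contain lowercase text, so leave them alone
        else:
            yield hardmask_line(line)  # the trailing newline is not lowercase and is kept


def hardmask_file(src_path: str, dst_path: str) -> None:
    """Write a hardmasked copy of a softmasked FASTA file."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(hardmask_fasta(src))
```

Usage, with hypothetical file names: `hardmask_file("genome.softmasked.fa", "genome.hardmasked.fa")`. The hardmasked copy would then only be used for read mapping, while the softmasked file stays the input for gene prediction.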
Many thanks, Daniel