bcgsc / RNA-Bloom

:hibiscus: reference-free transcriptome assembly for short and long reads

Do we really need to keep E0.L. sequences? #41

Closed: xiekunwhy closed this issue 2 years ago

xiekunwhy commented 2 years ago

Hi,

I found many E0.L. sequences (2 to 3 or more times as many as other E sequences) in RNA-Bloom's assembly (in all samples I have tested), but the BUSCO results with or without E0.L. sequences are almost the same. Is it necessary to keep E0.L. sequences?

Or is there an option to tell rna-bloom not to output E0.L. sequences?

Best, Kun

kmnip commented 2 years ago

The E0 and 01 sequences are low-expression transcripts in your RNA-seq samples. If you set -stratum e1, then you should see far fewer E0 and 01 sequences. They won't be removed entirely, though.

As far as I know, BUSCO contains conserved eukaryotic genes. So, there is a good chance that BUSCO might not pick up every single gene in your species of interest. Another possible situation is that these E0 transcripts have higher expression isoforms, which may have already been picked up by BUSCO. So, removing the E0 transcripts would have no effect on your BUSCO results.
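For readers who still want to drop the remaining low-expression sequences after assembly, a post-hoc filter on the output FASTA is one option. The sketch below is not an RNA-Bloom feature. It assumes that each sequence ID starts with its stratum label (for example `E0.L.`), as the names in this thread suggest. The function name `drop_by_prefix` and the example records are made up for illustration.

```python
from typing import Iterable, Iterator, Tuple


def drop_by_prefix(lines: Iterable[str], prefixes: Tuple[str, ...]) -> Iterator[str]:
    """Yield FASTA lines, skipping records whose ID starts with any prefix.

    Assumption: the stratum label (e.g. "E0.L.") is the start of the
    sequence ID, i.e. the first token after '>' on the header line.
    """
    keep = True  # lines before the first header are passed through unchanged
    for line in lines:
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            # str.startswith accepts a tuple of candidate prefixes.
            keep = not seq_id.startswith(prefixes)
        if keep:
            yield line


# Made-up records, only to show the behaviour of the filter.
example = [">E0.L.1\n", "ACGT\n", ">E2.L.7\n", "GGCC\n"]
filtered = list(drop_by_prefix(example, ("E0.L.",)))
```

On a real assembly, the same function can be applied to an open file handle and the result written out with `writelines`. Because removing E0 sequences may discard genuine low-expression transcripts, it is worth comparing BUSCO results before and after filtering, as done in the original question.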

xiekunwhy commented 2 years ago

-stratum e1 helped me a lot. With this option, the transcript counts of all samples I have tested (40+ pooled RNA-seq samples from different species) were near the expected number.

kmnip commented 2 years ago

Thanks, Kun, for your report. I am glad that it worked for you! :)