blahah / transrate

Understand your transcriptome assembly
http://hibberdlab.com/transrate

Discrepancy in ORF count between contigs.csv and assemblies.csv #239

Closed by zagorGit 4 years ago

zagorGit commented 4 years ago

We noticed a discrepancy between n_with_orf in the assemblies.csv file and the number of sequences in the contigs.csv file that have a value in orf_length. The counts disagree whether we include every sequence with an orf_length value, only those > 50 nt, or only those > 50 aa. How is n_with_orf actually calculated?

blahah commented 4 years ago

Hey, yeah, we set an arbitrary limit of 50 aa to keep spurious tiny ORFs from causing issues, I think.

However, in hindsight that was a spurious heuristic to apply. We could make the limit a CLI option. In the meantime, you can adjust it for your own use case by changing the code here: https://github.com/blahah/transrate/blob/91fb81a89f9fd73d28de2ad34074ec193c99b41b/lib/transrate/assembly.rb#L162-L164
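For anyone who wants to check the numbers on their own data, here is a minimal sketch in Python that counts the rows of a contigs.csv-style table whose orf_length is above a cutoff. It is not transrate's own code. The function name `count_contigs_with_orf`, the `contig_name` column, and the sample rows are made up for illustration, and both the unit of orf_length (aa vs nt) and whether the comparison is strict are assumptions that should be checked against the linked lines in assembly.rb.

```python
import csv
import io


def count_contigs_with_orf(csv_file, min_len=50, column="orf_length"):
    """Count rows whose value in `column` is strictly greater than `min_len`.

    Assumption: the strict '>' comparison and the default of 50 only mirror
    the limit described in this thread; they are not taken from transrate.
    """
    count = 0
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        if not value or value.upper() == "NA":
            continue  # no ORF length recorded for this contig
        try:
            length = float(value)
        except ValueError:
            continue  # skip anything that is not a number
        if length > min_len:
            count += 1
    return count


# Made-up example table; a real contigs.csv has many more columns.
sample = io.StringIO(
    "contig_name,orf_length\n"
    "c1,10\n"
    "c2,50\n"
    "c3,51\n"
    "c4,\n"
    "c5,200\n"
)
print(count_contigs_with_orf(sample))  # prints 2 (c3 and c5)
```

To run it on a real file, open it with `open("contigs.csv", newline="")` and pass the file object in. Trying a few cutoffs (for example 0, 49, 50, 149, 150) and comparing each result with n_with_orf should show which threshold and unit the summary actually uses.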

zagorGit commented 4 years ago

Great, thanks!

I suggest renaming the 'n_with_orf' key to 'n_with_orf > 149 nt', or similar, so as not to confuse users and reviewers.