Hello @schorlton ,
Thanks for your interest in RNA-Bloom!
Yes, I am aware of this issue. Basically, these are singleton sequences that don't have any other reads assigned to them. In the next release, I will fix it so that it reports 0.0 instead of null. I am actually contemplating removing these sequences altogether, depending on the threshold set for the -c option...
Thanks, Ka Ming
Thanks for your quick response!
Would the coverage not be 1 if a single read led to this transcript? It would be odd to have coverage of 0... unless I misunderstand coverage in this context.
Ideally speaking, you are correct; singleton sequences should have c=1. However, there is an issue with minimap2 where sequences that have only high-frequency minimizers in the reference are skipped. In that case, a sequence can have no reads mapped to it. There is an option in minimap2 that could be configured to potentially fix this issue. I am testing it now to see how it affects runtime and memory.
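To make the null-versus-1 distinction concrete, here is a minimal sketch. It assumes (hypothetically; this is not RNA-Bloom's actual code) that a transcript's coverage is the number of distinct reads mapped to it in minimap2's PAF output, where column 1 is the read name and column 6 is the target name. A transcript that receives no mapping at all has no count, which surfaces as None ("null") unless a default such as 0.0 is substituted. The function name `coverage_from_paf` and the example records are illustrative.

```python
def coverage_from_paf(paf_lines, transcript_ids, default=None):
    """Count distinct reads mapped to each transcript from PAF records.

    Hypothetical scheme for illustration; not RNA-Bloom's implementation.
    Transcripts with no mapped reads get `default` (None mimics "c=null").
    """
    reads_per_target = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # PAF records have at least 12 tab-separated columns
        read_name, target_name = fields[0], fields[5]
        reads_per_target.setdefault(target_name, set()).add(read_name)
    return {
        tid: len(reads_per_target[tid]) if tid in reads_per_target else default
        for tid in transcript_ids
    }


# t1 receives two reads; t2 (a singleton whose read was never mapped) receives none.
paf = [
    "r1\t1000\t0\t1000\t+\tt1\t1000\t0\t1000\t990\t1000\t60",
    "r2\t950\t0\t950\t+\tt1\t1000\t40\t990\t930\t950\t60",
]
print(coverage_from_paf(paf, ["t1", "t2"]))               # {'t1': 2, 't2': None}
print(coverage_from_paf(paf, ["t1", "t2"], default=0.0))  # {'t1': 2, 't2': 0.0}
```

Under this model, reporting 0.0 instead of null only changes the default; a singleton would get the "ideal" count of 1 only if its source read actually mapped back to it.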
I see. I don't know RNA-Bloom well enough, but yes, minimap2 is not designed (by default) for short sequences with high-frequency minimizers. The -e flag may be what you're referring to, as it can recover more high-frequency minimizers, leaving fewer minimizer-free gaps. If the minimum transcript length is 200 bp, you may need a really low -e to get 3 minimizers per transcript to form a chain...
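The estimate above can be checked with back-of-the-envelope arithmetic. This sketch assumes a simplified model (an assumption, not minimap2's exact sampling rule) in which -e yields one sampled high-frequency minimizer every e bp, so a transcript of length L gets about floor(L / e) of them; getting at least 3 then requires e <= floor(L / 3). The function names are hypothetical.

```python
def sampled_minimizers(transcript_len, e):
    """Approximate number of sampled high-frequency minimizers: one per e bp (simplified model)."""
    return transcript_len // e


def max_interval_for_chain(transcript_len, min_minimizers=3):
    """Largest e (bp) that still yields `min_minimizers` samples under the same model."""
    return transcript_len // min_minimizers


L = 200  # minimum transcript length mentioned in the thread
e_max = max_interval_for_chain(L)
print(e_max)                             # 66
print(sampled_minimizers(L, e_max))      # 3
print(sampled_minimizers(L, e_max + 1))  # 2
```

Under this rough model, -e would have to be in the tens of base pairs for the shortest transcripts, which is consistent with the "really low -e" remark.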
Hi, same issue here.
This will be fixed in the next release.
This bug is fixed. Please see my new release of RNA-Bloom v2.0.0: https://github.com/bcgsc/RNA-Bloom/releases/tag/v2.0.0
Thanks for the amazing software! I ran some cDNA long-read sequencing data through assembly:
rnabloom -long input.fastq -ntcard -t 8 -outdir assembled
It all succeeded; however, when looking at my output data, I see c=null in some of the FASTA sequence headers. I understood that the c= tag was meant to hold coverage, and most of these tags look correct! Example:
Please report
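To find the affected records in an assembly, a small script can scan the FASTA headers for the c= tag. This sketch assumes (hypothetically) that coverage appears as a whitespace-separated c=VALUE token after the sequence ID; the example headers, including the l= token, are made up for illustration. The `null_as` parameter shows how a reader could treat null as 0.0, matching the fix proposed in this thread.

```python
def parse_coverage_tags(fasta_lines, null_as=None):
    """Map sequence ID -> coverage from a 'c=VALUE' header token.

    Assumes headers like '>ID ... c=VALUE ...' (hypothetical layout).
    'c=null' is mapped to `null_as`; other values are parsed as floats.
    """
    coverage = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        tokens = line[1:].split()
        if not tokens:
            continue  # empty header
        seq_id = tokens[0]
        for tok in tokens[1:]:
            if tok.startswith("c="):
                value = tok[2:]
                coverage[seq_id] = null_as if value == "null" else float(value)
                break
    return coverage


lines = [">t1 l=500 c=12.5", "ACGTACGT", ">t2 l=300 c=null", "ACGTAC"]
cov = parse_coverage_tags(lines)
print([sid for sid, c in cov.items() if c is None])  # ['t2']
print(parse_coverage_tags(lines, null_as=0.0))        # {'t1': 12.5, 't2': 0.0}
```

With a real assembly, an open file object can be passed in directly, since iterating over it yields the header and sequence lines.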