bcgsc / RNA-Bloom

🌺 reference-free transcriptome assembly for short and long reads

Resolve Racon Conflict with Numeric Named Reads #66

Open ad3002 opened 5 months ago

ad3002 commented 5 months ago

This pull request addresses an open issue in Racon, where Racon fails if reads and contigs have identical names. In our project, the read files have numeric names generated by an upstream tool, which leads to exactly this naming conflict in Racon.
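To make the conflict concrete, the minimal Python sketch below reports names that occur in both a reads file and a contigs file. The helpers `record_names` and `shared_names` are hypothetical and not part of RNA-Bloom or Racon; the sketch assumes uncompressed FASTA or unwrapped 4-line FASTQ, and takes a record's name to be the first whitespace-delimited token after `>` or `@`.

```python
import io
from typing import Iterable, List, Set


def record_names(lines: Iterable[str]) -> Set[str]:
    """Collect record names from FASTA text or unwrapped 4-line FASTQ text.

    The name is the first whitespace-delimited token after '>' or '@'.
    """
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if not rows:
        return set()
    if rows[0].startswith("@"):
        # FASTQ: the header is the first line of every 4-line record
        # (a quality line may also start with '@', so go by position).
        headers = rows[0::4]
    else:
        # FASTA: every line starting with '>' is a header.
        headers = [row for row in rows if row.startswith(">")]
    names: Set[str] = set()
    for header in headers:
        tokens = header[1:].split()
        if tokens:
            names.add(tokens[0])
    return names


def shared_names(reads: Iterable[str], contigs: Iterable[str]) -> List[str]:
    """Return the sorted names that appear in both the reads and the contigs."""
    return sorted(record_names(reads) & record_names(contigs))


if __name__ == "__main__":
    reads = io.StringIO("@1\nACGT\n+\nIIII\n@2\nGGCC\n+\nIIII\n")
    contigs = io.StringIO(">2\nACGTGGCC\n>3\nTTTT\n")
    print(shared_names(reads, contigs))  # ['2']
```

Any non-empty result means a read and a contig share a name, which is the situation described above.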

To resolve this, I added a 'unitig' prefix to the names of the unitig FASTA records. This prevents the name conflict in Racon, and in subsequent tests RNA-Bloom ran as expected, with no effect on other functionality.
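The idea behind this change can be illustrated with a short Python function that prepends a prefix to every FASTA record name. This is a sketch under assumptions, not the code from this pull request: the function `prefix_fasta_names` is hypothetical, and the exact header format (the prefix immediately followed by the original name) is assumed.

```python
from typing import Iterable, Iterator


def prefix_fasta_names(lines: Iterable[str], prefix: str = "unitig") -> Iterator[str]:
    """Yield FASTA lines with `prefix` prepended to every record name.

    Header lines ('>' ...) are rewritten; sequence lines pass through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line


if __name__ == "__main__":
    unitigs = [">1\n", "ACGT\n", ">2\n", "GGCC\n"]
    # Prints the records with headers ">unitig1" and ">unitig2".
    print("".join(prefix_fasta_names(unitigs)), end="")
```

Because a prefixed name such as `unitig2` can no longer equal a purely numeric read name such as `2`, Racon no longer sees duplicate names.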

kmnip commented 5 months ago

Hi @ad3002, instead of modifying the RNA-Bloom code, you can work around the issue by giving the read names a "proper" prefix (e.g. "seq"). You can do this easily with seqtk:

seqtk rename reads.fq seq > renamed_reads.fq
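For illustration, this renaming step can be approximated in Python. The minimal sketch below assumes unwrapped 4-line FASTQ and renames records to the prefix plus a 1-based counter (`seq1`, `seq2`, ...), which is assumed here to match what `seqtk rename` produces. The function `rename_fastq` is hypothetical; it also drops header comments and any name repeated on the `+` line, which seqtk may handle differently.

```python
from typing import Iterable, Iterator, List


def rename_fastq(lines: Iterable[str], prefix: str = "seq") -> Iterator[str]:
    """Rename 4-line FASTQ records to prefix1, prefix2, ...

    Assumes unwrapped FASTQ (exactly 4 lines per record). Header comments
    and any name repeated on the '+' line are dropped.
    """
    buffer: List[str] = []
    count = 0
    for line in lines:
        buffer.append(line.rstrip("\n"))
        if len(buffer) == 4:
            header, seq, _plus, qual = buffer
            if not header.startswith("@"):
                raise ValueError(f"expected FASTQ header, got {header!r}")
            count += 1
            yield f"@{prefix}{count}\n{seq}\n+\n{qual}\n"
            buffer = []
    if buffer:
        raise ValueError("truncated FASTQ record at end of input")


if __name__ == "__main__":
    reads = ["@1\n", "ACGT\n", "+\n", "IIII\n", "@2\n", "GGCC\n", "+\n", "IIII\n"]
    # Prints the two records renamed to @seq1 and @seq2.
    print("".join(rename_fastq(reads)), end="")
```

In practice, the seqtk one-liner is simpler; the sketch only shows why the renamed reads can no longer collide with numerically named records.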

Ka Ming

ad3002 commented 5 months ago

Yes, I did exactly that. Another possible fix would be to mention this caveat in the RNA-Bloom documentation, because RNA-Bloom crashes without any error message that points to matching contig and read names. Without prior experience, it is nearly impossible to find the cause.
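Since the crash gives no hint about its cause, a cheap check before running RNA-Bloom may help. The hypothetical helper `numeric_read_names` below lists read names made only of digits, the case reported in this pull request. It is a heuristic, not a definitive test: an actual conflict also requires the same name to appear among the contigs. It assumes unwrapped 4-line FASTQ.

```python
import io
import re
from typing import Iterable, List

NUMERIC_NAME = re.compile(r"[0-9]+")


def numeric_read_names(lines: Iterable[str], limit: int = 5) -> List[str]:
    """Return up to `limit` FASTQ read names that consist only of ASCII digits.

    Assumes unwrapped 4-line FASTQ; the name is the first whitespace-delimited
    token of each header line.
    """
    found: List[str] = []
    for index, line in enumerate(lines):
        if index % 4 != 0:
            continue  # only the first line of each 4-line record is a header
        tokens = line[1:].split()
        if tokens and NUMERIC_NAME.fullmatch(tokens[0]):
            found.append(tokens[0])
            if len(found) >= limit:
                break
    return found


if __name__ == "__main__":
    reads = io.StringIO("@1\nACGT\n+\nIIII\n@read_2\nGGCC\n+\nIIII\n")
    examples = numeric_read_names(reads)
    if examples:
        print("Numeric read names found, e.g.", examples)
        print("Consider: seqtk rename reads.fq seq > renamed_reads.fq")
```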

kmnip commented 5 months ago

Thanks for the suggestion. I have added a note about it to the README.