bcgsc / RNA-Bloom

:hibiscus: reference-free transcriptome assembly for short and long reads

Resolve Racon Conflict with Numeric Named Reads #66

Open ad3002 opened 5 months ago

ad3002 commented 5 months ago

This pull request addresses an open issue in Racon (https://github.com/isovic/racon/issues/233), where Racon encounters an error if reads and contigs have identical names. In our project, the read files contain reads with numeric names generated by an upstream tool, which leads to a naming conflict in Racon.

To resolve this, I have added a 'unitig' prefix to the names of unitig FASTA records. This change prevents the name conflict in Racon, and in subsequent tests RNA-Bloom ran as expected. The change does not affect other functionality.
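For illustration only, the renaming idea can be sketched in a few lines of Python. This is not the code from the pull request: the function name `prefix_fasta_names` and the sample records are hypothetical, and the sketch only assumes that every FASTA header line starts with `>`.

```python
def prefix_fasta_names(lines, prefix="unitig"):
    """Yield FASTA lines with `prefix` prepended to each record name.

    Hypothetical helper for illustration; not part of RNA-Bloom.
    """
    for line in lines:
        if line.startswith(">"):
            # Header line: keep the name and any description, add the prefix.
            yield ">" + prefix + line[1:]
        else:
            # Sequence line: pass through unchanged.
            yield line


# Made-up records with purely numeric names.
records = [">1 len=5\n", "ACGTA\n", ">2\n", "GGCC\n"]
renamed = list(prefix_fasta_names(records))
```

With the prefix in place, a unitig named `1` becomes `unitig1` and can no longer share a name with a read named `1`.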

kmnip commented 5 months ago

Hi @ad3002, instead of modifying the RNA-Bloom code, you can work around the issue by giving the read names a "proper" prefix (e.g. "seq"). You can do so easily with seqtk:

seqtk rename reads.fq seq > renamed_reads.fq

Ka Ming

ad3002 commented 5 months ago

Yes, I did exactly that. Another possible fix is to add this caveat to the RNA-Bloom documentation, because it crashes without any error that can be linked to matching contig and read names. Without prior experience, it's impossible to find a solution.
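Since the crash gives no hint about the cause, a quick pre-flight check for shared names can help. The following is a minimal sketch, not an RNA-Bloom or Racon feature: the helper names are hypothetical, the FASTQ input is assumed to use exactly four lines per record, and a record name is taken to be the header text up to the first whitespace.

```python
def fasta_names(lines):
    """Collect record names from FASTA lines (hypothetical helper)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip headers with an empty name
                names.add(fields[0])
    return names


def fastq_names(lines):
    """Collect record names from FASTQ lines, assuming 4 lines per record."""
    names = set()
    for i, line in enumerate(lines):
        # Only the first line of each 4-line record is a header.
        if i % 4 == 0 and line.startswith("@"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


# Made-up inputs: reads named "1" and "2", contigs named "2" and "3".
reads = ["@1\n", "ACGT\n", "+\n", "IIII\n", "@2\n", "GGCC\n", "+\n", "IIII\n"]
contigs = [">2\n", "ACGTACGT\n", ">3\n", "GGCCGGCC\n"]

clashing = fastq_names(reads) & fasta_names(contigs)
```

A non-empty `clashing` set means at least one read and one contig share a name, which is the situation described in the Racon issue.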

kmnip commented 5 months ago

Thanks for the suggestion. I have added a note about it to the README.