bcgsc / RNA-Bloom

:hibiscus: reference-free transcriptome assembly for short and long reads

reduce redundancy in direct RNA long-read only assembly? #71

Open mjudd8 opened 2 months ago

mjudd8 commented 2 months ago

Hello,

I used rnabloom to construct a transcriptome from direct RNA data with the following:

rnabloom -long all_reads.fastq -stranded -t 25 -outdir dir/ -u true

The rnabloom.transcripts.fa assembly file seems to contain a lot of redundant transcripts that differ only by very small variations. Is there a way to generate a rnabloom.transcripts.nr.fa file from the long-read data alone?

Thanks!

kmnip commented 2 months ago

Some settings can be changed to reduce the redundancy of the assembly, e.g.

-indel 100 -tip 100 -p 0.6

The defaults for these options with long reads are -indel 50 -tip 50 -p 0.7. The other source of redundancy was a bug in minimap2 that kept some overlaps from being output correctly.
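
For reference, here is a minimal sketch of how the suggested values could be combined with the command from the original post. Every flag and value is taken from this thread; the output directory name `dir_reduced/` is a hypothetical choice, used only so the new run is kept apart from the first one. The snippet only prints the assembled command for review and does not run RNA-Bloom itself.

```shell
# Options from the original command in this thread.
# NOTE: "dir_reduced/" is a hypothetical output directory, not from the thread.
BASE_OPTS="-long all_reads.fastq -stranded -t 25 -outdir dir_reduced/ -u true"

# Values suggested in the reply (long-read defaults: -indel 50 -tip 50 -p 0.7).
REDUNDANCY_OPTS="-indel 100 -tip 100 -p 0.6"

# Assemble the full command line and print it for inspection.
CMD="rnabloom $BASE_OPTS $REDUNDANCY_OPTS"
echo "$CMD"
```

Once the printed line looks right, it can be pasted into the terminal to start the run. Whether these values remove enough redundancy will depend on the data, so treat them as a starting point.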