Improving Sensitivity for Similar Isoform Assembly and Handling PacBio HiFi RNA Reads Without Polishing

zhenyu7500 commented 2 weeks ago

Hi,

Thanks for developing RNA-Bloom, it is a great software. I have the following two questions and look forward to your response. Thank you!

Ⅰ. I want to assemble isoforms of genes that have many similar repeat copies in the genome. I have confirmed that these similar repeat copies are expressed, and I want to assemble as many of them as possible. How can I improve the sensitivity?

Ⅱ. Our long reads are PacBio HiFi reads, so I do not want to polish them by read-to-read alignment. Is the option "--errcorritr 0" the way to achieve this?

java -version
openjdk version "11.0.13" 2021-10-19
OpenJDK Runtime Environment JBR-11.0.13.7-1751.21-jcef (build 11.0.13+7-b1751.21)
OpenJDK 64-Bit Server VM JBR-11.0.13.7-1751.21-jcef (build 11.0.13+7-b1751.21, mixed mode)

rnabloom -stranded -lrpb --errcorritr 0 -sensitive -long my_input.fasta -t 20 -outdir my_isoform

Thank you!

Dexiang Hu

kmnip commented 2 weeks ago

Hello,

Yes, you can set --errcorritr 0 to turn off the initial k-mer-based polishing. The --sensitive option is meant for short-read assembly and has no effect on long-read data.
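Applied to the original command, this means keeping --errcorritr 0 and dropping -sensitive. The minimal Python sketch below just assembles that revised argument list. It assumes the `rnabloom` wrapper is on your PATH and that the option spellings used in this thread are accepted as written; the helper name `build_rnabloom_cmd` and its `extra_opts` parameter are hypothetical.

```python
import shlex


def build_rnabloom_cmd(reads, outdir, threads=20, extra_opts=()):
    """Return an argv list for a stranded PacBio HiFi RNA-Bloom run.

    Hypothetical helper: mirrors the command from this thread, with
    -sensitive removed (it only affects short-read assembly) and
    --errcorritr 0 kept to skip the initial k-mer-based polishing.
    """
    cmd = [
        "rnabloom",
        "-stranded",
        "-lrpb",                 # PacBio-specific long-read settings
        "--errcorritr", "0",     # no k-mer-based read polishing
        "-long", reads,
        "-t", str(threads),
        "-outdir", outdir,
    ]
    cmd.extend(extra_opts)       # any further options, appended verbatim
    return cmd


if __name__ == "__main__":
    cmd = build_rnabloom_cmd("my_input.fasta", "my_isoform")
    print(shlex.join(cmd))
    # To actually run it (requires rnabloom on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the assembly options from the next paragraph through `extra_opts`, for example `("-p", "0.98", "-indel", "1")`, appends them unchanged.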

To increase sensitivity, you may also add the options -p 0.98 -indel 1, which raise the percent-identity threshold and lower the indel-size threshold used when comparing sequences. However, your assembly may end up with many redundant isoforms.
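To see why stricter thresholds keep similar repeat copies apart, here is a toy Python illustration. It is not RNA-Bloom's implementation. It assumes a simplified rule in which two aligned sequences count as redundant only when their percent identity is at least the identity threshold and no indel is longer than the indel threshold. Percent identity is computed here as matches over gap-free aligned columns, which is one common definition among several. The function names and the 0.95 comparison value are illustrative only.

```python
def alignment_stats(a, b):
    """Percent identity and longest indel for two gapped, equal-length
    aligned strings ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = aligned = longest_indel = run = 0
    gap_side = None
    for x, y in zip(a, b):
        # Which sequence carries the gap in this column, if any.
        side = "a" if x == "-" else "b" if y == "-" else None
        if side is None:
            run = 0
            aligned += 1
            matches += x == y
        else:
            # Extend the current indel only if the gap stays on the same side.
            run = run + 1 if side == gap_side else 1
            longest_indel = max(longest_indel, run)
        gap_side = side
    identity = matches / aligned if aligned else 0.0
    return identity, longest_indel


def treated_as_redundant(a, b, min_identity, max_indel):
    """Hypothetical rule: collapse the pair only if it is similar enough
    and contains no indel longer than max_indel."""
    identity, longest = alignment_stats(a, b)
    return identity >= min_identity and longest <= max_indel


if __name__ == "__main__":
    a = "ACGTACGTAC" * 5                           # 50 bp repeat copy
    b = a[:10] + "T" + a[11:30] + "T" + a[31:]     # 2 substitutions, 96% identity
    print(alignment_stats(a, b))
    print(treated_as_redundant(a, b, 0.95, 1))     # merged at a looser threshold
    print(treated_as_redundant(a, b, 0.98, 1))     # kept separate at 0.98
```

Under this simplified rule, a copy that is 96% identical is merged at 0.95 but kept at 0.98. The cost the reply mentions follows directly: pairs that differ only by sequencing or assembly noise are also kept, which produces redundant isoforms.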

zhenyu7500 commented 2 weeks ago

Thank you for your response!

I would like to know which reads were used to assemble each isoform in the file "rnabloom.transcripts.fa". Do I need to re-map the reads to that file?

Thanks!

kmnip commented 2 weeks ago

Yes, please re-map your reads to assembled transcripts.
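One way to do the re-mapping, assuming minimap2 (version 2.19 or later for the map-hifi preset) is used as the aligner, is `minimap2 -x map-hifi rnabloom.transcripts.fa my_input.fasta > reads_vs_transcripts.paf`. The minimal Python sketch below then groups read names by the transcript of their primary alignment in that PAF file. The file name, the script name `reads_per_transcript.py`, and the `min_mapq` filter are assumptions for illustration. With many near-identical repeat copies, primary assignments can be ambiguous, and a low MAPQ is one sign of this.

```python
import sys
from collections import defaultdict


def reads_per_transcript(paf_lines, min_mapq=0):
    """Group read names by the transcript of their primary alignment.

    Expects minimap2 PAF lines: 12 tab-separated mandatory columns
    followed by optional SAM-style tags such as tp:A:P (primary).
    """
    groups = defaultdict(list)
    for line in paf_lines:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        qname, tname, mapq = cols[0], cols[5], int(cols[11])
        if "tp:A:P" not in cols[12:]:   # skip secondary (tp:A:S) hits
            continue
        if mapq < min_mapq:             # optionally drop ambiguous hits
            continue
        groups[tname].append(qname)
    return dict(groups)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python reads_per_transcript.py reads_vs_transcripts.paf
    with open(sys.argv[1]) as fh:
        for tx, reads in sorted(reads_per_transcript(fh).items()):
            print(tx, len(reads), ",".join(reads), sep="\t")
```

Raising `min_mapq` keeps only reads that map confidently to a single transcript. With highly similar isoforms, this can discard many reads.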

Source: bcgsc / RNA-Bloom, issue #74