ablab / IsoQuant

Transcript discovery and quantification with long RNA reads (Nanopores and PacBio)
https://ablab.github.io/IsoQuant/

Starting with a soft-masked reference genome? #59

Closed fangbohao closed 1 year ago

fangbohao commented 1 year ago

Hi, I was wondering if I should run IsoQuant with a repeats-soft-masked reference genome or just an unmasked genome. Which one would be better?

Thanks! Bohao

fangbohao commented 1 year ago

Also, could I concatenate FASTQ files from different tissues first, and then run IsoQuant?

Thanks!

andrewprzh commented 1 year ago

Dear @fangbohao

IsoQuant does not differentiate between lowercase and uppercase bases, and neither does minimap2, which is used as the default aligner (https://github.com/lh3/minimap2/issues/654). I've always used unmasked genomes. I don't think repeats strongly affect the mapping of RNA reads.

Yes, using all reads combined could be beneficial for the discovery of low-expressed isoforms. You don't need to concatenate them; you can simply provide --fastq_list file.txt, where file.txt contains a list of all FASTQ files without blank lines. There are a couple of examples in the manual.

Best Andrey
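
For readers who want to build the list file described above, here is a minimal Python sketch. It is not part of IsoQuant; the helper name `write_fastq_list`, the directory name `reads/`, and the filename patterns are all hypothetical, and it assumes the list should hold one FASTQ path per line with no blank lines, as the reply describes. Check the manual for the exact format your IsoQuant version expects.

```python
from pathlib import Path


def write_fastq_list(fastq_dir, list_path,
                     patterns=("*.fastq", "*.fq", "*.fastq.gz", "*.fq.gz")):
    """Write one absolute FASTQ path per line, with no blank lines.

    Hypothetical helper: the patterns above are an assumption about how
    the FASTQ files are named, not something IsoQuant requires.
    """
    fastq_dir = Path(fastq_dir)
    # Collect matches for every pattern; the set drops accidental duplicates.
    paths = sorted({p.resolve()
                    for pattern in patterns
                    for p in fastq_dir.glob(pattern)})
    if not paths:
        raise FileNotFoundError(f"no FASTQ files found in {fastq_dir}")
    # A single trailing newline ends the last line without adding a blank one.
    Path(list_path).write_text("\n".join(str(p) for p in paths) + "\n")
    return paths


# Example (hypothetical directory name):
# write_fastq_list("reads/", "file.txt")
```

The resulting file.txt can then be passed via --fastq_list file.txt as suggested in the reply above.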

fangbohao commented 1 year ago

Hi Andrey, thank you for your suggestions!

fangbohao commented 1 year ago

Hi Andrey, a follow-up question: once I have the resulting transcripts, are there any programs or pipelines you would recommend for gene structure annotation?

Thanks!

andrewprzh commented 1 year ago

I don't have much experience with further analysis. You may try looking into IsoAnnotLite and tappAS. They may require running SQANTI3. As far as I know, the current SQANTI version supports GTFs produced by other tools. The next version of IsoQuant will also provide novel transcript classification in a SQANTI-like format.

Best Andrey

fangbohao commented 1 year ago

Thank you, Andrey!