BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

Identification of full-length intergenic isoform #264

Closed Dongxu-Zheng closed 5 months ago

Dongxu-Zheng commented 1 year ago

I have read the paper (https://doi.org/10.1038/s41467-020-15171-6) and the manual (https://flair.readthedocs.io/en/latest/) and I still have a question about:

Hi, thanks for developing this great tool for long-read sequencing data analysis!

I am working with PacBio Iso-Seq data and ONT direct RNA-seq data. I used Isoseq3 and SQANTI3 to analyze the Iso-Seq data and identified some intergenic isoforms. I then used FLAIR to assemble a transcriptome from my ONT data, and after running SQANTI3 on it for isoform classification and QC, some isoforms were again classified as intergenic. However, every intergenic isoform in the FLAIR-derived transcriptome has only one exon and is considered an artifact. I also noticed that other users have reported many single-exon intergenic isoforms. How should I interpret this result? Is it caused by a technical difference between PacBio and ONT, or by differences in strategy between Isoseq3 and FLAIR? Or do I need to handle splice junctions (SJs) in some special way?

I looked further into one locus with an intergenic isoform identified from the Iso-Seq data. Some exons of this isoform are also detected in the ONT data, but in the FLAIR transcriptome these three exons are reported as three separate single-exon intergenic isoforms, which SQANTI3 considers artifacts. Thanks in advance for any comments, thoughts, and suggestions!

image
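
For anyone who wants to quantify the same pattern in their own output, here is a minimal Python sketch that tallies isoforms by exon count. It assumes the isoforms are in standard BED12 format, where column 10 (blockCount) holds the number of exons. The function name and the example records are made up for illustration and are not taken from the data in this issue.

```python
from collections import Counter


def exon_count_histogram(bed12_lines):
    """Return a Counter mapping exon count -> number of isoforms.

    Assumes BED12 input: column 10 (index 9) is blockCount.
    """
    counts = Counter()
    for line in bed12_lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a full BED12 record
        counts[int(fields[9])] += 1
    return counts


# Hypothetical records: two single-exon isoforms and one three-exon isoform.
example = [
    "chr1\t1000\t1500\tiso_a\t0\t+\t1000\t1500\t0\t1\t500,\t0,",
    "chr1\t2000\t2400\tiso_b\t0\t+\t2000\t2400\t0\t1\t400,\t0,",
    "chr1\t5000\t7000\tiso_c\t0\t-\t5000\t7000\t0\t3\t200,300,400,\t0,800,1600,",
]
hist = exon_count_histogram(example)
print(dict(hist))
```

To use it on a real file, pass an open file handle instead of the example list; iterating over the handle yields the lines.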

Jeltje commented 10 months ago

FLAIR reports these as isoforms because they overlap exons. If you run flair collapse with the --stringent option, they should disappear.
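
As a rough sketch of what that invocation might look like, the Python snippet below assembles a flair collapse command line with --stringent enabled. Only --stringent comes from the comment above. The other flags (-g, -q, -r, -o, --gtf) are my reading of the FLAIR manual and should be checked against your installed version, and all file names are placeholders. The helper function is hypothetical and only builds the command; it does not run FLAIR.

```python
import shlex


def build_collapse_command(genome, query_bed, reads, gtf=None,
                           output="flair.collapse", stringent=True):
    """Build (but do not run) a flair collapse command as a list of arguments.

    Flags other than --stringent are assumptions based on the FLAIR manual.
    """
    cmd = ["flair", "collapse",
           "-g", genome,      # reference genome FASTA
           "-q", query_bed,   # corrected read alignments in BED format
           "-r", reads,       # raw reads (FASTA/FASTQ)
           "-o", output]      # output name prefix
    if gtf:
        cmd += ["--gtf", gtf]  # optional annotation
    if stringent:
        cmd.append("--stringent")
    return cmd


# Placeholder file names, for illustration only.
cmd = build_collapse_command("genome.fa", "corrected.bed", "reads.fastq",
                             gtf="annotation.gtf")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the paths point at real files.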