BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

Question regarding sequencing depth and collapse step #303

Closed wuy24 closed 6 months ago

wuy24 commented 9 months ago

Dear Flair developer,

I am using Flair for our transcriptome analysis. The data were sequenced on a PromethION 24 with one flow cell, and this run gave about 80 million reads. After running flair align, correct and collapse, I got the isoforms.fa and isoforms.gtf files. When I used SQANTI to classify the isoforms, only 22% were full-splice match, 2.7% were incomplete-splice match, and more than 70% fell into novel categories. However, with shallow sequencing (5 million reads), the full-splice-match fraction was as high as 60%. I wonder whether sequencing depth affects isoform discovery, and whether there could be an issue in the collapse step if the sequencing depth is too high. Thank you very much!

Best, Ying

Jeltje commented 6 months ago

With higher coverage you get a higher number of artefacts. Flair collapse has a minimum coverage requirement of 3 reads, which is very low for high coverage transcriptomes. You can use the --support flag to change this.
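To illustrate why a fixed 3-read threshold lets more artefacts through at higher depth, here is a rough sketch. It assumes (purely for illustration) that a given artefact isoform arises independently at a small fixed rate per read, so its read count is approximately Poisson. The rate `1e-7` and the function name are hypothetical and are not taken from Flair.

```python
import math

def prob_passes_support(total_reads, artefact_rate, min_support=3):
    """Probability that one artefact collects at least min_support reads,
    under an assumed Poisson model with mean total_reads * artefact_rate."""
    lam = total_reads * artefact_rate
    # P(X < min_support) for X ~ Poisson(lam)
    below = sum(math.exp(-lam) * lam**k / math.factorial(k)
                for k in range(min_support))
    return 1.0 - below

rate = 1e-7  # hypothetical per-read rate of one specific artefact
for depth in (5_000_000, 80_000_000):
    for support in (3, 20):
        p = prob_passes_support(depth, rate, support)
        print(f"depth={depth:>11,} support={support:>2} P(pass)={p:.4f}")
```

Under these assumptions the same artefact rarely reaches 3 reads at 5 million reads but almost always does at 80 million, while a higher threshold filters it out again. Real artefact rates vary, so treat the numbers as a toy model only.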

With human samples it is generally a good idea to use --annotation_reliant so Flair only finds isoforms and does not try to annotate novel genes.