Closed yycc9897 closed 5 months ago
Flair tries to find different isoforms of a gene, which means that it tends to have trouble with single exon transcripts that do not overlap exons of known genes.
One way to deal with this is to use `--annotation_reliant`. This restricts Flair to only genes present in the input GTF.
Another method is increasing the minimum number of supporting reads with `--support`. You have a lot of input files, which indicates there are many reads. The default setting is 3 reads per isoform; try increasing that to 10.
Lastly, you could just filter out all single exon genes. Flair isn't really meant to find novel single exon genes; these transcripts are just reported to avoid losing information.
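A rough sketch of that last filtering option, assuming the isoforms file from collapse is in standard BED12 format, where column 10 (`blockCount`) holds the number of exons. The function name `drop_single_exon` and the example records are made up for illustration; this is not part of Flair itself.

```python
def drop_single_exon(bed_lines):
    """Yield only BED12 records that have more than one block (exon).

    Assumes tab-separated BED12 input; blank lines, header lines and
    records with fewer than 12 columns are dropped.
    """
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        # BED12 column 10 (index 9) is blockCount, i.e. the number of exons.
        if len(fields) >= 12 and int(fields[9]) > 1:
            yield line


# Hypothetical records: one three-exon isoform and one single-exon isoform.
example = [
    "chr13\t100\t900\tisoA_geneX\t1000\t-\t100\t900\t0\t3\t100,100,100,\t0,350,700,\n",
    "chr13\t850\t950\tisoB_chr13:0\t1000\t+\t850\t950\t0\t1\t100,\t0,\n",
]
kept = list(drop_single_exon(example))
print(len(kept))
```

To apply it to a real file, open the collapse BED output, pass the file handle to `drop_single_exon`, and write the yielded lines to a new file. Note that this removes every single exon transcript, including ones that match annotated single exon genes.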
If this doesn't work, please comment again. Otherwise please close this ticket. Thanks for using Flair!
Adding: The reported transcripts that overlap the leftmost exon of your multiexon transcripts are likely on the opposite strand.
Copy and paste the exact command you tried to run
How did you install Flair?
What happened?
![foxo1](https://github.com/BrooksLabUCSC/flair/assets/115764048/62f1a4f5-61fd-4980-ad4c-79257a20325f)
After I ran the flair collapse step, I checked the BED file in IGV and found that there were many truncated, non-full-length transcripts. What is the reason for this? I've tried --stringent and --filter nosubset, but the problem persists. I first identified full-length transcripts using Pychopper and then ran Flair.
What else do we need to know? Splice sites were extracted from short-read data using STAR.