BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

Annotate 5'UTR, CDS, 3'UTR in FLAIR isoform #130

Open akk01 opened 4 years ago

akk01 commented 4 years ago

Do you recommend any tool to annotate 5'UTR, CDS, and 3'UTR features in the isoforms generated by FLAIR?

belgravia commented 4 years ago

There are many ORF-finding algorithms out there. I haven't run many of them myself, but there's SQANTI and orffinder, among many others. If you have trouble with the flair-collapse isoform output (gtf/fa/bed) being incompatible with these tools, let me know.

FLAIR also has its own ORF-finding script, predictProductivity.py. You give it your isoforms.bed file and a GTF containing annotated start_codon entries. The script looks at the annotated start codons within each isoform and predicts an ORF, either picking the start codon that gives the longest ORF or using the most upstream start codon, depending on the option you specify. It's documented in the README, and this might be the easiest next step if you are satisfied with the algorithm. Something I added somewhat recently is the ability for the FLAIR script bin/psl_to_gtf.py to convert the BED output from predictProductivity into a GTF that contains 5'UTR, CDS, and 3'UTR entries for each transcript.

-Alison
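To illustrate the kind of UTR/CDS annotation discussed in the comment above, here is a minimal Python sketch that splits one BED12 record into 5'UTR, CDS, and 3'UTR intervals. It assumes the CDS bounds are stored in the standard BED12 thickStart/thickEnd columns; whether predictProductivity.py writes its predictions that way is an assumption here, not something confirmed in this thread. The function name `split_bed12_features` is hypothetical and not part of FLAIR.

```python
def split_bed12_features(bed_line):
    """Split one BED12 line into 5'UTR, CDS and 3'UTR intervals.

    Assumption: thickStart/thickEnd (columns 7 and 8) mark the CDS.
    Coordinates are 0-based, half-open, as in BED.
    """
    f = bed_line.rstrip("\n").split("\t")
    start, strand = int(f[1]), f[5]
    thick_start, thick_end = int(f[6]), int(f[7])
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    offsets = [int(x) for x in f[11].rstrip(",").split(",")]

    features = {"five_prime_utr": [], "CDS": [], "three_prime_utr": []}
    if thick_start >= thick_end:
        return features  # no CDS recorded for this transcript

    # On the minus strand the genomic left side is the 3' end.
    if strand == "+":
        left, right = "five_prime_utr", "three_prime_utr"
    else:
        left, right = "three_prime_utr", "five_prime_utr"

    for size, offset in zip(sizes, offsets):
        ex_start = start + offset
        ex_end = ex_start + size
        # Part of the exon lying left of the CDS
        if ex_start < thick_start:
            features[left].append((ex_start, min(ex_end, thick_start)))
        # Part of the exon overlapping the CDS
        cds_start, cds_end = max(ex_start, thick_start), min(ex_end, thick_end)
        if cds_start < cds_end:
            features["CDS"].append((cds_start, cds_end))
        # Part of the exon lying right of the CDS
        if ex_end > thick_end:
            features[right].append((max(ex_start, thick_end), ex_end))
    return features


# Made-up example record: three exons, CDS from 150 to 950
example = "chr1\t100\t1000\ttx1\t0\t+\t150\t950\t0\t3\t100,200,100\t0,300,800"
print(split_bed12_features(example))
```

This sketch does not decide whether the stop codon counts as part of the CDS; GTF conventions differ from raw thickEnd coordinates on that point, so check before comparing against an annotation.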

obegik commented 3 years ago

Hi Alison,

This repeats the same question with a different focus. Can we use FLAIR to find alternative 3'UTR isoforms, i.e. the coordinates of short and long 3'UTRs?

Thanks, Oguzhan

belgravia commented 2 years ago

I don't have any scripts to do this in FLAIR. But if you ran predictProductivity, or an equivalent tool that tells you where the start and stop codons are in each FLAIR transcript, you could group your transcripts by stop codon coordinate and record the transcription end site (TES) of each one (all of this via a Python dictionary, for example), then see which stop codon coordinates correspond to multiple TESs. If you have multiple conditions, you could run Fisher's exact test, for example, to see whether one of the TESs is significantly associated with samples in one condition versus the other. This gets more complicated if you also want to find alternative 3' UTRs in a gene that differ in both stop codon and TES, because the approach I described 'anchors' on a stop codon. Hope this makes sense. -A
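The dictionary-based grouping and the Fisher's test suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the input tuples `(transcript_id, chrom, strand, stop_codon, tes)` are a hypothetical format you would build yourself from the ORF predictions, TESs are matched by exact coordinate (real long-read data would probably need nearby ends clustered first), and the read counts in the example are made up. The two-sided Fisher's exact test is written with the standard library so the sketch has no dependencies; `scipy.stats.fisher_exact` is an alternative if SciPy is installed.

```python
from collections import defaultdict
from math import comb


def group_tes_by_stop(transcripts):
    """Return stop codons that have more than one TES.

    transcripts: iterable of (transcript_id, chrom, strand, stop_codon, tes)
    Result: {(chrom, strand, stop_codon): {tes: [transcript_id, ...]}}
    """
    groups = defaultdict(lambda: defaultdict(list))
    for tx_id, chrom, strand, stop, tes in transcripts:
        groups[(chrom, strand, stop)][tes].append(tx_id)
    # Keep only stop codons shared by transcripts with different ends
    return {key: dict(by_tes) for key, by_tes in groups.items() if len(by_tes) > 1}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Made-up transcripts: tx1 and tx2 share a stop codon but end at different sites
transcripts = [
    ("tx1", "chr1", "+", 5000, 5400),
    ("tx2", "chr1", "+", 5000, 6200),
    ("tx3", "chr1", "+", 8000, 8300),
]
print(group_tes_by_stop(transcripts))

# Made-up read counts: rows are conditions, columns are short vs long 3'UTR
print(fisher_exact_two_sided(30, 10, 12, 28))
```

As noted in the comment, this anchors on the stop codon, so alternative 3' UTRs that differ in both stop codon and TES would not be grouped together.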

MustafaElshani commented 2 years ago

I think 3' UTR features and alternative polyadenylation (APA) usage would be a great addition to FLAIR. LAPA is trying to do something similar. I just wonder whether FLAIR is already better 'equipped' and maybe "just" needs a new script to do differential analysis of APA site usage.

Jeltje commented 2 years ago

We don't currently have the bandwidth for this, so I'm marking this as a feature request. If you want to help with FLAIR development, please feel free to fork the repo and add this functionality yourself.