BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

diffSplice: start coordinates > end coordinates #253

Open orbitalse opened 1 year ago

orbitalse commented 1 year ago

Copy and paste the exact command you tried to run

python3 $FLAIR/flair.py diffSplice -i $OUT_DIR/flair.merged.isoforms.bed \
    -q $OUT_DIR/flair.merged.counts.tsv \
    --threads 16 \
    --output $OUT_DIR/diff_splicing/flair.diff_splice

How did you install Flair?

Git cloned the current repository.

What happened?

After running diffSplice to identify splicing events, I noticed that many of the events had start coordinates that were greater than the end coordinates. I am currently looking at exon skipping (es) events. This did not occur for all es events, but it did for a good portion (more than half). I checked a couple of things, including:

(1) Whether this was due to strandedness: It does not appear to be, as this phenomenon occurred for events on both the plus and minus strands.

(2) Whether this was due to an error in collapsing: I have a very large data set with many samples, and therefore followed previous advice on this GitHub to (a) merge corrected splice junctions from all samples, (b) split by chromosome, (c) run collapse on each chromosome separately, and (d) merge collapsed reads for all chromosomes. However, the resulting output of the collapse stage, flair.merged.isoforms.bed (which I subsequently fed into the diffSplice command above as input), had no entries with a start coordinate greater than the end coordinate. This leads me to believe the error is in fact occurring at the diffSplice stage.
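For anyone who wants to repeat check (2), here is a minimal sketch of how the collapsed BED file could be scanned for inverted intervals. It assumes only the standard BED layout (tab-separated, with chromStart and chromEnd in columns 2 and 3); the function name `find_inverted_bed_rows` and the toy rows are made up for illustration and are not part of FLAIR.

```python
def find_inverted_bed_rows(lines):
    """Return (line_number, chrom, start, end) for rows where chromStart > chromEnd.

    `lines` is any iterable of BED lines, e.g. an open file handle.
    """
    inverted = []
    for line_number, line in enumerate(lines, start=1):
        # Skip blank lines and the comment/track/browser lines BED files may carry.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start > end:
            inverted.append((line_number, chrom, start, end))
    return inverted


# Toy rows (not real data): the second row is deliberately inverted.
toy_rows = [
    "chr1\t100\t200\tisoform_a\n",
    "chr1\t500\t400\tisoform_b\n",
]
print(find_inverted_bed_rows(toy_rows))
```

On a real file, the same function can be called as `find_inverted_bed_rows(open(path))`; an empty list would be consistent with the observation that the collapse output is fine.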

(3) Whether this issue has been addressed on this GitHub previously: While there has been a similar issue with coordinates (issue #84), I think I am experiencing something a bit different. As I mentioned, I don't think it's a result of the collapse step because the isoform coordinates appeared to be fine (although please do correct me if I am mistaken). Furthermore, the start coordinate is much bigger than the end coordinate, not just by 1 nt.

(4) Whether the coordinates have the same issue in the corresponding isoform GTF file: I've checked the exons identified as es events using the flair.merged.isoforms.gtf, and they have coordinates with start coordinate < end coordinate, as expected. However, I did not want to just assume that I should swap the start and end coordinates of the splicing event without knowing whether there was an underlying issue.

Here is an example of such an event: inclusion_chr1:11970816-11970669

In the GTF, this event is annotated as follows:

chr1 FLAIR exon 11970670 11970816 . + . gene_id "ENSG00000083444.16"; transcript_id "ENST00000196061.4"; exon_number "16";
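To make the comparison concrete, here is a small sketch that parses the event ID above and tests whether it matches the GTF exon once the two coordinates are put in ascending order. Two things are assumptions on my part: that event IDs always have the form `<type>_<chrom>:<coord>-<coord>` as in this example, and that the smaller coordinate is 0-based (BED-style) while the GTF start is 1-based. The helper names `parse_event` and `matches_gtf_exon` are hypothetical and not part of FLAIR.

```python
import re

# Assumed ID layout, inferred from the single example in this issue.
EVENT_PATTERN = re.compile(
    r"^(?P<kind>[^_]+)_(?P<chrom>[^:]+):(?P<first>\d+)-(?P<second>\d+)$"
)


def parse_event(event_id):
    """Split an event ID into (kind, chrom, first_coord, second_coord)."""
    match = EVENT_PATTERN.match(event_id)
    if match is None:
        raise ValueError(f"unrecognised event ID: {event_id!r}")
    return (
        match["kind"],
        match["chrom"],
        int(match["first"]),
        int(match["second"]),
    )


def matches_gtf_exon(event_id, gtf_chrom, gtf_start, gtf_end):
    """True if the event equals the GTF exon after sorting its coordinates.

    Assumption: the lower event coordinate is 0-based, so adding 1 gives
    the 1-based GTF start; the upper coordinate is compared unchanged.
    """
    _, chrom, first, second = parse_event(event_id)
    low, high = sorted((first, second))
    return chrom == gtf_chrom and low + 1 == gtf_start and high == gtf_end


# The example event and GTF exon quoted in this issue.
print(matches_gtf_exon("inclusion_chr1:11970816-11970669", "chr1", 11970670, 11970816))
```

For this one example the check succeeds, which is consistent with the two coordinates simply being written in reverse order, but it does not show why diffSplice emits them that way or whether swapping is safe for every event.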

Do you have an idea for why this might be happening, and how I might be able to resolve the issue? I greatly appreciate your help. Thank you in advance!