BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA
Other

Inconsistent isoform_ids formatting in flair diffSplice: "es" vs. "alt3", "alt5", and "ir" #223

Closed: yjx1217 closed this issue 1 year ago

yjx1217 commented 2 years ago

Dear Developers,

Thanks for developing Flair, which is very useful for long-read-based transcriptome sequencing data analysis. While trying Flair for my own analysis, I noticed the following issue:

I ran "flair diffSplice" on flair's quantification outputs, and it completed without any errors. However, when examining the resulting files ($prefix.alt3.events.quant.tsv, $prefix.alt5.events.quant.tsv, $prefix.ir.events.quant.tsv, and $prefix.es.events.quant.tsv), I noticed an inconsistency in the isoform_ids column. In $prefix.es.events.quant.tsv, this column contains the full IDs from the *.counts_matrix.tsv file (i.e., the concatenated IsoformID_GeneID form, e.g. fe9fef1f-523d-4eb3-a0ee-eb0531171b26-0_ENSG00000137221), while the other three events.quant.tsv outputs contain only the IsoformID (e.g. fe9fef1f-523d-4eb3-a0ee-eb0531171b26). So I think there may be a minor bug in how the IDs are parsed and written out.

Best, Jia-Xing
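For output produced by releases up to 1.6.4, one possible workaround is to strip the gene part from the combined IDs before comparing event tables. The following is a minimal Python sketch, not part of flair. It assumes that the combined ID has the IsoformID_GeneID form described above and that the gene ID (e.g. an Ensembl ENSG... ID) contains no underscore, so the split is made at the last "_". The function names split_flair_id and normalize_isoform_ids are hypothetical.

```python
def split_flair_id(full_id: str) -> tuple[str, str]:
    """Split a combined 'IsoformID_GeneID' string at its last underscore.

    Assumption: the gene ID contains no underscore, so everything before
    the last '_' is treated as the isoform ID. IDs without an underscore
    are returned unchanged, paired with an empty gene ID.
    """
    isoform, sep, gene = full_id.rpartition("_")
    if not sep:
        # No underscore present: treat the whole string as an isoform ID.
        return full_id, ""
    return isoform, gene


def normalize_isoform_ids(ids: list[str]) -> list[str]:
    """Keep only the isoform part of each ID so event tables can be compared."""
    return [split_flair_id(i)[0] for i in ids]


if __name__ == "__main__":
    es_id = "fe9fef1f-523d-4eb3-a0ee-eb0531171b26-0_ENSG00000137221"
    print(split_flair_id(es_id))
```

Note that the isoform part recovered this way still ends in "-0", whereas the ID quoted above from the other three files does not. The issue does not say whether that suffix should also be dropped, so the sketch leaves it in place.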

Jeltje commented 1 year ago

Thanks for pointing this out. It is fixed in commit d196702 and will be included in releases after 1.6.4.