BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

merging of transcript ids and gene ids is frustrating to use #284

Closed diekhans closed 10 months ago

diekhans commented 1 year ago

The convention of merging the transcript id and gene id into transId_geneId wastes time in analysis pipelines.

When the GTF is written out, the transcript_id is set to just the transId part. So if you then run other tools on the GTF and want to get back to the flair files, you need to map ids to and from transId_geneId.

Plus using "_" conflicts with RefSeq accessions 🤯
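To make the two points above concrete, here is a minimal sketch of what a downstream script has to do to recover the two ids from a merged name. The helper names and the example ids are hypothetical and are not taken from flair's code or output; the point is only that neither "split at the first underscore" nor "split at the last underscore" is safe in general.

```python
# Hypothetical helpers: two obvious ways to undo a transId_geneId merge.

def split_first(merged):
    """Split at the first underscore."""
    trans_id, gene_id = merged.split("_", 1)
    return trans_id, gene_id


def split_last(merged):
    """Split at the last underscore."""
    trans_id, gene_id = merged.rsplit("_", 1)
    return trans_id, gene_id


# A RefSeq-style transcript accession already contains an underscore,
# so splitting at the first underscore cuts the accession in half.
refseq_merged = "NM_000546.6_ENSG00000141510.18"
print(split_first(refseq_merged))  # ('NM', '000546.6_ENSG00000141510.18')
print(split_last(refseq_merged))   # ('NM_000546.6', 'ENSG00000141510.18')

# Splitting at the last underscore breaks as soon as the gene id
# itself contains an underscore (made-up gene id for illustration).
gene_merged = "ENST00000269305.9_GENE_A"
print(split_last(gene_merged))     # ('ENST00000269305.9_GENE', 'A')
```

With separate transcript id and gene id columns, none of this guessing would be needed.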

Please don't encode metadata in ids; this has long been known to cause all kinds of grief in bioinformatics (TCGA). Please use two columns in files, one for transcript id and one for gene id. For BEDs, you can make it a BED12+1 if you want (nice for turning into a bigBed track).
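As a rough sketch of the BED12+1 suggestion, the following writes the transcript id in the standard BED name column and the gene id in a 13th column. The function names, argument layout, and example values are assumptions made for illustration; this is not flair's actual output code.

```python
# Hypothetical sketch: BED12 record plus one extra column for the gene id.

def make_bed12_plus1(chrom, start, end, trans_id, gene_id, strand,
                     block_sizes, block_starts, score=0, rgb="0"):
    """Return one tab-separated BED12+1 line (13 columns)."""
    fields = [
        chrom, str(start), str(end),
        trans_id,                      # column 4: transcript id only
        str(score), strand,
        str(start), str(end),          # thickStart, thickEnd
        rgb,
        str(len(block_sizes)),         # blockCount
        ",".join(map(str, block_sizes)) + ",",
        ",".join(map(str, block_starts)) + ",",
        gene_id,                       # column 13: gene id on its own
    ]
    return "\t".join(fields)


def parse_ids(line):
    """Read the transcript id and gene id back without any string splitting of ids."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 13:
        raise ValueError("expected 13 columns, got %d" % len(cols))
    return cols[3], cols[12]


line = make_bed12_plus1("chr17", 100, 500, "NM_000546.6", "ENSG00000141510.18",
                        "-", [50, 100], [0, 300])
print(parse_ids(line))  # ('NM_000546.6', 'ENSG00000141510.18')
```

A file in this shape could then be passed to UCSC's bedToBigBed with a bed12+1 type and an autoSql description of the extra column.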

diekhans commented 1 year ago

Some of the utility programs (e.g. diff_isoform_usage) write the gene id and transcript id as separate columns, which is great, but that makes them different from the other programs, requiring even more programming.

diekhans commented 1 year ago

Also, transcript ids composed of two UUIDs make it harder to display the transcripts (e.g. in the UCSC browser) and seem pointless.