KarchinLab / open-cravat

A modular annotation tool for genomic variants
MIT License

Transcript level annotation format #219

Open tkmamidi opened 1 year ago

tkmamidi commented 1 year ago

Hello,

Thank you for the wonderful open source tool!

I have a request similar to KarchinLab/open-cravat-modules-karchinlab#4. My aim is to use transcript-specific scores, and I'm writing a custom annotation parser to convert them into multiple rows, but I'm running into some challenges.

For example, I'm working with this variant for transcript-level annotations: chr6,56900482,T,C

The problem is that different annotators use different formats for variant-level annotations. For example, Mutation Assessor annotations are in a list-of-lists format:

[["ENST00000370754", 0.157, "Tolerated", 0.8431568431568431, 4.32, "Low", 17], ["ENST00000449297", 0.411, "Tolerated", 0.5894105894105894, 3.71, "Low", 18]]
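To illustrate what "one row per transcript" would mean for this format, here is a minimal sketch. It assumes the cell is valid JSON once any CSV quote escaping has been undone, and that the first element of each inner list is the transcript ID. The function name `explode_list_of_lists` and the `transcript`/`values` keys are hypothetical, not part of OpenCRAVAT.

```python
import json

# The cell value as shown above (assumed to be valid JSON).
raw = (
    '[["ENST00000370754", 0.157, "Tolerated", 0.8431568431568431, 4.32, "Low", 17], '
    '["ENST00000449297", 0.411, "Tolerated", 0.5894105894105894, 3.71, "Low", 18]]'
)


def explode_list_of_lists(cell):
    """Return one dict per transcript from a JSON list-of-lists cell.

    Hypothetical helper: assumes the transcript ID is the first element
    of each inner list; the remaining elements are kept in order.
    """
    if not cell:
        return []  # empty cell: the annotator reported nothing
    rows = []
    for entry in json.loads(cell):
        transcript, *values = entry
        rows.append({"transcript": transcript, "values": values})
    return rows


rows = explode_list_of_lists(raw)
for row in rows:
    print(row["transcript"], row["values"])
```

The remaining values are left unnamed on purpose, since the column meanings depend on the annotator.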

However, FATHMM annotations are split into separate columns: one with all the transcript IDs and one with the scores, each separated by semicolons:

ENST00000370754;ENST00000449297,ENSP00000359790;ENSP00000393082,-1.0;-3.85,0.95863,
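A minimal sketch of expanding this second format into per-transcript rows, assuming the semicolon-separated columns are parallel (the i-th transcript ID, protein ID, and score belong together). The function name `explode_parallel_columns` and the output keys are hypothetical, and treating `.` or an empty string as a missing score is an assumption.

```python
def explode_parallel_columns(transcripts, proteins, scores, sep=";"):
    """Zip semicolon-separated parallel columns into one dict per transcript.

    Hypothetical helper: assumes all three columns list their values
    in the same transcript order.
    """
    t_ids = transcripts.split(sep)
    p_ids = proteins.split(sep)
    raw_scores = scores.split(sep)
    if not (len(t_ids) == len(p_ids) == len(raw_scores)):
        raise ValueError("parallel columns have different lengths")
    rows = []
    for t_id, p_id, score in zip(t_ids, p_ids, raw_scores):
        rows.append({
            "transcript": t_id,
            "protein": p_id,
            # Assumption: "." or "" marks a missing score.
            "score": None if score in ("", ".") else float(score),
        })
    return rows


fathmm_rows = explode_parallel_columns(
    "ENST00000370754;ENST00000449297",
    "ENSP00000359790;ENSP00000393082",
    "-1.0;-3.85",
)
for row in fathmm_rows:
    print(row)
```

The length check is there so that a mismatch between columns fails loudly instead of silently pairing a score with the wrong transcript.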

Is there a way to output them in the same format? I'd really appreciate it if each transcript could be output in a separate row.

Thank you!!

tkmamidi commented 1 year ago

This variant is also an edge case: no transcripts were found in the all_mappings column, yet some tools still report damage predictions. Please advise!