davmlaw opened 2 months ago
Hi, I've made an initial implementation of the Biocommons HGVS Tark loader; review and comments would be very helpful!
I take the Tark transcript sequence and compare it to the sequence obtained by pasting together the genomic exons. If they differ, I report that we don't support that transcript/genome alignment, so at least we don't get it wrong.
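A minimal sketch of that consistency check follows. It assumes exons are given as 0-based, half-open `(start, end)` genomic coordinates in ascending order and that the genomic region is available as a plain string. The function names are hypothetical and are not taken from the loader itself.

```python
from __future__ import annotations

# Hypothetical helpers: the exon layout and names are illustrative assumptions.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def stitch_exons(genome: str, exons: list[tuple[int, int]], strand: int) -> str:
    """Concatenate exon sequences cut from a genomic string.

    Exons are (start, end) pairs, 0-based and half-open, in ascending
    genomic order. Minus-strand (-1) transcripts are reverse complemented.
    """
    stitched = "".join(genome[start:end] for start, end in exons)
    return reverse_complement(stitched) if strand == -1 else stitched


def alignment_is_ungapped(transcript_seq: str, genome: str,
                          exons: list[tuple[int, int]], strand: int) -> bool:
    """True if the transcript exactly matches its stitched genomic exons."""
    return transcript_seq.upper() == stitch_exons(genome, exons, strand).upper()


if __name__ == "__main__":
    genome = "AAACCCGGGTTT"
    exons = [(0, 3), (6, 9)]
    print(stitch_exons(genome, exons, 1))    # AAAGGG
    print(stitch_exons(genome, exons, -1))   # CCCTTT
    print(alignment_is_ungapped("AAAGGG", genome, exons, 1))  # True
```

A transcript that fails this check (for example, one whose alignment contains an indel) would be reported as unsupported instead of being mapped incorrectly.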
Hi, this project looks good! Thanks!
I would like to use Tark as a source of transcripts for the Biocommons HGVS Python library.
RefSeq transcripts can differ from the genome sequence, so they can align to the genome build with indels.
For instance, NM_001205122.2 (ATG13) aligned to GRCh38 has a 2 bp deletion in exon 15 (the alignment is a 509 bp match, a 2 bp deletion, then a 1753 bp match). This is critical to know when converting between genomic (g.) and coding (c.) HGVS, so that you can adjust for these gaps.
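The arithmetic behind that gap can be sketched as follows. This uses a hypothetical convention: offsets are 0-based within the exon and measured in transcript direction, and `D` marks genome bases absent from the transcript (which sequence carries the deletion is an assumption here). Under that convention the exon spans 2262 transcript bases but 2264 genomic bases, so every transcript base after the first 509 lands two positions further along the genome than a naive offset would give.

```python
# Assumed convention (illustration only):
#   "M": aligned bases, consumes transcript and genome
#   "D": genome bases absent from the transcript, consumes genome only
#   "I": transcript bases absent from the genome, consumes transcript only
EXON_15_OPS = [("M", 509), ("D", 2), ("M", 1753)]


def aligned_lengths(ops):
    """Return (transcript_length, genome_length) covered by the alignment."""
    tx = sum(n for op, n in ops if op in {"M", "I"})
    gen = sum(n for op, n in ops if op in {"M", "D"})
    return tx, gen


def tx_to_genome_offset(ops, tx_offset):
    """Map a 0-based transcript offset to a 0-based genomic offset.

    Returns None when the transcript base sits in an insertion and so has
    no corresponding genomic base.
    """
    tx_pos = gen_pos = 0
    for op, n in ops:
        if op == "M":
            if tx_offset < tx_pos + n:
                return gen_pos + (tx_offset - tx_pos)
            tx_pos += n
            gen_pos += n
        elif op == "D":
            gen_pos += n
        elif op == "I":
            if tx_offset < tx_pos + n:
                return None
            tx_pos += n
        else:
            raise ValueError(f"unsupported op {op!r}")
    raise IndexError("transcript offset outside alignment")


if __name__ == "__main__":
    print(aligned_lengths(EXON_15_OPS))            # (2262, 2264)
    print(tx_to_genome_offset(EXON_15_OPS, 508))   # 508
    print(tx_to_genome_offset(EXON_15_OPS, 509))   # 511, shifted by the gap
```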
I have already done this in my own project, cdot, which reads RefSeq/Ensembl GFF/GTF files. Ideally, I would like to stop maintaining this myself and move over to Tark.
For example, https://cdot.cc/transcript/NM_001205122.2 has this alignment information (in Biocommons HGVS style).
As far as I can see, Tark doesn't have this yet:
https://tark.ensembl.org/api/transcript/?stable_id=NM_001205122&stable_id_version=2&expand_all=true
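The query above can be reproduced from a versioned accession with a small helper. The query parameters are copied from that URL; the function names are hypothetical, and no particular JSON schema is assumed for the response.

```python
import json
import urllib.request
from urllib.parse import urlencode

TARK_TRANSCRIPT_API = "https://tark.ensembl.org/api/transcript/"


def tark_transcript_url(accession: str) -> str:
    """Build a Tark transcript query URL from e.g. 'NM_001205122.2'."""
    stable_id, _, version = accession.partition(".")
    params = {"stable_id": stable_id}
    if version:
        params["stable_id_version"] = version
    params["expand_all"] = "true"
    return f"{TARK_TRANSCRIPT_API}?{urlencode(params)}"


def fetch_tark_transcript(accession: str, timeout: float = 30.0):
    """Fetch and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(tark_transcript_url(accession), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(tark_transcript_url("NM_001205122.2"))
```

Inspecting the decoded response for the exon records is one way to confirm whether any alignment or gap field is present.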
Could you please add these alignment strings to RefSeq transcript exons? Knowing about mismatches would also be beneficial.
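If the alignment strings were stored as SAM-style extended CIGAR strings (an assumption; the exact format is not settled here), separating `=` (match) from `X` (mismatch) would expose mismatches as well as indels. In that style the exon 15 example would read `509=2D1753=`. A small parser sketch, with hypothetical names:

```python
from __future__ import annotations

import re

# One CIGAR element: a length followed by an operation character.
_CIGAR_RE = re.compile(r"(\d+)([MIDX=])")


def parse_cigar(cigar: str) -> list[tuple[str, int]]:
    """Split a CIGAR-style string into (op, length) pairs, rejecting junk."""
    ops = []
    pos = 0
    for match in _CIGAR_RE.finditer(cigar):
        if match.start() != pos:
            raise ValueError(f"unparseable CIGAR near position {pos}: {cigar!r}")
        ops.append((match.group(2), int(match.group(1))))
        pos = match.end()
    if pos != len(cigar):
        raise ValueError(f"unparseable CIGAR near position {pos}: {cigar!r}")
    return ops


def summarize_cigar(cigar: str) -> dict[str, int]:
    """Total bases per operation, e.g. to count mismatches ('X')."""
    counts = dict.fromkeys("=XIDM", 0)
    for op, n in parse_cigar(cigar):
        counts[op] += n
    return counts


if __name__ == "__main__":
    print(parse_cigar("509=2D1753="))
    print(summarize_cigar("100=1X408=2D1753="))
```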
I hope to write a JSON client for HGVS that will only be enabled for Ensembl transcripts to start with. Thanks!
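An Ensembl-only gate for such a client could be as simple as an accession-prefix check. The prefixes and function names below are hypothetical; the assumed rationale is that Ensembl transcript models are annotated on the genome assembly, whereas RefSeq transcripts may need the gap information requested above.

```python
# Hypothetical gate: accept Ensembl transcripts, refuse RefSeq until
# exon alignment (gap) data is available from Tark.
REFSEQ_TX_PREFIXES = ("NM_", "NR_", "XM_", "XR_")


def tark_client_supports(accession: str) -> bool:
    """True if the client should serve this transcript accession."""
    if accession.startswith(REFSEQ_TX_PREFIXES):
        return False
    return accession.startswith("ENST")


def require_supported(accession: str) -> None:
    """Raise instead of returning a possibly wrong g./c. mapping."""
    if not tark_client_supports(accession):
        raise NotImplementedError(f"{accession} not supported by Tark client yet")


if __name__ == "__main__":
    print(tark_client_supports("ENST00000000233.10"))  # True
    print(tark_client_supports("NM_001205122.2"))      # False
```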