cov-lineages / constellations


Add NSP2 variant inside variants configuration file #19

Closed geocarvalho closed 3 years ago

geocarvalho commented 3 years ago

Hello guys, thanks for making all this data available. I'd like to add aa:NSP2:T85I, which I saw in cB.1.351.json, to my variants configuration file for the type_variants.py script, but type_variants.py has no NSP2 option among its supported genes (orf1ab, orf1a, orf1b, s, orf3a, e, m, orf6, orf7a, orf8, n, orf10). Could you help me with that?

rmcolq commented 3 years ago

Sure, here is how to get the coordinate. The file linked below is used by scorpio to translate between protein and gene coordinates. Looking at this line: https://github.com/cov-lineages/constellations/blob/46796184252652e4668314c0a4ecf4ba52c91725/constellations/data/SARS-CoV-2.json#L58 you will see that NSP2 runs from amino acid position 181 to 818 in orf1ab. The variant aa:NSP2:T85I is at amino acid position 85 in NSP2, so it should be at roughly amino acid position 85+181 in orf1ab; with 1-based, inclusive coordinates that works out to 181 + 85 - 1 = 265. If you have some examples to test on, try positions 266 and 265 (I subtract 1 within my script to handle the 1-based coordinate system, but type_variants.py might then subtract 1 again).
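The conversion can be sketched in a few lines of Python. This is only an illustration, not code from scorpio or type_variants.py: the function name `nsp_to_orf1ab` is hypothetical, and it assumes that the 181-818 range is 1-based and inclusive.

```python
def nsp_to_orf1ab(nsp_pos: int, nsp_start: int, nsp_end: int) -> int:
    """Map a 1-based position inside a cleavage product (e.g. NSP2)
    to a 1-based amino acid position in orf1ab.

    Assumes nsp_start and nsp_end are 1-based, inclusive orf1ab coordinates.
    """
    length = nsp_end - nsp_start + 1
    if not 1 <= nsp_pos <= length:
        raise ValueError(f"position {nsp_pos} is outside 1..{length}")
    # Position 1 of the product sits at nsp_start, hence the -1.
    return nsp_start + nsp_pos - 1


# NSP2 range quoted above (181 to 818 in orf1ab).
NSP2_START, NSP2_END = 181, 818

print(nsp_to_orf1ab(85, NSP2_START, NSP2_END))  # 265
```

Under these assumptions aa:NSP2:T85I corresponds to position 265 in orf1ab. Whether a given tool expects 265 or 266 depends on whether it applies its own off-by-one adjustment, so testing both on known samples is still worthwhile.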

geocarvalho commented 3 years ago

Thank you @rmcolq, I'll try to test it.