farhat-lab / gentb-site

The genTB project, the Django site, variant calling and prediction pipeline, and mapping pipeline with hooks to two ravens
https://gentb.hms.harvard.edu

Mutation names don't match REGEX #161

Closed doctormo closed 5 years ago

doctormo commented 5 years ago

We need some clarity on the last two parts of these example mutation names.

SNP_CS_3597737_C30T_V10V-TB7.3
SNP_I_25610_G34C_inter-TB39.8-Rvnt03
SNP_CN_2447127_T374C_K125R-TB16.3
SNP_I_3067961_G228A_inter-thyX-hsdS.1
SNP_I_2447539_A66G_inter-TB16.3-Rv2186c
SNP_CN_3597682_C85T_V29I-TB7.3
SNP_CN_2400031_G298A_R100C-TB18.6
SNP_CS_24437_A1008G_S336S-TB39.8
SNP_CN_2950449_G857A_R286Q-TB31.7
SNP_CN_24745_C700A_D234Y-TB39.8
SNP_CN_671644_T479C_F160S-TB27.3
SNP_CS_3396893_G249A_S83S-TB22.2
SNP_CS_24007_G1438T_R480R-TB39.8
SNP_CN_2399887_C442T_E148K-TB18.6
SNP_CN_1842549_G99T_K33N-TB15.3
SNP_CN_3068398_G67C_P23A-hsdS.1
SNP_CN_3597683_G84T_D28E-TB7.3
SNP_CS_25145_C300G_T100T-TB39.8
SNP_CN_3396588_T554C_H185R-TB22.2
SNP_I_1842298_G153C_inter-Rv1635c-TB15.3
SNP_CN_3585759_A191G_V64A-TB9.4
SNP_CN_24670_G775C_Q259E-TB39.8
SNP_CN_2447282_T219G_E73D-TB16.3
SNP_CN_194030_G405A_M135I-TB18.5
SNP_I_1305531_C138G_inter-fbiC-TB8.4
SNP_CN_3068305_C160T_V54I-hsdS.1
SNP_CS_24509_C936T_Q312Q-TB39.8
SNP_I_2447539_A66G_inter-TB16.3-Rv2186c
SNP_CS_24821_G624A_G208G-TB39.8
SNP_CN_3788453_C86T_T29I-echA18.1
SNP_CS_193874_G249A_V83V-TB18.5
SNP_CN_3068286_C179T_G60D-hsdS.1
SNP_CN_2400031_G298A_R100C-TB18.6
SNP_CN_3597682_C85T_V29I-TB7.3
SNP_I_2949476_C117A_inter-Rv2622-TB31.7
SNP_I_23854_A7G_inter-Rv0019c-TB39.8
SNP_CN_1842494_A44C_D15A-TB15.3
SNP_CS_2447390_T111C_E37E-TB16.3
SNP_CS_24407_C1038T_P346P-TB39.8
SNP_CN_24340_C1105G_D369H-TB39.8

mahafarhat commented 5 years ago

ok for example:

SNP_CS_3597737_C30T_V10V-TB7.3

nucleotide change C30T

AA change V10V

locus TB7.3

(I think all of the mutations in the above format are in a legacy format that we should no longer see.)

For mutations with "inter", the locus is the whole last field (you can simply split on '_' here), for example: inter-thyX-hsdS.1
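The breakdown above could be sketched in Python roughly as follows. This is only an illustration, not the code used by the site: `Mutation`, `parse_mutation`, and the field names are hypothetical, the meaning of the second field (CS, CN, I) is not spelled out in this thread, and treating the third field as an integer position is an assumption.

```python
from typing import NamedTuple, Optional


class Mutation(NamedTuple):
    """Hypothetical container for the parts of a legacy mutation name."""
    kind: str                  # e.g. "SNP"
    category: str              # e.g. "CS", "CN" or "I" (meaning not stated in the thread)
    position: int              # third field, assumed to be an integer position
    nucleotide_change: str     # e.g. "C30T"
    aa_change: Optional[str]   # e.g. "V10V"; None for "inter" mutations
    locus: str                 # e.g. "TB7.3" or "inter-thyX-hsdS.1"


def parse_mutation(name: str) -> Mutation:
    """Split a legacy mutation name on '_' into its five fields."""
    # maxsplit=4 keeps the last field whole even if it contained a '_'.
    parts = name.split("_", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected mutation name: {name!r}")
    kind, category, position, nucleotide_change, tail = parts

    if tail.startswith("inter-"):
        # For "inter" mutations the whole last field is the locus.
        aa_change, locus = None, tail
    else:
        # Otherwise the last field is "<AA change>-<locus>".
        aa_change, sep, locus = tail.partition("-")
        if not sep:
            raise ValueError(f"no locus found in: {name!r}")

    return Mutation(kind, category, int(position), nucleotide_change, aa_change, locus)
```

With this sketch, `parse_mutation("SNP_CS_3597737_C30T_V10V-TB7.3")` yields the nucleotide change `C30T`, the AA change `V10V` and the locus `TB7.3`, while `parse_mutation("SNP_I_3067961_G228A_inter-thyX-hsdS.1")` has no AA change and the locus `inter-thyX-hsdS.1`.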

On Tue, Jun 25, 2019 at 3:30 PM Martin Owens notifications@github.com wrote:

Assigned #161 https://github.com/farhat-lab/gentb-site/issues/161 to @mahafarhat https://github.com/mahafarhat.


mahafarhat commented 5 years ago

Note: with some testing, I'm now finding that some genes are represented only by Rv numbers when they should have an associated symbol. For example, I only see Rv1908c when it should be labeled katG.

mahafarhat commented 5 years ago

this issue is still pending