arq5x / gemini

a lightweight db framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License

refactoring variant impact selection (support SO and HGVS). #468

Open brentp opened 9 years ago

brentp commented 9 years ago

For example, with this VCF record:

11	67257556	rs2276020	C	T	4487.16	PASS	AC=1;AF=0.038;AN=26;BaseQRankSum=-1.646e+00;ClippingRankSum=-3.510e-01;DB;DP=1531;FS=0.790;GQ_MEAN=432.85;GQ_STDDEV=1229.37;InbreedingCoeff=-0.0570;MLEAC=1;MLEAF=0.038;MQ=60.00;MQ0=0;MQRankSum=0.567;NCC=1;POSITIVE_TRAIN_SITE;QD=12.46;ReadPosRankSum=1.23;VQSLOD=1.74;culprit=DP;;CSQ=downstream_gene_variant|||ENSG00000110711|AIP|ENST00000529797|||||retained_intron||Transcript|-/801|||407|1|||||||||||,downstream_gene_variant|||ENSG00000110697|PITPNM1|ENST00000525568|||||processed_transcript||Transcript|-/379|||4211|-1|||||||||||,downstream_gene_variant|||ENSG00000110697|PITPNM1|ENST00000436757||||-/1243|protein_coding||Transcript|-/4216|-/3732||1683|-1|||||||||||,downstream_gene_variant|||ENSG00000110697|PITPNM1|ENST00000356404||||-/1244|protein_coding|YES|Transcript|-/4225|-/3735||1684|-1|||||||||||,synonymous_variant|gaC/gaT|D|ENSG00000110711|AIP|ENST00000528641|3/5|||109/232|protein_coding||Transcript|445/815|327/697|||1|||||||||||,downstream_gene_variant|||ENSG00000110697|PITPNM1|ENST00000534749||||-/1244|protein_coding||Transcript|-/4189|-/3735||1683|-1|||||||||||,downstream_gene_variant|||ENSG00000110697|PITPNM1|ENST00000526450|||||processed_transcript||Transcript|-/750|||3359|-1|||||||||||,downstream_gene_variant|||ENSG00000110697|PITPNM1|ENST00000527370|||||retained_intron||Transcript|-/3712|||1684|-1|||||||||||,synonymous_variant|gaC/gaT|D|ENSG00000110711|AIP|ENST00000279146|4/6|||172/330|protein_coding|YES|Transcript|634/1221|516/993|||1|||||||||||,synonymous_variant|gaC/gaT|D|ENSG00000110711|AIP|ENST00000525341|2/3|||56/188|protein_coding||Transcript|168/635|168/567|||1|||||||||||	GT:AD:DP:GQ:PL	0/0:2,0:2:6:0,6,49	0/0:81,0:81:99:0,112,1800	0/0:117,0:117:99:0,120,1800	0/0:108,0:108:99:0,119,1800	0/1:182,178:360:99:4522,0,4673	0/0:125,0:125:99:0,120,1800	0/0:150,0:150:99:0,120,1800	0/0:178,0:178:99:0,120,1800	0/0:128,0:128:99:0,120,1800	0/0:89,0:89:74:0,74,1800	0/0:160,0:160:99:0,120,1800	./.:0,0:0	0/0:31,0:31:68:0,68,889	0/0:2,0:2:6:0,6,61

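the downstream_gene_variant annotation in PITPNM1 is chosen as the variant's impact, even though the record also carries synonymous_variant annotations in AIP. This specific case can be fixed, but we need a general solution that handles impact selection more intelligently.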

This is related to updating the impact-selection logic to handle SO (Sequence Ontology) terms and HGVS.
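
As a rough illustration of what a more general selection could look like, here is a minimal sketch that ranks each CSQ entry by SO term severity and breaks ties in favor of canonical, protein-coding transcripts. This is not gemini's actual code: the names `SO_SEVERITY`, `parse_csq` and `pick_most_severe` are hypothetical, the severity list is a partial ordering assumed to loosely follow Ensembl VEP's consequence ranking, and the field positions are inferred from the record above (a real implementation would read the field order from the CSQ description in the VCF header).

```python
# Partial SO severity ranking, most severe first. ASSUMPTION: loosely follows
# Ensembl VEP's consequence ordering; it is not gemini's actual table.
SO_SEVERITY = [
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
    "upstream_gene_variant",
    "downstream_gene_variant",
    "intergenic_variant",
]
RANK = {term: i for i, term in enumerate(SO_SEVERITY)}
UNKNOWN_RANK = len(SO_SEVERITY)  # unlisted terms sort after every known term


def parse_csq(csq):
    """Split a CSQ value into one dict per transcript annotation.

    ASSUMPTION: field positions are inferred from the example record in this
    issue (0=consequence, 4=gene, 5=transcript, 10=biotype, 11=canonical).
    """
    for entry in csq.split(","):
        fields = entry.split("|")
        yield {
            "consequence": fields[0],
            "gene": fields[4],
            "transcript": fields[5],
            "biotype": fields[10],
            "canonical": fields[11] == "YES",
        }


def severity(consequence):
    """Rank of the most severe term; VEP joins multiple terms with '&'."""
    return min(RANK.get(term, UNKNOWN_RANK) for term in consequence.split("&"))


def pick_most_severe(csq):
    """Return the annotation with the most severe SO term.

    Ties are broken by preferring canonical, then protein-coding transcripts.
    """
    return min(
        parse_csq(csq),
        key=lambda a: (
            severity(a["consequence"]),
            not a["canonical"],
            a["biotype"] != "protein_coding",
        ),
    )


# Three of the entries from the record above.
csq = ",".join([
    "downstream_gene_variant|||ENSG00000110697|PITPNM1|ENST00000356404||||-/1244|protein_coding|YES|Transcript|-/4225|-/3735||1684|-1|||||||||||",
    "synonymous_variant|gaC/gaT|D|ENSG00000110711|AIP|ENST00000528641|3/5|||109/232|protein_coding||Transcript|445/815|327/697|||1|||||||||||",
    "synonymous_variant|gaC/gaT|D|ENSG00000110711|AIP|ENST00000279146|4/6|||172/330|protein_coding|YES|Transcript|634/1221|516/993|||1|||||||||||",
])
best = pick_most_severe(csq)
print(best["consequence"], best["gene"], best["transcript"])
# prints: synonymous_variant AIP ENST00000279146
```

Sorting on a tuple keeps the severity table as the primary criterion and makes the tie-breakers explicit, so changing the preference (for example, ignoring biotype) only means editing the key.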

brentp commented 9 years ago

see also #467