Open iqbal-lab opened 2 years ago
For context. We have this example
"Isoniazid": {
"predict": "R",
"called_by": {
"katG_GC1037G-GC2155074C": {
"variant": null,
"genotype": [
1,
1
],
"genotype_likelihoods": [
-711.0227088947968,
-417.9483717274669
],
"info": {
"coverage": {
"reference": {
"percent_coverage": 100.0,
"median_depth": 7,
"min_non_zero_depth": 6,
"kmer_count": 175,
"klen": 21
},
"alternate": {
"percent_coverage": 100.0,
"median_depth": 15,
"min_non_zero_depth": 13,
"kmer_count": 253,
"klen": 18
}
},
"expected_depths": [
24
],
"contamination_depths": [],
"filter": [],
"conf": 293
},
"_cls": "Call.VariantCall"
},
"katG_CC1038C-GG2155073G": {
"variant": null,
"genotype": [
1,
1
],
"genotype_likelihoods": [
-727.5071540961122,
-377.76816030583126
],
"info": {
"coverage": {
"reference": {
"percent_coverage": 100.0,
"median_depth": 7,
"min_non_zero_depth": 6,
"kmer_count": 160,
"klen": 21
},
"alternate": {
"percent_coverage": 100.0,
"median_depth": 15,
"min_non_zero_depth": 13,
"kmer_count": 253,
"klen": 18
}
},
"expected_depths": [
24
],
"contamination_depths": [],
"filter": [],
"conf": 350
},
"_cls": "Call.VariantCall"
},
"katG_CC1039C-GG2155072G": {
"variant": null,
"genotype": [
1,
1
],
"genotype_likelihoods": [
-729.808268296947,
-372.51398695693916
],
"info": {
"coverage": {
"reference": {
"percent_coverage": 100.0,
"median_depth": 7,
"min_non_zero_depth": 6,
"kmer_count": 158,
"klen": 21
},
"alternate": {
"percent_coverage": 100.0,
"median_depth": 15,
"min_non_zero_depth": 13,
"kmer_count": 253,
"klen": 18
}
},
"expected_depths": [
24
],
"contamination_depths": [],
"filter": [],
"conf": 357
},
"_cls": "Call.VariantCall"
}
}
},
The three mutations in the JSON above could concieveably be the same deletion...or a 2bp deletion I guess.
GC1037G
CC1038C
CC1039C
I guess it probably is a single 1bp deletion - at 1038.
Although I'm a bit confused about how the mutations work. Which of the two bases is the one at the position?
If I pull out these three bases, with one base flanking, from the katG reference, I get G GCC T. So if the position describes the first base, then 1039 should read CT1039C right? But if the second base describes the position, it's the same problem right?
The sequence at 1038-1040 in katG is CCC.
Hmmm, wonder why I got a different sequence.
The probe generator is making one probe for each 1bp and 2bp deletions in katG and pncA. But if you make a all 1bp deletions in a homopolymer, they are essentially the same. eg deleting one A from AAA could happen in 3 places and give the same sequence. This leads to annoying things where we report >1 mutation being detected, when it is basicvally the same thing detected twice/whatever.
Workaround suggestions