Phelimb / atlas

atlas
MIT License

Issue with database or genotype calibration #23

Open iqbal-lab opened 7 years ago

iqbal-lab commented 7 years ago

Ron has a simulation where all of his false Atlas genotype calls have high genotype confidence

Results at 10x

iqbal-lab commented 7 years ago

Note this is normalised - so there may be hundreds of correct calls and only 1 false call

Phelimb commented 7 years ago

Thanks. Pretty strange. @ronald-jaepel any chance you could point me at the data or the command lines you ran? It would make life much easier if I could have a look at the raw data too.

iqbal-lab commented 7 years ago

This turns out to be just 2 false calls, but it's still odd.

iqbal-lab commented 7 years ago

Hey @Phelimb, I think we need to know from @ronald-jaepel precisely which 2 errors occur at 10x, and which 5 at 5x.

ronald-jaepel commented 7 years ago

OK, here is where everything lives:
the genome used is at /data1/projects/ronald_jaepel/atlas_test/Simulation_Products/generated_genomes/2016_07_26_1353/ecoli_K12_MG1655_ref_all_families.fa
the reads are at /data1/projects/ronald_jaepel/atlas_test/Simulation_Products/simulated_reads/2016_07_26_1353/
the cortex graphs are at /data1/projects/ronald_jaepel/atlas_test/Simulation_Products/simulated_graphs/2016_07_26_1353/
the atlas JSONs are at /data1/projects/ronald_jaepel/atlas_test/JSONS/2016_07_26_1353/ *walk3.txt

The number after ecoli_all_families is the number of reads. For example, ecoli_all_families_33000.k31.ctx is the cortex graph for 10x coverage, ecoli_all_families_16000.k31.ctx is 5x, and ecoli_all_families_10000.k31.ctx is 3x.
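The naming convention above can be turned into a small helper. This is only a sketch: the function names are hypothetical, and the coverage estimate assumes coverage scales linearly with read count from the stated anchor of 33000 reads for roughly 10x.

```python
import re

# Anchor taken from the comment above: 33000 reads is described as 10x.
READS_AT_10X = 33000


def reads_from_graph_name(filename):
    """Extract the read count from a name like ecoli_all_families_33000.k31.ctx."""
    match = re.search(r"_(\d+)\.k\d+\.ctx$", filename)
    if match is None:
        raise ValueError(f"no read count found in {filename!r}")
    return int(match.group(1))


def approx_coverage(filename):
    """Estimate coverage by linear scaling from the 33000-read anchor (assumption)."""
    return 10.0 * reads_from_graph_name(filename) / READS_AT_10X


for name in [
    "ecoli_all_families_33000.k31.ctx",
    "ecoli_all_families_16000.k31.ctx",
    "ecoli_all_families_10000.k31.ctx",
]:
    print(name, round(approx_coverage(name), 1))
```

The estimates come out close to, but not exactly at, the rounded 10x / 5x / 3x labels, which is expected since those labels are approximate.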

ronald-jaepel commented 7 years ago

This is the list of the genes that should be in each sample: ["aac6I-Iai","kfiC-1","afaE1-SaT040","SIM-1","SPU-1","mcr-1","SPM-1","sfaSecol-536","aac6IIb-a","mph-A","P-3","GIM-1","FIM-1","cml-1","cmx-1","LAT-1","LAP-1","MAL-1","NPS-1","BEL-1","CMH-1","CMG-1","rmtB-1","CME-1","rmtD-1","CMY-24","Pc/Pd-1","BES-1","tet-44","entA-CFT073","entC-CFT073","entB-CFT073","entE-CFT073","folP-1","entF-CFT073","sul-1","VanTr-L","cfxA-1","NMC-A","mef-A","KPC-11","uspA-CFT073","IND-11","aph2II-IIIa","SME-1","aac2I-Ia","SMB-1","armA-1","RAHN-1","Pa/Pb-1","msr-A","CSP-1","aac6I30aac6I-IbI","aph6-Ia","aph4-Ia","malX-CFT073","pic-CFT073","aph9-Ia","focH-CFT073","VEB-1","focA-CFT073","focC-CFT073","focD-CFT073","focF-CFT073","focG-CFT073","cmlA-1","cmlB-1","cat-1","ere-A","cmlV-1","erm-42","ERP-1","hugA-1","IBC-1","yfcV-CFT073","feoB-CFT073","CblA-1","ampCpromoter-1","ant3IIIIaac6I-IId","CFE-1","F17likefimbrialChaperoneecol-536","FAR-1","ireA-APEC01","lsa-A","nimA-1","nimB-1","nimC-1","nimD-1","penA-1","MIR-11","ompT-CFT073","tsh-CFT073","ompF-1","ompA-NMEC58C","ompC-1","ompK-36","sfaEecol-536","L-1","traJ-VF0241","traT-pAPECO2ColV","F17likefimbrialsubuniTecol-536","VIM-42","SFO-1","SFH-1","SFB-1","SFC-1","OCH-3","hlyC-CFT073","hlyB-CFT073","hlyA-CFT073","mdf-A","hlyF-pETN48","hlyD-CFT073","fepC-CFT073","fepB-CFT073","fepA-CFT073","fepG-CFT073","mdt-A","fepD-CFT073","ant9Ia-a","aadA6aadA10-a","SHV-133","cfiA-10","lmr-A","gafD-1","sfaA-VF0235","sfaD-CFT073","OXA-216","iss-AF0422791","SST-1","cmr-1","LUT-1","MUS-1","sfaHecol-536","AST-1","rmtC-1","SED-1","mupB-1","mupA-1","JOHN-1","rmtA-1","B-11","IMI-1","HERA-1","OXY4-1","IMP-42","TLA-1","iroN-CFT073","CARB-11","OXY2-10","sfaGecol-536","nfsA-1","parE-ecol","nfsB-1","parC-ecol","ZEG-1","TUS-1","kpsMK15ecol-536","nfaEO83K1-H4","sat2A-1","clbB-IHE3034","clbA-IHE3034","aph3I-Va","ibeC-VF0237","ibeA-VF0237","clbQ-IHE3034","sat-ecolCFT073","rfcK12-MG1655","BIL-1","cfr-1","vga-A","vgb-A","POM-1","fusB-1","VanXY-C","fusC-1","gyrB-ecol","EBR-1","gyrA-ecol","floR-1","fimG-CFT073","M-1","sul1-2","DHA-10","GOB-11","SLB-1","entD-CFT073","putativeaac-2I","pexA-1","mre-A","VanTm-L","CKO-1","clpG-1","OKPB-11","npmA-1","qepA-1","arr-3","AER-1","lnu-A","dfrC-1","GES-24","kpsT-1","sfaBecol-536","CTXM-133","fosA-1","fosB-1","fosC-1","fosX-1","fexA-1","cmrA-1","dfrA-24","dfrB-1","dfrK-1","dfrD-1","dfrG-1","afaE-3","FOX-10","iutA-CFT073","MOX-1","LCR-1","MOR-2","ant2II-Ia","CIA-1","PER-1","TMB-1","sfaDecol-536","sfaCecol-536","aac3Ibaac6I-Ib","ant4IIa-a","sat3A-1","VanZ-A","NDM-10","ibeB-RS218","ant6-Ia","clbN-IHE3034","SRT-1","aac3-IIIb","astAecol-42","otr-A","iucD-CFT073","iucC-CFT073","iucB-CFT073","iucA-CFT073","chuA-CFT073","ant4I-IIa","chuW-CFT073","chuT-CFT073","chuU-CFT073","chuS-CFT073","PLA-2a","chuX-CFT073","chuY-CFT073","DIM-1","H7fliC-CFT073","fepE-CFT073","F17likefimbrialadhesinsubunitecol-536","aadA1b-1","ampC-1","OXY6-1","oqxA-11","oqxB-24","OXY5-1","sat4A-1","OXY3-1","OXY1-1","fimA-CFT073","fimB-CFT073","fimC-CFT073","fimD-CFT073","fimE-CFT073","fimF-CFT073","fusA-1","fimH-CFT073","fimI-1","aadA11-a","BIC-1","cphA-2","KHM-1","VHH-1","sfaFecol-536","ant3II-Ia","ptsI-1","OHIO-1","uhpT-1","uhpC-1","uhpB-1","uhpA-1","F17likefimbrialusherecol-536","ole-C","DES-1","FONA-1","aad6-a","SCO-1","murA-1","ROB-1","cnf-1","cdtB-IHE3034","ant6Ia-a","CepA-1","TRU-1","aadA-24","fomB-1","aadK-1","fomA-1","srm-B","irp2-APEC01","ACC-1","catA-11","catB-10","tetA-P","ACT-24","hraAY0408591-Hek","tcpC-CFT073","TEM-216","CGA-1","CGB-1","aac3Ia-a","PME-1","cvaC-pAPECO2ColV","TER-1","ihaO157-H7","vat-A","ADC-1","aac3I-IIe","papH-CFT073","kpsD-CFT073","OKPA-11","kpsM-CFT073","LEN-24","BRO-1","aph3II-Ia","bmaE-IH11165","VanS-C","VanR-A","VanU-G","VanT-C","VanW-B","VanY-A","VanX-A","VHW-1","aac3VIa-a","VanA-A","VanC-1","VanB-Sc","VanE-1","VanD-D","VanG-G","VanH-A","VanM-M","VanL-L","KLUC-2","KLUG-1","fyuA-APEC01","grm-1","dfr2d-1","qnrC-1","qnrB-56","qnrA-1","qnrD-1","qnrS-1","tetB-P","glpT-1","papI-CFT073","draP-IH11128","papK-CFT073","papJ-CFT073","papE-CFT073","papD-CT073","papG-CFT073","papF-CFT073","papA-CFT073","papC-CFT073","papB-CFT073","draE-IH11128","draD-IH11128","draA-IH11128","draC-IH11128","draB-IH11128"]

ronald-jaepel commented 7 years ago

Here's a better overview of the errors:

at coverage 2.6 : wrong gene found of aac6I Iai as Ib3 with certainty 10.1322008044
at coverage 2.6 : wrong gene found of aadA6aadA10 a as b with certainty 10.1322008044
at coverage 2.6 : wrong gene found of GES 24 as 6 with certainty 7.42415060331
at coverage 2.6 : wrong gene found of OXY5 1 as 2 with certainty 7.42415060331
at coverage 2.6 : wrong gene found of aadA11 a as b with certainty 4.7161004022
at coverage 2.6 : wrong gene found of vat A as APEC01 with certainty 2.0080502011
at coverage 2.6 : wrong gene found of LEN 24 as 6 with certainty 2.0080502011

at coverage 3.2 : wrong gene found of aac6I Iai as Ib3 with certainty 12.8402510055
at coverage 3.2 : wrong gene found of DHA 10 as 20 with certainty 2.0080502011
at coverage 3.2 : wrong gene found of GES 24 as 6 with certainty 7.42415060331
at coverage 3.2 : wrong gene found of OXY5 1 as 2 with certainty 10.1322008044
at coverage 3.2 : wrong gene found of aadA11 a as b with certainty 7.42415060331
at coverage 3.2 : wrong gene found of vat A as APEC01 with certainty 2.0080502011

at coverage 5.1 : wrong gene found of aac6I Iai as Ib3 with certainty 20.2644016088
at coverage 5.1 : wrong gene found of GES 24 as 7 with certainty 12.1402510055
at coverage 5.1 : wrong gene found of OXY5 1 as 2 with certainty 14.8483012066
at coverage 5.1 : wrong gene found of aadA11 a as b with certainty 12.1402510055
at coverage 5.1 : wrong gene found of vat A as APEC01 with certainty 4.0161004022

at coverage 10.5 : wrong gene found of aac6I Iai as Ib3 with certainty 53.3690542231
at coverage 10.5 : wrong gene found of vat A as APEC01 with certainty 7.33220080441
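For anyone wanting to slice this report, here is a rough sketch of parsing it into records and checking which wrong calls persist across coverages. It is not part of atlas; the regex simply mirrors the report wording above, the variable names are made up for illustration, and only the 5.1x and 10.5x entries are included to keep it short.

```python
import re
from collections import defaultdict

# Subset of the report above (5.1x and 10.5x entries only).
REPORT = """
at coverage 5.1 : wrong gene found of aac6I Iai as Ib3 with certainty 20.2644016088
at coverage 5.1 : wrong gene found of GES 24 as 7 with certainty 12.1402510055
at coverage 5.1 : wrong gene found of OXY5 1 as 2 with certainty 14.8483012066
at coverage 5.1 : wrong gene found of aadA11 a as b with certainty 12.1402510055
at coverage 5.1 : wrong gene found of vat A as APEC01 with certainty 4.0161004022
at coverage 10.5 : wrong gene found of aac6I Iai as Ib3 with certainty 53.3690542231
at coverage 10.5 : wrong gene found of vat A as APEC01 with certainty 7.33220080441
"""

# Each entry reads: family, expected allele, called allele, confidence.
PATTERN = re.compile(
    r"at coverage (?P<cov>[\d.]+) : wrong gene found of "
    r"(?P<family>\S+) (?P<expected>\S+) as (?P<called>\S+) "
    r"with certainty (?P<conf>[\d.]+)"
)

# Group the wrong calls by coverage.
by_cov = defaultdict(list)
for m in PATTERN.finditer(REPORT):
    by_cov[float(m["cov"])].append(
        (m["family"], m["expected"], m["called"], float(m["conf"]))
    )

for cov in sorted(by_cov):
    print(f"coverage {cov}: {len(by_cov[cov])} wrong calls")

# Wrong calls that appear at every coverage level in the parsed subset.
persistent = set.intersection(
    *({(fam, exp, called) for fam, exp, called, _ in errs} for errs in by_cov.values())
)
for fam, exp, called in sorted(persistent):
    print(f"persistent: {fam} {exp} -> {called}")
```

Because `finditer` scans the whole string, the same pattern also works if the entries are run together on one line rather than split one per line.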

ronald-jaepel commented 7 years ago

The problem with aac6I Iai -> Ib3 might be that the aac families (aac6I, aac6IIb, etc.) haven't been merged into one gene family but are split into multiple families. The sample therefore contains multiple aac6-like alleles, which might lead to the error we see. I don't know about vat A though.
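To see how many aac6-like families end up in one sample, a quick sketch like the following can group the expected gene names by family. It assumes names follow a `<family>-<allele>` pattern split at the last hyphen (which matches how the error report separates "aac6I" from "Iai"), and it uses only a handful of names copied from the expected-gene list in this thread; `split_name` is a hypothetical helper.

```python
from collections import defaultdict

# A few names copied from the expected-gene list in this thread.
expected = [
    "aac6I-Iai", "aac6IIb-a", "aac6I30aac6I-IbI", "ant3IIIIaac6I-IId",
    "aac3Ibaac6I-Ib", "aac2I-Ia", "aac3-IIIb", "vat-A",
]


def split_name(name):
    """Split 'family-allele' at the last hyphen (assumed naming convention)."""
    family, _, allele = name.rpartition("-")
    return family, allele


alleles_by_family = defaultdict(list)
for name in expected:
    family, allele = split_name(name)
    alleles_by_family[family].append(allele)

# Families whose name contains 'aac6' are treated as separate panels here,
# even though they may share sequence.
aac6_like = {fam: als for fam, als in alleles_by_family.items() if "aac6" in fam}
for family, alleles in sorted(aac6_like.items()):
    print(family, alleles)
```

Running the same grouping over the full list would show whether other miscalled genes, such as vat A, also have near relatives filed under a different family name.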