Phelimb / atlas


Walk finds database allele when presented with mutated allele #13

Closed by ronald-jaepel 8 years ago

ronald-jaepel commented 8 years ago

As discussed, I prepared genomes and runs containing database alleles that each had one SNP introduced. On most samples, atlas walk and genotype found the correct mutated allele (i.e. aligning the allele found by walk against the database allele showed 1 SNP).
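The check described above (one SNP between the found allele and the database allele) can be sketched as a simple mismatch count. This is only an illustration: `count_snps` is a hypothetical helper, not part of atlas, the sequences are toy data, and it assumes both sequences are already aligned and of equal length, so indels are not handled.

```python
def count_snps(found, reference):
    """Count mismatching positions between two aligned, equal-length sequences.

    Hypothetical helper for illustration; indels are out of scope.
    """
    if len(found) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(found.upper(), reference.upper()) if a != b)


# Toy sequences (not real alleles): one substitution at the fourth base.
database_allele = "ATGCCCGA"
found_allele = "ATGGCCGA"
snps = count_snps(found_allele, database_allele)
```

With these toy inputs `snps` is 1, the expected result for a correctly recovered mutated allele; 0 would mean the unmutated database allele was called instead.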

However, on the following samples, with 30x-100x coverage and an error rate of only 0.5%, walk and genotype called the unmutated database allele rather than the mutated allele that was actually present.

This table shows the alignment results (DNA_SNPs etc.) against the unmutated database allele.

Gene  True_Version  DNA_SNPs  DNA_INDELs  AA_SNPs  AA_INDELs
ctxm  47            0         0           0        0
oxa   315           0         39          1        13
oxa   171           0         0           0        0
oxa   109           0         3           0        1
oxa   60            0         0           0        0
oxa   168           0         0           0        0
oxa   223           0         39          0        13
shv   70            0         3           0        1
tem   125           0         2           0        0
tem   87            0         0           0        0
tem   183           0         0           0        0

For debugging: The genomes, reads and graphs can be found at:

/data1/projects/ronald_jaepel/atlas_test/Simulation_Products/generated_genomes/2016_05_09_1252/ecoli_K12_MG1655ref.fa

/data1/projects/ronald_jaepel/atlas_test/Simulation_Products/simulated_reads/2016_05_091252/ecoli1.fq.gz or ./ecoli_2.fq.gz

/data1/projects/ronald_jaepel/atlas_test/Simulation_Products/simulated_graphs/2016_05_091252/ecoli.k31.ctx (e.g. ecoli_tem183.k31.ctx)

The databases used are in databases.zip: gn_amr-genesmutated.fasta was used to generate the genomes, reads and graphs, while gn_amr-genes.fasta was used as the panel for walk.

The JSON output of the atlas runs can be found at: /data1/projects/ronald_jaepel/atlas_test/JSONS/2016_05_09_1252/

ronald-jaepel commented 8 years ago

The sample tem 183 shows 100% coverage, even for atlas genotype. Here's the mapping analysis: (screenshot from 2016-07-04 18 07 18)

The relevant part is the blue box, which shows the mapping at the position where the found allele and the true (mutated) allele differ (found: C, true: G).

Phelimb commented 8 years ago

OK, this issue was caused by walk being too biased by the database. You can see in the output of walk and genotype that there's 1x coverage on a kmer of the database allele. walk chooses this path and then throws away the alternative (correct) path.

47-51-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-61-62-6

and in genotype

"percent_coverage": 100.0,
"median_depth": 54.0,
"min_non_zero_depth": 1.0

I've changed the logic in walk to first filter out very low-depth path options before checking for support for the kmer in the pre-existing database (v0.3.11). Previously it performed both steps, but in the reverse order.
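To illustrate why the order of the two steps matters, here is a minimal sketch. It is not the atlas implementation: the function names, the `min_depth` threshold of 3, and the toy 5-mers are all assumptions made for the example, and `database_first_choice` is a deliberately simplified stand-in for the earlier behaviour.

```python
def depth_first_choice(candidates, database_kmers, min_depth=3):
    """Filter low-depth options first, then prefer database-supported kmers.

    candidates: dict mapping next-kmer -> observed depth.
    database_kmers: set of kmers present in the panel.
    min_depth is a hypothetical threshold, not the value atlas uses.
    """
    survivors = {k: d for k, d in candidates.items() if d >= min_depth}
    if not survivors:
        return None
    in_db = {k: d for k, d in survivors.items() if k in database_kmers}
    pool = in_db or survivors  # fall back to any well-supported option
    return max(pool, key=pool.get)


def database_first_choice(candidates, database_kmers):
    """Simplified contrast: any covered database kmer wins, however shallow."""
    in_db = {k: d for k, d in candidates.items()
             if d > 0 and k in database_kmers}
    pool = in_db or candidates
    return max(pool, key=pool.get) if pool else None


# Toy branch point: the database kmer has 1x depth (e.g. from a read error),
# while the mutated kmer carries the real coverage.
candidates = {"ACGTC": 1, "ACGTG": 54}
database_kmers = {"ACGTC"}

new_pick = depth_first_choice(candidates, database_kmers)
old_pick = database_first_choice(candidates, database_kmers)
```

In this toy case `old_pick` is the 1x database kmer "ACGTC", while `new_pick` is the well-supported "ACGTG", which mirrors the found C versus true G difference reported above.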

ronald-jaepel commented 8 years ago

sounds good