Closed (vascokarla closed this issue 6 months ago)
Hi @vascokarla, thanks for letting us know about this bug. This seems to be a recurring issue with how the newest release of the MLST tool calculates the best match. E. coli uses an 8-allele set, and when the tool matches the full set of 7 for another organism, it reports that as the match. We've seen a couple pop up, but Aeromonas is the most common. The good news is that we do have a fix, although it is slightly incomplete. I will patch it up, and it should be included in the next release, which should be out very shortly. I'll check back in with you once it is out to make sure it functions as expected!
Thank you so much for your response. Cannot wait for the new updates :) Karla
Hi @vascokarla, a new version of PHoeNIx has been released (v2.1.0). Can you run it and confirm for us that your issue is now resolved? Thanks!
Hi @jvhagey, I was able to try the new version. It did identify the correct MLST scheme for E. coli! However, we also tried a new sample that was identified as Enterobacter hormaechei, and it was assigned the MLST scheme cronobacter (novel ST) by both PHoeNIx v2.0.2 and v2.1.0. We ran MLST (with filtered scaffolds) using the software mlst v2.22.0 (T. Seemann), and it was assigned the scheme ecloacae (novel ST). I see that there are exact matches with cronobacter...
I'm adding the MLST stdout for this sample here, in case that helps:
[11:37:13] This is mlst 2.22.0 running on linux with Perl 5.032001
[11:37:13] Checking mlst dependencies:
[11:37:13] Found 'blastn' => /opt/conda/envs/mlst/bin/blastn
[11:37:13] Found 'any2fasta' => /opt/conda/envs/mlst/bin/any2fasta
[11:37:14] Found blastn: 2.12.0+ (002012)
[11:37:14] Excluding 3 schemes: abaumannii ecoli vcholerae_2
[11:37:16] Found exact allele match cronobacter.pps-399
[11:37:16] Found exact allele match ecloacae.pyrG-39
[11:37:16] Found exact allele match aeromonas.gyrB-795
[11:37:16] Found exact allele match ecloacae.dnaA-62
[11:37:16] Found exact allele match ecloacae.gyrB-4
[11:37:16] Found exact allele match cronobacter.gyrB-100
[11:37:16] Found exact allele match cronobacter.atpD-211
[11:37:16] Found exact allele match cronobacter.gltB-187
[11:37:16] Found exact allele match cronobacter.infB-99
[11:37:16] Found exact allele match ecloacae.fusA-4
[11:37:16] Found exact allele match cronobacter.fusA-75
[11:37:16] Found exact allele match ecloacae.rpoB-44
[11:37:16] Found exact allele match ecloacae.rplB-4
XXXXXX.filtered.scaffolds.fa.gz ecloacae - dnaA(62) fusA(4) gyrB(4) leuS(~6) pyrG(39) rplB(4) rpoB(44)
[11:37:16] Please also cite 'Jolley & Maiden 2010, BMC Bioinf, 11:595' if you use mlst.
[11:37:16] Done.
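To see at a glance how the exact allele matches in a log like the one above are split across schemes, the "Found exact allele match" lines can be tallied per scheme. The following is only a minimal sketch: the function name `tally_exact_matches` and the regular expression are illustrative assumptions, not part of mlst or PHoeNIx, and the excerpt is copied from the log in this thread.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log in this thread:
#   "[11:37:16] Found exact allele match ecloacae.pyrG-39"
EXACT_MATCH = re.compile(r"Found exact allele match (\w+)\.(\w+)-(\d+)")


def tally_exact_matches(log_text: str) -> Counter:
    """Count exact allele matches per scheme (hypothetical helper)."""
    return Counter(m.group(1) for m in EXACT_MATCH.finditer(log_text))


# Excerpt of the mlst log posted in this thread.
log_excerpt = """
[11:37:16] Found exact allele match cronobacter.pps-399
[11:37:16] Found exact allele match ecloacae.pyrG-39
[11:37:16] Found exact allele match aeromonas.gyrB-795
[11:37:16] Found exact allele match ecloacae.dnaA-62
[11:37:16] Found exact allele match ecloacae.gyrB-4
[11:37:16] Found exact allele match cronobacter.gyrB-100
[11:37:16] Found exact allele match cronobacter.atpD-211
[11:37:16] Found exact allele match cronobacter.gltB-187
[11:37:16] Found exact allele match cronobacter.infB-99
[11:37:16] Found exact allele match ecloacae.fusA-4
[11:37:16] Found exact allele match cronobacter.fusA-75
[11:37:16] Found exact allele match ecloacae.rpoB-44
[11:37:16] Found exact allele match ecloacae.rplB-4
"""

counts = tally_exact_matches(log_excerpt)
for scheme, n in counts.most_common():
    print(f"{scheme}\t{n}")
```

For this excerpt, cronobacter and ecloacae each tally six exact matches and aeromonas one, which may be why the scheme choice is ambiguous for this sample.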
@vascokarla, thanks for the info. Are you able to test the fix in v2.1.1-dev and confirm that it resolves this issue? The command would be:
nextflow run cdcgov/phoenix -r v2.1.1-dev -profile singularity -entry PHOENIX --input $manifest --kraken2db $kraken2db --outdir $outdir/phoenix --max_cpus $threads --max_memory $memory
Hi @jvhagey. The MLST scheme was correct this time for both E. coli and Enterobacter using version v2.1.1-dev. I'm sharing part of the results below. Thank you so much for your quick help with this!
Species | Taxa_Confidence | Taxa_Coverage | Taxa_Source | Kraken2_Trimd | Kraken2_Weighted | MLST_Scheme_1 | MLST_1 | MLST_Scheme_2 | MLST_2 |
---|---|---|---|---|---|---|---|---|---|
Escherichia coli | 99.98 ANI_match | 99.49 | ANI_REFSEQ | Escherichia(10.41%) coli(8.96%) | Escherichia(97.19%) coli(97.19%) | ecoli(Achtman) | ST410 | ecoli_2(Pasteur) | Novel_allele |
Enterobacter hormaechei | 99.39 ANI_match | 90.61 | ANI_REFSEQ | Enterobacter(79.94%) hormaechei(13.71%) | Enterobacter(94.70%) hormaechei(93.28%) | ecloacae | Novel_allele | - | - |
Thank you, the patch will be released this week.
Describe the bug: The MLST (Multi-Locus Sequence Typing) analysis is yielding conflicting results for a sample that was classified as Escherichia coli but is being assigned the aeromonas scheme (ST2363) instead of the ecoli(Achtman) scheme (ST410).
Impact: This bug is causing confusion and uncertainty about the true taxonomic identity and ST of the sample, which is critical for downstream analyses and interpretations of the sequencing data. We don't know if it's caused by a sequencing error or a software bug.
To Reproduce: Steps to reproduce the behavior:
Environment: [HPC]
Pipeline Version: [PHoeNIX v2.0.2, CHECK_MLST: python: 3.7.12, MLST: mlst: 2.23.0, mlst_db: '2023-07-28']
Command:
Error Message: None [Pipeline completed successfully]. Sample warnings: "Average Q30 of raw R1 reads <90.00%, <50% of reads assigned to top genera hit (11.09%), Check 1st MLST scheme matches taxa IDed."
Expected behavior: The MLST analysis should consistently identify the sample as Escherichia coli, as per the initial taxonomic classification.
Screenshots: When running MLST locally these are the tail results
Additional context: The sample had 60X coverage, 108 contigs, assembly ratio 0.9640_(.5215)
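The expected behavior in this report, and the "Check 1st MLST scheme matches taxa IDed" warning, both amount to checking that the assigned scheme agrees with the taxonomic ID. A rough sketch of such a check is shown here. The lookup table `EXPECTED_SCHEMES` and the function `scheme_matches_taxa` are hypothetical and cover only the genera and schemes mentioned in this thread; this is not how PHoeNIx implements its check.

```python
from typing import Optional

# Hypothetical genus-to-scheme lookup, limited to cases seen in this thread.
EXPECTED_SCHEMES = {
    "Escherichia": {"ecoli", "ecoli_2"},
    "Enterobacter": {"ecloacae"},
}


def scheme_matches_taxa(species: str, scheme: str) -> Optional[bool]:
    """Return True/False if the genus is known, None if it is not in the table."""
    genus = species.split()[0]
    expected = EXPECTED_SCHEMES.get(genus)
    if expected is None:
        return None  # no opinion for genera outside the illustrative table
    # Drop a trailing annotation such as "(Achtman)" before comparing.
    base_scheme = scheme.split("(")[0]
    return base_scheme in expected


# Cases reported in this thread.
print(scheme_matches_taxa("Escherichia coli", "aeromonas"))
print(scheme_matches_taxa("Escherichia coli", "ecoli(Achtman)"))
print(scheme_matches_taxa("Enterobacter hormaechei", "cronobacter"))
print(scheme_matches_taxa("Enterobacter hormaechei", "ecloacae"))
```

A check like this only flags a disagreement between the scheme and the taxonomic ID. It does not say which of the two is wrong, so a flagged sample would still need manual review.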