Open evezeyl opened 1 year ago
Some additional explanations: older versions of the MLST software use the "lmonocytogenes" scheme (BIGSdb Pasteur), while newer versions use the "listeria_2" scheme. The difference is apparently a change in the definition of dapE, which we need to trace back. The listeria_2 scheme gives rise to multiple alleles during typing, while the lmonocytogenes scheme does not.
I asked about the origin of the listeria_2 scheme in the MLST repo on GitHub: <https://github.com/tseemann/mlst/issues/128>
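One way to trace back a change in the dapE definition would be to compare the dapE allele FASTA files shipped with the two schemes. The sketch below assumes each scheme provides dapE alleles as FASTA records with identifiers such as `dapE_1`; the function names and the toy sequences are invented for illustration and are not real alleles.

```python
def read_fasta(text):
    """Parse FASTA text into {allele_id: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the allele id
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def compare_allele_sets(old, new):
    """Report alleles present in only one scheme and alleles whose sequence differs."""
    shared = old.keys() & new.keys()
    return {
        "only_old": sorted(old.keys() - new.keys()),
        "only_new": sorted(new.keys() - old.keys()),
        "changed": sorted(a for a in shared if old[a] != new[a]),
    }


# Toy sequences, invented for illustration only (not real dapE alleles)
old_text = ">dapE_1\nACGTACGT\n>dapE_2\nACGTTCGT\n"
new_text = ">dapE_1\nACGTACGT\n>dapE_2\nACGTTCGA\n>dapE_3\nACGAACGT\n"

report = compare_allele_sets(read_fasta(old_text), read_fasta(new_text))
print(report)
```

With real scheme files, a non-empty `changed` list would point to alleles that kept their number but changed sequence between the two schemes.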
So I made an admixed sample from the VI55314 and VI55689 reads. I am launching the MLST pipeline to see whether it is typed correctly, and also the assembly QC and read QC, to check whether the admixture is detected.
First tests with confindr did not detect the admixture (some samples have not run yet).
[x] Find out whether the samples that fail are the ones with weird alleles (no contamination detection; the test failures are likely due to memory limits on the PC)
[ ] Admixture detection tests need to be relaunched when the server is up again (OEIO mixed sample for testing)
Conclusion for now:
Results on the admixed isolate (different CCs, from VIGAS-P): no contamination was detected by confindr on the sample used.
VIGAS MLST pipeline (version 2.16.1, using the lmonocytogenes MLST scheme).
To run MLST I used two assemblies from different VIGAS-P pipelines and tested them with two different versions of the MLST software: (1) using the same scheme, lmonocytogenes, that was used in the VIGAS-P MLST pipeline, and (2) using listeria_2, the scheme that gives multiple alleles for dapE and made me suspect contamination in some isolates. This is the scheme used in the ALPPACA cgMLST track.
Note that I chose isolates where no ambiguous call was made for dapE to create the admixed sample. The admixed sample was created by concatenating the reads from VI55314 and VI55689. I tested whether I could detect intra-specific contamination using the confindr rMLST pipeline, and could not: only one SNP position (a possible variant) was flagged as "contamination" in the admixed sample.
To run MLST on SAGA, I used two assemblies created by different VIGAS-P assembly pipelines: the assembly annotation pipeline and the contig assembly QC pipeline.
NB: the lmonocytogenes scheme is the one I used previously, and I know it corresponds to the scheme that was on BIGSdb Pasteur at least until 2021.
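The read concatenation used to build the admixed sample can be sketched as follows. This is a minimal illustration, assuming gzip-compressed FASTQ files; the file names and the helper `concatenate_reads` are hypothetical. Gzip files can be appended byte for byte and still be read as a single stream.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def concatenate_reads(inputs, output):
    """Append the input files byte for byte, in order, into `output`."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Demo on two tiny made-up FASTQ files (hypothetical names and reads)
tmp = Path(tempfile.mkdtemp())
records = {
    "sampleA_R1.fastq.gz": "@readA\nACGT\n+\nIIII\n",
    "sampleB_R1.fastq.gz": "@readB\nTTGA\n+\nIIII\n",
}
for name, text in records.items():
    with gzip.open(tmp / name, "wt") as handle:
        handle.write(text)

mixed = tmp / "ADMIXED_R1.fastq.gz"
concatenate_reads([tmp / name for name in records], mixed)

# The gzip module reads all concatenated members as one stream
with gzip.open(mixed, "rt") as handle:
    mixed_text = handle.read()
n_reads = mixed_text.count("\n") // 4  # four lines per FASTQ record
print(n_reads)
```

For paired-end data, the same input order would have to be used for the R1 and R2 files so that read pairs stay in sync.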
Method | FILE | SCHEME | ST | abcZ | bglA | cat | dapE | dat | ldh | lhkA |
---|---|---|---|---|---|---|---|---|---|---|
MLST 2.16.1 | ADMIXED-VIGAS-P MLST pipeline | lmonocytogenes | - | 6 | 6? | 6 | 4 | 1 | 4 | 1 |
MLST 2.19.0 | ADMIXED-contigs_assembly_annotation.fasta | lmonocytogenes | - | 6 | 6? | 6 | 5 | 1 | 4 | 1 |
MLST 2.19.0 | ADMIXED-contigs_assembly_QC.fasta | lmonocytogenes | - | 6 | 6? | 6 | 4 | 1 | 4 | 1 |
MLST 2.23.0 | ADMIXED-contigs_assembly_annotation.fasta | listeria_2 | - | 6 | 6? | 6 | 5 | 1 | 4 | 1 |
MLST 2.23.0 | ADMIXED-contigs_assembly_QC.fasta | listeria_2 | - | 6 | 6? | 6 | 4 | 1 | 4 | 1 |
MLST 2.16.1 | VIGAS-P MLST pipeline VI55314 | lmonocytogenes | 9 | 6 | 5 | 6 | 4 | 1 | 4 | 1 |
MLST 2.19.0 | VI55314.fasta | lmonocytogenes | 9 | 6 | 5 | 6 | 4 | 1 | 4 | 1 |
MLST 2.23.0 | VI55314.fasta | listeria_2 | 9 | 6 | 5 | 6 | 4 | 1 | 4 | 1 |
MLST 2.16.1 | VIGAS-P MLST pipeline VI55689 | lmonocytogenes | 121 | 7 | 6 | 8 | 8 | 6 | 37 | 1 |
MLST 2.19.0 | VI55689.fasta | lmonocytogenes | 121 | 7 | 6 | 8 | 8 | 6 | 37 | 1 |
MLST 2.23.0 | VI55689.fasta | listeria_2 | 121 | 7 | 6 | 8 | 8 | 6 | 37 | 1 |
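The admixed rows of the table can also be compared programmatically to see which loci change between runs. The sketch below copies the five ADMIXED rows from the table; the dictionary layout and the function name `discordant_loci` are illustrative choices, not part of any pipeline.

```python
LOCI = ["abcZ", "bglA", "cat", "dapE", "dat", "ldh", "lhkA"]

# Admixed-sample rows copied from the table: (version, assembly, scheme) -> allele calls
calls = {
    ("2.16.1", "VIGAS-P MLST pipeline", "lmonocytogenes"): ["6", "6?", "6", "4", "1", "4", "1"],
    ("2.19.0", "assembly_annotation", "lmonocytogenes"): ["6", "6?", "6", "5", "1", "4", "1"],
    ("2.19.0", "assembly_QC", "lmonocytogenes"): ["6", "6?", "6", "4", "1", "4", "1"],
    ("2.23.0", "assembly_annotation", "listeria_2"): ["6", "6?", "6", "5", "1", "4", "1"],
    ("2.23.0", "assembly_QC", "listeria_2"): ["6", "6?", "6", "4", "1", "4", "1"],
}


def discordant_loci(calls):
    """Return {locus: sorted distinct calls} for loci where the runs disagree."""
    result = {}
    for index, locus in enumerate(LOCI):
        seen = {row[index] for row in calls.values()}
        if len(seen) > 1:
            result[locus] = sorted(seen)
    return result


discordant = discordant_loci(calls)
print(discordant)  # {'dapE': ['4', '5']}

# Which runs report dapE 5?
dape_index = LOCI.index("dapE")
runs_with_5 = sorted(key for key, row in calls.items() if row[dape_index] == "5")
print(runs_with_5)
```

In these rows the dapE call follows the assembly (5 for the annotation assembly, 4 for the QC assembly and the VIGAS-P pipeline) rather than the scheme, and each run reports a single dapE allele.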
Conclusions and consequences:
Action: I suggest that we reopen the QC issue I had opened concerning intra-genus contamination, where I mentioned implementing confindr (or any other means of detecting potential intraspecific contamination), as such contamination could create problems and remain undetected.
I also suggest finding a way to report that novel alleles have been detected but not reported by MLST.
Discussion?
When two or more alleles are detected for the same gene, whether those alleles are new or existing, this should appear in the results. As of today it does not. Moreover, in that case the CC group must remain undetermined, which has not been the case until now.
PS: I will recheck the scheme I used to be 100% sure that things are also working as they should in ALPPACA, so we need to combine efforts.
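A possible shape for such reporting is sketched below. It assumes the allele notation described in the mlst README (`n` exact, `~n` novel, `n?` partial, `n,m` multiple, `-` missing); the functions `classify_call` and `summarise`, the `ST_TO_CC` lookup, and the second demo row are hypothetical and only illustrate the rule that any non-exact call leaves the CC undetermined.

```python
def classify_call(call):
    """Classify one allele call, assuming the notation from the mlst README."""
    if call == "-":
        return "missing"
    if "," in call:
        return "multiple"
    if call.startswith("~"):
        return "novel"
    if call.endswith("?"):
        return "partial"
    if call.isdigit():
        return "exact"
    return "unknown"


def summarise(sample, st, loci_calls, st_to_cc):
    """Flag every non-exact call; leave the CC undetermined if anything is flagged."""
    flags = {}
    for locus, call in loci_calls.items():
        kind = classify_call(call)
        if kind != "exact":
            flags[locus] = kind
    if flags or st == "-":
        cc = "undetermined"
    else:
        cc = st_to_cc.get(st, "undetermined")
    return {"sample": sample, "ST": st, "CC": cc, "flags": flags}


# Illustrative lookup only; a real ST-to-CC table would come from the scheme database
ST_TO_CC = {"9": "CC9", "121": "CC121"}

clean = summarise(
    "VI55314", "9",
    {"abcZ": "6", "bglA": "5", "cat": "6", "dapE": "4", "dat": "1", "ldh": "4", "lhkA": "1"},
    ST_TO_CC,
)
# Hypothetical row showing how a multi-allele dapE call would be reported
mixed = summarise(
    "hypothetical_mixed", "-",
    {"abcZ": "6", "bglA": "6?", "cat": "6", "dapE": "4,5", "dat": "1", "ldh": "4", "lhkA": "1"},
    ST_TO_CC,
)
print(clean)
print(mixed)
```

The design choice here is deliberately conservative: a single flagged locus is enough to withhold the CC, so an ambiguous sample cannot silently inherit a clonal complex.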