MDU-PHL / meningotype

In silico serotyping, finetyping and Bexsero antigen sequence typing of Neisseria meningitidis
GNU General Public License v3.0
11 stars 6 forks source link

Meningotype seems to have trouble identifying FetA F5-1 #37

Open boasvdp opened 2 years ago

boasvdp commented 2 years ago

I have recently incorporated meningotype in the pipeline we use at our lab, and was comparing output to external and internal reference datasets.

While checking the dataset compiled by Bogaerts et al., I ran into six strains for which FetA identification is a bit inconsistent. Isolates were WGSed in triplicate by Bogaerts et al. For some replicates FetA could not be identified while in other replicates of the same strain, FetA was typed correctly. Five out of six strains for which this happened were F5-1. SRA accessions, correct FetA (from PubMLST) and meningotype results are listed in samples_fetA.txt. If it helps I can share the shovill assemblies of the run accessions listed in the sheet.

Using a custom ABRicate database of FetA, I can find the whole fetA gene and after translation to protein sequence, the exact F5-1 VR protein motif as well(GEFEISGKKKDPKDPKKEIDKTDEEKAKDKKDMDLVHSYKLS, from here). The whole fetA genes seem to be assembled and usually >100 bp away from contig ends.

Switching on some msg calls in meningotype seems to indicate the isPcr call in https://github.com/MDU-PHL/meningotype/blob/aece5c9851c7ffc8357306a33a369307dddb726f/meningotype/meningotype.py#L229 is not finding FetA and therefore finetypeBLAST is not called for fetA. Setting stepSize to 2 instead of 3 on line 229 causes isPcr to "amplify" the FetA VR.

The weird thing is, stepSize=3 works well for some samples such as SRR6953990:

(base) boas$ isPcr genomes/SRR6953990.fasta primers.tsv stdout -maxSize=800 -tileSize=10
 -minPerfect=8 -stepSize=3
>contig00011:875-1333 fetA_seq 459bp TTCAACTTCGACAGCCGCCTT TTGCAGCGCGTCATACAGGCG
TTCAACTTtGACAGCCGCCTTgccgaacaaaccctgttgaaatacggtat
caactaccgccatcaggaaatcaaaccgcaagcgtttttgaacggcgaat
ttgagatctccggtaagaagaaagatccgaaagatcccaaaaaagaaata
gataagaccgatgaagaaaaagcgaaagacaagaaagatatggatcttgt
ccattcctacaaactgtccaacccgaccaaaaccgataccggcgcgtata
tcgaagccattcacgagcttgacggctttaccctgaccggcgggctgcgt
tacgaccgcttcaaggtgaaaacccacgacggcaaaaccgtttcaagcag
caaccttaacccgagtttcggcgtgatttggcagccgcacgaacactgga
gcttcagctcaagccacaactacgccagccgcagcccgCGCCTGTATGAC
GCGCTGCAA

But not for others such as SRR6953892:

(base) boas$ isPcr genomes/SRR6953892.fasta primers.tsv stdout -maxSize=800 -tileSize=10 -minPerfect=8 -stepSize=3

However setting stepSize=2 for SRR6953892 gives an identical match to the stepSize=3 match for sample SRR6953990. I don't have a clue why.

(base) boas$ isPcr genomes/SRR6953892.fasta primers.tsv stdout -maxSize=800 -tileSize=10 -minPerfect=8 -stepSize=2
>contig00071:1814+2272 fetA_seq 459bp TTCAACTTCGACAGCCGCCTT TTGCAGCGCGTCATACAGGCG
TTCAACTTtGACAGCCGCCTTgccgaacaaaccctgttgaaatacggtat
caactaccgccatcaggaaatcaaaccgcaagcgtttttgaacggcgaat
ttgagatctccggtaagaagaaagatccgaaagatcccaaaaaagaaata
gataagaccgatgaagaaaaagcgaaagacaagaaagatatggatcttgt
ccattcctacaaactgtccaacccgaccaaaaccgataccggcgcgtata
tcgaagccattcacgagcttgacggctttaccctgaccggcgggctgcgt
tacgaccgcttcaaggtgaaaacccacgacggcaaaaccgtttcaagcag
caaccttaacccgagtttcggcgtgatttggcagccgcacgaacactgga
gcttcagctcaagccacaactacgccagccgcagcccgCGCCTGTATGAC
GCGCTGCAA

Would it make sense to set stepSize to 2 instead of 3? For 18 samples, this increased analysis time from 48 to 59 seconds, but it managed to identify all FetA types missed with stepSize=3.

Thanks in advance!

boasvdp commented 2 years ago

Just a short update: setting stepSize to 2 also increased concordance between meningotype and wet-lab typing from our lab (Sanger sequencing of fetA/porA VRs). For 584 samples, mismatches decreased from 26 to 10 (FetA) and 14 to 10 (PorA).