Open samuell opened 3 months ago
Based on some testing with a separate workflow that tries to mimic the steps of microSALT for assembly and resistance blasting, it turns out that VIM is not reported because the alignment lengths are too short when blasting against the beta-lactamase file from the ResFinder db. Some matches are found in the blast results, but because of the thresholds for the report (97% identity and 90% alignment length) they don't end up in the report.
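To make the two thresholds concrete, here is a minimal sketch of the check as described above. The function name and the use of "greater than or equal" are assumptions for illustration; this is not microSALT's actual code.

```python
def passes_report_thresholds(identity_pct, alignment_len, gene_len,
                             min_identity=97.0, min_coverage=0.90):
    """Return True if a hit clears both report thresholds.

    Assumes both thresholds are inclusive; the thread does not say
    whether microSALT uses >= or >.
    """
    coverage = alignment_len / gene_len  # fraction of the full gene that aligned
    return identity_pct >= min_identity and coverage >= min_coverage


# Values quoted in this thread, against the 801 bp full-length VIM gene.
print(passes_report_thresholds(99.875, 801, 801))  # full-length hit
print(passes_report_thresholds(100.0, 591, 801))   # 591/801 is roughly 74%
```

With an 801 bp gene, the 90% cutoff corresponds to an alignment of at least 721 bp, so a perfect-identity hit of 591 bp is still dropped.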
I tested a lot of combinations of assemblers (Spades vs Skesa), trimming options (trimming vs no trimming) and the various assembly mode flags to Spades. I will leave out flags other than --isolate here though, since we need that one to fix the arcC issue with NovaSeq X. This is what I found:
With the --isolate flag, the alignment length for VIM genes increased a bit when turning off trimming (from ca 433 bp to ca 612 bp), although that is still not above our threshold of 90% of the full VIM length (801 bp for one of the variants).
Thus, the results suggest that switching the assembler from Spades to Skesa and keeping trimming active might resolve this issue.
I actually tried replacing Spades with SKESA now (in the branch 180-fix-missing-vim-genes), and the results look pretty promising.
These are the blast results (801 bp should be the full VIM gene, for most variants):
$ grep blaVIM [...path.hidden...]/results/ACC14648_2024.8.22_18.3.59/ACC14648A3/blast_search/resistance/beta-lactam.txt | awk '( $5 > 97.0 && $12 > 720 ) { print $1 "\t" $5 "\t" $12 }'
blaVIM-4_1_EU581706 99.875 801
blaVIM-54_1_KY508061 99.750 801
blaVIM-40_1_HG934765 99.750 801
blaVIM-43_1_KP096412 99.750 801
blaVIM-37_1_JX982636 99.750 801
blaVIM-28_1_JF900599 99.750 801
blaVIM-19_1_FJ499397 99.750 801
blaVIM-14_1_FJ445404 99.750 801
blaVIM-1_1_Y18050 99.750 801
blaVIM-57_1_MH450217 99.625 801
blaVIM-52_1_KX349731 99.625 801
blaVIM-42_1_KP071470 99.625 801
blaVIM-35_1_JX982634 99.625 801
blaVIM-34_1_JX013656 99.625 801
blaVIM-33_1_JX258134 99.625 801
blaVIM-32_1_JN676230 99.625 801
blaVIM-27_1_HQ858608 99.625 801
blaVIM-26_1_FR748153 99.625 801
blaVIM-39_1_KF131539 99.501 801
blaVIM-29_1_JX311308 99.501 801
blaVIM-5_1_DQ023222 98.002 801
blaVIM-49_1_KU663374 97.878 801
blaVIM-12_1_DQ143913 97.878 801
blaVIM-38_1_KC469971 97.503 801
The report does not include them though, since the scraping depends on the contig naming scheme used in Spades, so that logic needs to be updated.
But based on all I can see, they are now properly detected.
I have added a command to convert the naming scheme back to the one used by Spades, so that no changes are needed in the Blast scraping logic. This is merged into PR #178
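The actual conversion command is not shown in this thread. As a rough illustration of the idea only, the sketch below rewrites a contig header into the NODE_<n>_length_<len>_cov_<cov> form seen in the blast output. It assumes SKESA headers look like Contig_<n>_<coverage>; both that input format and the function name are assumptions, not the code in the PR.

```python
import re

# Assumed SKESA header layout: "Contig_<number>_<coverage>", optionally
# followed by extra text such as a circularity marker.
SKESA_HEADER = re.compile(r"^Contig_(\d+)_([0-9.]+)")


def spades_style_name(skesa_header, seq_len):
    """Build a Spades-style contig name from a SKESA-style header.

    seq_len is the contig length in bp, which the assumed SKESA header
    does not carry and therefore has to be supplied by the caller.
    """
    match = SKESA_HEADER.match(skesa_header.lstrip(">"))
    if match is None:
        raise ValueError(f"Unrecognised contig header: {skesa_header!r}")
    number, coverage = match.groups()
    return f"NODE_{number}_length_{seq_len}_cov_{coverage}"


print(spades_style_name(">Contig_5_49.6729", 2701))
```

Keeping the rename as a separate step after assembly means the downstream scraping can stay unchanged, which matches the intent described above.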
Although the changes implemented in #182 should have improved the situation, the customer has reported a case where a VIM gene is still missing, although it is reported by ResFinder when run on the raw reads.
Investigating the issue, it turns out we do have a hit, but the longest alignment (591 bp) covers only ~74% of the full gene length of 801 bp, which is too short to be reported (it should be at least 90% of the length):
$ grep -B 2 blaVIM beta-lactam.txt
# Fields: subject title, subject strand, query acc.ver, subject acc.ver, % identity, evalue, bit score, q. start, q. end, s. start, s. end, alignment length
# 53 hits found
blaVIM-52_1_KX349731 minus NODE_5_length_2701_cov_49.6729 blaVIM-52_1_KX349731 100.000 0.0 1092 1 591 591 1 591
blaVIM-42_1_KP071470 minus NODE_5_length_2701_cov_49.6729 blaVIM-42_1_KP071470 100.000 0.0 1092 1 591 591 1 591
blaVIM-35_1_JX982634 minus NODE_5_length_2701_cov_49.6729 blaVIM-35_1_JX982634 100.000 0.0 1092 1 591 591 1 591
blaVIM-28_1_JF900599 minus NODE_5_length_2701_cov_49.6729 blaVIM-28_1_JF900599 100.000 0.0 1092 1 591 591 1 591
blaVIM-26_1_FR748153 minus NODE_5_length_2701_cov_49.6729 blaVIM-26_1_FR748153 100.000 0.0 1092 1 591 591 1 591
blaVIM-12_1_DQ143913 minus NODE_5_length_2701_cov_49.6729 blaVIM-12_1_DQ143913 100.000 0.0 1092 1 591 591 1 591
blaVIM-4_1_EU581706 minus NODE_5_length_2701_cov_49.6729 blaVIM-4_1_EU581706 100.000 0.0 1092 1 591 591 1 591
blaVIM-1_1_Y18050 minus NODE_5_length_2701_cov_49.6729 blaVIM-1_1_Y18050 100.000 0.0 1092 1 591 591 1 591
blaVIM-57_1_MH450217 minus NODE_5_length_2701_cov_49.6729 blaVIM-57_1_MH450217 99.831 0.0 1086 1 591 591 1 591
blaVIM-54_1_KY508061 minus NODE_5_length_2701_cov_49.6729 blaVIM-54_1_KY508061 99.831 0.0 1086 1 591 591 1 591
blaVIM-39_1_KF131539 minus NODE_5_length_2701_cov_49.6729 blaVIM-39_1_KF131539 99.831 0.0 1086 1 591 591 1 591
blaVIM-40_1_HG934765 minus NODE_5_length_2701_cov_49.6729 blaVIM-40_1_HG934765 99.831 0.0 1086 1 591 591 1 591
blaVIM-43_1_KP096412 minus NODE_5_length_2701_cov_49.6729 blaVIM-43_1_KP096412 99.831 0.0 1086 1 591 591 1 591
blaVIM-37_1_JX982636 minus NODE_5_length_2701_cov_49.6729 blaVIM-37_1_JX982636 99.831 0.0 1086 1 591 591 1 591
blaVIM-34_1_JX013656 minus NODE_5_length_2701_cov_49.6729 blaVIM-34_1_JX013656 99.831 0.0 1086 1 591 591 1 591
blaVIM-33_1_JX258134 minus NODE_5_length_2701_cov_49.6729 blaVIM-33_1_JX258134 99.831 0.0 1086 1 591 591 1 591
blaVIM-32_1_JN676230 minus NODE_5_length_2701_cov_49.6729 blaVIM-32_1_JN676230 99.831 0.0 1086 1 591 591 1 591
blaVIM-29_1_JX311308 minus NODE_5_length_2701_cov_49.6729 blaVIM-29_1_JX311308 99.831 0.0 1086 1 591 591 1 591
blaVIM-27_1_HQ858608 minus NODE_5_length_2701_cov_49.6729 blaVIM-27_1_HQ858608 99.831 0.0 1086 1 591 591 1 591
blaVIM-19_1_FJ499397 minus NODE_5_length_2701_cov_49.6729 blaVIM-19_1_FJ499397 99.831 0.0 1086 1 591 591 1 591
blaVIM-14_1_FJ445404 minus NODE_5_length_2701_cov_49.6729 blaVIM-14_1_FJ445404 99.831 0.0 1086 1 591 591 1 591
blaVIM-25_1_HM750249 minus NODE_5_length_2701_cov_49.6729 blaVIM-25_1_HM750249 97.970 0.0 1026 1 591 591 1 591
blaVIM-5_1_DQ023222 minus NODE_5_length_2701_cov_49.6729 blaVIM-5_1_DQ023222 97.970 0.0 1026 1 591 591 1 591
blaVIM-49_1_KU663374 minus NODE_5_length_2701_cov_49.6729 blaVIM-49_1_KU663374 97.800 0.0 1020 1 591 591 1 591
blaVIM-38_1_KC469971 minus NODE_5_length_2701_cov_49.6729 blaVIM-38_1_KC469971 97.631 0.0 1014 1 591 591 1 591
blaVIM-13_1_DQ365886 minus NODE_5_length_2701_cov_49.6729 blaVIM-13_1_DQ365886 93.401 0.0 876 1 591 591 1 591
blaVIM-30_1_JN129451 minus NODE_5_length_2701_cov_49.6729 blaVIM-30_1_JN129451 93.232 0.0 870 1 591 591 1 591
blaVIM-47_1_KT954134 minus NODE_5_length_2701_cov_49.6729 blaVIM-47_1_KT954134 93.063 0.0 865 1 591 591 1 591
blaVIM-48_1_KY362199 minus NODE_5_length_2701_cov_49.6729 blaVIM-48_1_KY362199 93.063 0.0 865 1 591 591 1 591
blaVIM-51_1_KU746270 minus NODE_5_length_2701_cov_49.6729 blaVIM-51_1_KU746270 93.063 0.0 865 1 591 591 1 591
blaVIM-41_1_KP771862 minus NODE_5_length_2701_cov_49.6729 blaVIM-41_1_KP771862 93.063 0.0 865 1 591 591 1 591
blaVIM-31_1_JN982330 minus NODE_5_length_2701_cov_49.6729 blaVIM-31_1_JN982330 93.063 0.0 865 1 591 591 1 591
blaVIM-24_1_HM855205 minus NODE_5_length_2701_cov_49.6729 blaVIM-24_1_HM855205 93.063 0.0 865 1 591 591 1 591
blaVIM-23_1_GQ242167 minus NODE_5_length_2701_cov_49.6729 blaVIM-23_1_GQ242167 93.063 0.0 865 1 591 591 1 591
blaVIM-20_1_GQ414736 minus NODE_5_length_2701_cov_49.6729 blaVIM-20_1_GQ414736 93.063 0.0 865 1 591 591 1 591
blaVIM-10_1_AY524989 minus NODE_5_length_2701_cov_49.6729 blaVIM-10_1_AY524989 93.063 0.0 865 1 591 591 1 591
blaVIM-2_1_AF302086 minus NODE_5_length_2701_cov_49.6729 blaVIM-2_1_AF302086 93.063 0.0 865 1 591 591 1 591
blaVIM-56_1_MG834535 minus NODE_5_length_2701_cov_49.6729 blaVIM-56_1_MG834535 93.576 0.0 859 1 576 591 16 576
blaVIM-50_1_KU663375 minus NODE_5_length_2701_cov_49.6729 blaVIM-50_1_KU663375 92.893 0.0 859 1 591 591 1 591
blaVIM-44_1_KP681696 minus NODE_5_length_2701_cov_49.6729 blaVIM-44_1_KP681696 92.893 0.0 859 1 591 591 1 591
blaVIM-45_1_KP681695 minus NODE_5_length_2701_cov_49.6729 blaVIM-45_1_KP681695 92.893 0.0 859 1 591 591 1 591
blaVIM-46_1_KP749829 minus NODE_5_length_2701_cov_49.6729 blaVIM-46_1_KP749829 92.893 0.0 859 1 591 591 1 591
blaVIM-36_1_JX982635 minus NODE_5_length_2701_cov_49.6729 blaVIM-36_1_JX982635 92.893 0.0 859 1 591 591 1 591
blaVIM-17_1_EU118148 minus NODE_5_length_2701_cov_49.6729 blaVIM-17_1_EU118148 92.893 0.0 859 1 591 591 1 591
blaVIM-16_1_EU419746 minus NODE_5_length_2701_cov_49.6729 blaVIM-16_1_EU419746 92.893 0.0 859 1 591 591 1 591
blaVIM-15_1_EU419745 minus NODE_5_length_2701_cov_49.6729 blaVIM-15_1_EU419745 92.893 0.0 859 1 591 591 1 591
blaVIM-11_1_AY605049 minus NODE_5_length_2701_cov_49.6729 blaVIM-11_1_AY605049 92.893 0.0 859 1 591 591 1 591
blaVIM-9_1_AY524988 minus NODE_5_length_2701_cov_49.6729 blaVIM-9_1_AY524988 92.893 0.0 859 1 591 591 1 591
blaVIM-8_1_AY524987 minus NODE_5_length_2701_cov_49.6729 blaVIM-8_1_AY524987 92.724 0.0 854 1 591 591 1 591
blaVIM-6_1_AY165025 minus NODE_5_length_2701_cov_49.6729 blaVIM-6_1_AY165025 92.724 0.0 854 1 591 591 1 591
blaVIM-3_1_AF300454 minus NODE_5_length_2701_cov_49.6729 blaVIM-3_1_AF300454 92.724 0.0 854 1 591 591 1 591
blaVIM-18_1_AM778091 minus NODE_5_length_2701_cov_49.6729 blaVIM-18_1_AM778091 91.032 0.0 787 1 591 579 1 591
blaVIM-7_1_AJ536835 minus NODE_5_length_2701_cov_49.6729 blaVIM-7_1_AJ536835 81.971 3.91e-113 405 25 501 564 88 477
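The coverage figure can be read straight off these rows. The sketch below parses one hit line using the column order from the "# Fields" header and computes the aligned fraction. It assumes the subject title contains no spaces and uses 801 bp as the reference length (stated above as the full length for most variants); the helper names are hypothetical.

```python
# Column order taken from the "# Fields:" header line in the blast output.
FIELDS = [
    "subject_title", "subject_strand", "query_acc", "subject_acc",
    "identity", "evalue", "bit_score", "q_start", "q_end",
    "s_start", "s_end", "alignment_length",
]


def parse_hit(line):
    """Split one hit line into a dict keyed by FIELDS.

    Assumes no field contains whitespace, which holds for the rows above.
    """
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"Expected {len(FIELDS)} columns, got {len(values)}")
    hit = dict(zip(FIELDS, values))
    hit["identity"] = float(hit["identity"])
    hit["alignment_length"] = int(hit["alignment_length"])
    return hit


def coverage(hit, gene_len=801):
    """Fraction of the reference gene covered by the alignment."""
    return hit["alignment_length"] / gene_len


row = ("blaVIM-4_1_EU581706 minus NODE_5_length_2701_cov_49.6729 "
       "blaVIM-4_1_EU581706 100.000 0.0 1092 1 591 591 1 591")
hit = parse_hit(row)
print(hit["subject_title"], hit["identity"], round(coverage(hit) * 100, 1))
```

Every row above has an alignment length of 591 bp or less, so none of them reaches the 90% cutoff regardless of identity.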
Describe the bug
For some datasets where ResFinder, run on the raw reads, reports blaVIM beta-lactamase resistance genes, microSALT does not report any VIM genes at all.
KS reports:
To Reproduce
Steps to reproduce the behavior:
Run 24ET500149 with microSALT 3.3.5
Expected behavior
The report for "24ET500149" should include blaVIM-4.
Software version (please complete the following information):
Additional context