Clinical-Genomics / microSALT

Microbial Sequence Analysis and Loci-based Typing pipeline for use on NGS WGS data.
GNU General Public License v3.0

Issue with reporting VIM resistance genes #180

Open samuell opened 3 months ago

samuell commented 3 months ago

Describe the bug

For some datasets where ResFinder, run on the raw reads, reports blaVIM beta-lactamase resistance genes, microSALT reports no VIM genes at all.

KS reports:

For VIM there is the following ticket: #509056, and this is what SciLife reported:

[image]

And the same sequence in ResFinder:

[image]

This sample in ticket 509056 carries a VIM gene: 24ET500149.

To Reproduce

Steps to reproduce the behavior:

  1. Run sample 24ET500149 with microSALT 3.3.5

Expected behavior

The report for "24ET500149" should include blaVIM-4.


samuell commented 3 months ago

Based on some testing with a separate workflow that tries to mimic microSALT's assembly and resistance blasting steps, it turns out that VIM is not reported because the alignment lengths are too short when blasting against the beta-lactamase file from the ResFinder db. Some matches are found in the blast results, but they fall below the thresholds for the report (97% identity and 90% alignment length) and therefore never end up in it.
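To make the threshold logic concrete, here is a minimal sketch of such a check. It is not microSALT's actual code: the function name, the use of `>=` rather than `>`, and the choice to measure the alignment length against the full reference gene length are all assumptions, and the numbers are illustrative only.

```python
def passes_report_thresholds(identity_pct, alignment_length, gene_length,
                             min_identity=97.0, min_coverage=0.90):
    """Return True if a hit clears both report thresholds.

    Hypothetical helper: coverage is assumed to be the alignment length
    divided by the full length of the reference gene.
    """
    coverage = alignment_length / gene_length
    return identity_pct >= min_identity and coverage >= min_coverage


# A full-length hit passes, a high-identity but partial hit does not.
print(passes_report_thresholds(99.875, 801, 801))  # True
print(passes_report_thresholds(100.0, 591, 801))   # False
```

The second call shows the failure mode described above: identity alone is not enough if the alignment covers too little of the gene.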

I tested a lot of combinations of assemblers (Spades vs Skesa), trimming options (trimming vs no trimming) and the various assembly mode flags to Spades. I will exclude flags other than --isolate here, since we need that one to fix the arcC issue with NovaSeq X, but I found this:

Thus, the results suggest that switching the assembler from Spades to Skesa while keeping trimming active might resolve this issue.

samuell commented 3 months ago

I have now actually tried replacing Spades with SKESA (in the branch 180-fix-missing-vim-genes), and the results look pretty promising.

These are the blast results (801 bp should be the full VIM gene, for most variants):

$ grep blaVIM [...path.hidden...]/results/ACC14648_2024.8.22_18.3.59/ACC14648A3/blast_search/resistance/beta-lactam.txt | awk '( $5 > 97.0 && $12 > 720 ) { print $1 "\t" $5 "\t" $12 }'
blaVIM-4_1_EU581706     99.875  801
blaVIM-54_1_KY508061    99.750  801
blaVIM-40_1_HG934765    99.750  801
blaVIM-43_1_KP096412    99.750  801
blaVIM-37_1_JX982636    99.750  801
blaVIM-28_1_JF900599    99.750  801
blaVIM-19_1_FJ499397    99.750  801
blaVIM-14_1_FJ445404    99.750  801
blaVIM-1_1_Y18050       99.750  801
blaVIM-57_1_MH450217    99.625  801
blaVIM-52_1_KX349731    99.625  801
blaVIM-42_1_KP071470    99.625  801
blaVIM-35_1_JX982634    99.625  801
blaVIM-34_1_JX013656    99.625  801
blaVIM-33_1_JX258134    99.625  801
blaVIM-32_1_JN676230    99.625  801
blaVIM-27_1_HQ858608    99.625  801
blaVIM-26_1_FR748153    99.625  801
blaVIM-39_1_KF131539    99.501  801
blaVIM-29_1_JX311308    99.501  801
blaVIM-5_1_DQ023222     98.002  801
blaVIM-49_1_KU663374    97.878  801
blaVIM-12_1_DQ143913    97.878  801
blaVIM-38_1_KC469971    97.503  801
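For anyone who prefers not to count awk columns, the same filter can be sketched in Python. This assumes the 12-column field order that BLAST writes in the `# Fields:` comment of the tabular output (subject title first, alignment length last) and that subject titles contain no whitespace; the function names are made up for this example. Unlike `grep`, it matches `blaVIM` against the subject title only.

```python
import sys

# Assumed column order, taken from the "# Fields:" comment in the BLAST output.
FIELDS = [
    "subject_title", "subject_strand", "query_acc", "subject_acc",
    "identity", "evalue", "bit_score",
    "q_start", "q_end", "s_start", "s_end", "alignment_length",
]


def parse_hits(lines):
    """Turn tabular BLAST lines into dicts, skipping comments and blank lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        values = line.split()  # whitespace split, like awk's default
        if len(values) != len(FIELDS):
            continue  # not a row in the expected layout
        hit = dict(zip(FIELDS, values))
        hit["identity"] = float(hit["identity"])
        hit["alignment_length"] = int(hit["alignment_length"])
        hits.append(hit)
    return hits


def filter_vim_hits(hits, min_identity=97.0, min_length=720):
    """Same condition as the awk call: $5 > 97.0 && $12 > 720."""
    return [
        h for h in hits
        if "blaVIM" in h["subject_title"]
        and h["identity"] > min_identity
        and h["alignment_length"] > min_length
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for h in filter_vim_hits(parse_hits(handle)):
            print(h["subject_title"], h["identity"], h["alignment_length"], sep="\t")
```

Run with the path to a `beta-lactam.txt` file as the only argument, it prints the same three columns as the awk command.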

The report does not include them though, since the scraping depends on the contig naming scheme used in Spades, so that logic needs to be updated.

But based on all I can see, they are now properly detected.

samuell commented 1 month ago

I have added a command to convert the naming scheme back to the one used by Spades, so that no changes are needed in the Blast scraping logic. This is merged into PR #178
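As an illustration of what such a renaming step can look like (this is a sketch, not the command that was actually added), the snippet below rewrites FASTA headers. It assumes SKESA names contigs `Contig_<number>_<coverage>`, optionally followed by a suffix such as `_Circ`, and that the target is the Spades pattern `NODE_<number>_length_<length>_cov_<coverage>`. All function names are hypothetical.

```python
import re

# Assumed SKESA header layout: Contig_<number>_<coverage>[_suffix]
SKESA_NAME = re.compile(r"Contig_(\d+)_(\d+(?:\.\d+)?)")


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def spades_style_name(skesa_header, sequence):
    """Build a Spades-like contig name from a SKESA header and its sequence."""
    match = SKESA_NAME.match(skesa_header)
    if match is None:
        raise ValueError(f"Unexpected contig name: {skesa_header!r}")
    number, coverage = match.groups()
    return f"NODE_{number}_length_{len(sequence)}_cov_{coverage}"


def rename_contigs(lines):
    """Return FASTA text with every header converted to the Spades pattern."""
    records = []
    for header, sequence in read_fasta(lines):
        records.append(f">{spades_style_name(header, sequence)}\n{sequence}\n")
    return "".join(records)


print(rename_contigs([">Contig_5_49.6729\n", "ACGT\n", "ACGT\n"]), end="")
# >NODE_5_length_8_cov_49.6729
# ACGTACGT
```

Note that this sketch writes each sequence on a single line and drops any suffix after the coverage value, which may or may not matter for downstream steps.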

samuell commented 3 weeks ago

Although the implemented changes in #182 should have improved the situation, the customer has reported a case where a VIM gene is still missing from the report, even though ResFinder reports it when run on the raw reads.

Investigating the issue, it turns out we do have a hit, but the longest alignment (591 bp) covers only ~74% of the full gene length of 801 bp, which is too short for it to be reported (it should be at least 90% of the length):

$ grep -B 2 blaVIM beta-lactam.txt 
# Fields: subject title, subject strand, query acc.ver, subject acc.ver, % identity, evalue, bit score, q. start, q. end, s. start, s. end, alignment length
# 53 hits found
blaVIM-52_1_KX349731    minus   NODE_5_length_2701_cov_49.6729  blaVIM-52_1_KX349731    100.000 0.0     1092    1       591     591     1       591
blaVIM-42_1_KP071470    minus   NODE_5_length_2701_cov_49.6729  blaVIM-42_1_KP071470    100.000 0.0     1092    1       591     591     1       591
blaVIM-35_1_JX982634    minus   NODE_5_length_2701_cov_49.6729  blaVIM-35_1_JX982634    100.000 0.0     1092    1       591     591     1       591
blaVIM-28_1_JF900599    minus   NODE_5_length_2701_cov_49.6729  blaVIM-28_1_JF900599    100.000 0.0     1092    1       591     591     1       591
blaVIM-26_1_FR748153    minus   NODE_5_length_2701_cov_49.6729  blaVIM-26_1_FR748153    100.000 0.0     1092    1       591     591     1       591
blaVIM-12_1_DQ143913    minus   NODE_5_length_2701_cov_49.6729  blaVIM-12_1_DQ143913    100.000 0.0     1092    1       591     591     1       591
blaVIM-4_1_EU581706     minus   NODE_5_length_2701_cov_49.6729  blaVIM-4_1_EU581706     100.000 0.0     1092    1       591     591     1       591
blaVIM-1_1_Y18050       minus   NODE_5_length_2701_cov_49.6729  blaVIM-1_1_Y18050       100.000 0.0     1092    1       591     591     1       591
blaVIM-57_1_MH450217    minus   NODE_5_length_2701_cov_49.6729  blaVIM-57_1_MH450217    99.831  0.0     1086    1       591     591     1       591
blaVIM-54_1_KY508061    minus   NODE_5_length_2701_cov_49.6729  blaVIM-54_1_KY508061    99.831  0.0     1086    1       591     591     1       591
blaVIM-39_1_KF131539    minus   NODE_5_length_2701_cov_49.6729  blaVIM-39_1_KF131539    99.831  0.0     1086    1       591     591     1       591
blaVIM-40_1_HG934765    minus   NODE_5_length_2701_cov_49.6729  blaVIM-40_1_HG934765    99.831  0.0     1086    1       591     591     1       591
blaVIM-43_1_KP096412    minus   NODE_5_length_2701_cov_49.6729  blaVIM-43_1_KP096412    99.831  0.0     1086    1       591     591     1       591
blaVIM-37_1_JX982636    minus   NODE_5_length_2701_cov_49.6729  blaVIM-37_1_JX982636    99.831  0.0     1086    1       591     591     1       591
blaVIM-34_1_JX013656    minus   NODE_5_length_2701_cov_49.6729  blaVIM-34_1_JX013656    99.831  0.0     1086    1       591     591     1       591
blaVIM-33_1_JX258134    minus   NODE_5_length_2701_cov_49.6729  blaVIM-33_1_JX258134    99.831  0.0     1086    1       591     591     1       591
blaVIM-32_1_JN676230    minus   NODE_5_length_2701_cov_49.6729  blaVIM-32_1_JN676230    99.831  0.0     1086    1       591     591     1       591
blaVIM-29_1_JX311308    minus   NODE_5_length_2701_cov_49.6729  blaVIM-29_1_JX311308    99.831  0.0     1086    1       591     591     1       591
blaVIM-27_1_HQ858608    minus   NODE_5_length_2701_cov_49.6729  blaVIM-27_1_HQ858608    99.831  0.0     1086    1       591     591     1       591
blaVIM-19_1_FJ499397    minus   NODE_5_length_2701_cov_49.6729  blaVIM-19_1_FJ499397    99.831  0.0     1086    1       591     591     1       591
blaVIM-14_1_FJ445404    minus   NODE_5_length_2701_cov_49.6729  blaVIM-14_1_FJ445404    99.831  0.0     1086    1       591     591     1       591
blaVIM-25_1_HM750249    minus   NODE_5_length_2701_cov_49.6729  blaVIM-25_1_HM750249    97.970  0.0     1026    1       591     591     1       591
blaVIM-5_1_DQ023222     minus   NODE_5_length_2701_cov_49.6729  blaVIM-5_1_DQ023222     97.970  0.0     1026    1       591     591     1       591
blaVIM-49_1_KU663374    minus   NODE_5_length_2701_cov_49.6729  blaVIM-49_1_KU663374    97.800  0.0     1020    1       591     591     1       591
blaVIM-38_1_KC469971    minus   NODE_5_length_2701_cov_49.6729  blaVIM-38_1_KC469971    97.631  0.0     1014    1       591     591     1       591
blaVIM-13_1_DQ365886    minus   NODE_5_length_2701_cov_49.6729  blaVIM-13_1_DQ365886    93.401  0.0     876     1       591     591     1       591
blaVIM-30_1_JN129451    minus   NODE_5_length_2701_cov_49.6729  blaVIM-30_1_JN129451    93.232  0.0     870     1       591     591     1       591
blaVIM-47_1_KT954134    minus   NODE_5_length_2701_cov_49.6729  blaVIM-47_1_KT954134    93.063  0.0     865     1       591     591     1       591
blaVIM-48_1_KY362199    minus   NODE_5_length_2701_cov_49.6729  blaVIM-48_1_KY362199    93.063  0.0     865     1       591     591     1       591
blaVIM-51_1_KU746270    minus   NODE_5_length_2701_cov_49.6729  blaVIM-51_1_KU746270    93.063  0.0     865     1       591     591     1       591
blaVIM-41_1_KP771862    minus   NODE_5_length_2701_cov_49.6729  blaVIM-41_1_KP771862    93.063  0.0     865     1       591     591     1       591
blaVIM-31_1_JN982330    minus   NODE_5_length_2701_cov_49.6729  blaVIM-31_1_JN982330    93.063  0.0     865     1       591     591     1       591
blaVIM-24_1_HM855205    minus   NODE_5_length_2701_cov_49.6729  blaVIM-24_1_HM855205    93.063  0.0     865     1       591     591     1       591
blaVIM-23_1_GQ242167    minus   NODE_5_length_2701_cov_49.6729  blaVIM-23_1_GQ242167    93.063  0.0     865     1       591     591     1       591
blaVIM-20_1_GQ414736    minus   NODE_5_length_2701_cov_49.6729  blaVIM-20_1_GQ414736    93.063  0.0     865     1       591     591     1       591
blaVIM-10_1_AY524989    minus   NODE_5_length_2701_cov_49.6729  blaVIM-10_1_AY524989    93.063  0.0     865     1       591     591     1       591
blaVIM-2_1_AF302086     minus   NODE_5_length_2701_cov_49.6729  blaVIM-2_1_AF302086     93.063  0.0     865     1       591     591     1       591
blaVIM-56_1_MG834535    minus   NODE_5_length_2701_cov_49.6729  blaVIM-56_1_MG834535    93.576  0.0     859     1       576     591     16      576
blaVIM-50_1_KU663375    minus   NODE_5_length_2701_cov_49.6729  blaVIM-50_1_KU663375    92.893  0.0     859     1       591     591     1       591
blaVIM-44_1_KP681696    minus   NODE_5_length_2701_cov_49.6729  blaVIM-44_1_KP681696    92.893  0.0     859     1       591     591     1       591
blaVIM-45_1_KP681695    minus   NODE_5_length_2701_cov_49.6729  blaVIM-45_1_KP681695    92.893  0.0     859     1       591     591     1       591
blaVIM-46_1_KP749829    minus   NODE_5_length_2701_cov_49.6729  blaVIM-46_1_KP749829    92.893  0.0     859     1       591     591     1       591
blaVIM-36_1_JX982635    minus   NODE_5_length_2701_cov_49.6729  blaVIM-36_1_JX982635    92.893  0.0     859     1       591     591     1       591
blaVIM-17_1_EU118148    minus   NODE_5_length_2701_cov_49.6729  blaVIM-17_1_EU118148    92.893  0.0     859     1       591     591     1       591
blaVIM-16_1_EU419746    minus   NODE_5_length_2701_cov_49.6729  blaVIM-16_1_EU419746    92.893  0.0     859     1       591     591     1       591
blaVIM-15_1_EU419745    minus   NODE_5_length_2701_cov_49.6729  blaVIM-15_1_EU419745    92.893  0.0     859     1       591     591     1       591
blaVIM-11_1_AY605049    minus   NODE_5_length_2701_cov_49.6729  blaVIM-11_1_AY605049    92.893  0.0     859     1       591     591     1       591
blaVIM-9_1_AY524988     minus   NODE_5_length_2701_cov_49.6729  blaVIM-9_1_AY524988     92.893  0.0     859     1       591     591     1       591
blaVIM-8_1_AY524987     minus   NODE_5_length_2701_cov_49.6729  blaVIM-8_1_AY524987     92.724  0.0     854     1       591     591     1       591
blaVIM-6_1_AY165025     minus   NODE_5_length_2701_cov_49.6729  blaVIM-6_1_AY165025     92.724  0.0     854     1       591     591     1       591
blaVIM-3_1_AF300454     minus   NODE_5_length_2701_cov_49.6729  blaVIM-3_1_AF300454     92.724  0.0     854     1       591     591     1       591
blaVIM-18_1_AM778091    minus   NODE_5_length_2701_cov_49.6729  blaVIM-18_1_AM778091    91.032  0.0     787     1       591     579     1       591
blaVIM-7_1_AJ536835     minus   NODE_5_length_2701_cov_49.6729  blaVIM-7_1_AJ536835     81.971  3.91e-113       405     25      501     564     88      477
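Taking the field order in the header at face value, a small sketch can compute the covered fraction of the gene for one of the rows above and check whether the alignment starts or ends at the very edge of the contig. This is one possible reading of the output, not a confirmed diagnosis. The gene length of 801 bp is supplied by hand because it is not among the reported fields, the contig length is parsed from the `NODE_..._length_...` name, and both helper names are made up for this example.

```python
import re


def subject_coverage(s_start, s_end, gene_length):
    """Fraction of the reference gene spanned by the alignment."""
    return (abs(s_end - s_start) + 1) / gene_length


def touches_contig_edge(query_name, q_start, q_end):
    """True if the alignment reaches the first or last base of the contig.

    The contig length is read from a Spades-style name; returns False
    if the name does not contain a length.
    """
    match = re.search(r"_length_(\d+)_", query_name)
    if match is None:
        return False
    contig_length = int(match.group(1))
    return min(q_start, q_end) == 1 or max(q_start, q_end) == contig_length


# The blaVIM-4 row from the output above, split on whitespace.
row = ("blaVIM-4_1_EU581706 minus NODE_5_length_2701_cov_49.6729 "
       "blaVIM-4_1_EU581706 100.000 0.0 1092 1 591 591 1 591").split()
q_start, q_end, s_start, s_end = (int(value) for value in row[7:11])

print(f"{subject_coverage(s_start, s_end, gene_length=801):.1%}")  # 73.8%
print(touches_contig_edge(row[2], q_start, q_end))                 # True
```

For this row the alignment begins at base 1 of the contig, so it may be worth checking whether the remainder of the gene ended up on another contig or was not assembled at all.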