Clinical-Genomics / microSALT

Microbial Sequence Analysis and Loci-based Typing pipeline for use on NGS WGS data.
GNU General Public License v3.0
2 stars 3 forks source link

Issue with reporting VIM resistance genes #180

Open samuell opened 3 weeks ago

samuell commented 3 weeks ago

Describe the bug

For some datasets where ResFinder ran on the raw reads report blaVIM beta lactamase resistance genes, there is no report of VIM genes at all in microSALT.

KS reports:

För VIM så finns följande ticket: #509056 och detta är vad SciLife rapporterade

image

Och samma sekvens hos resfinder:

image

Detta prov i ticket 509056 bär på en VIM-gen: 24ET500149.

To Reproduce Steps to reproduce the behavior:

  1. Run sample 24ET500149 with microSALT 3.3.5

Expected behavior The report for "24ET500149" should include blaVIM-4.

Software version (please complete the following information):

Additional context

samuell commented 3 weeks ago

Based on some testing with a separate workflow that tries to mimic the steps of microSALT for assembly and resistance blasting, it turns out the reason VIM is not reported is because of too small alignment lenghts in the blasting against the beta-lactamase file from ResFinder db. Because of the thresholds for the report (97% identity and 90% alignment length), the matches therefore don't end up in the report, although some matches are found in the blast results.

I did test a lot of combinations of assemblers (Spades vs Skesa) and options such as (Trimming vs No trimming) and the various assembly mode flags to Spades. I will exclude other flags than --isolate here though, since we need to use that to fix the arcC issue with NovaSeq X, but I found this:

Thus, the results suggest that switching the assembler from Spades to Skesa and keeping the trimming active might potentially resolve this issue.

samuell commented 3 weeks ago

I actually tried replacing Spades with SKESA now (in the branch |/180-fix-missing-vim-genes and now the results look pretty promising.

These are the blast results (801 bp should be the full VIM gene, for most variants):

$ grep blaVIM [...path.hidden...]/results/ACC14648_2024.8.22_18.3.59/ACC14648A3/blast_search/resistance/beta-lactam.txt | awk '( $5 > 97.0 && $12 > 720 ) { print $1 "\t" $5 "\t" $12 }'
blaVIM-4_1_EU581706     99.875  801
blaVIM-54_1_KY508061    99.750  801
blaVIM-40_1_HG934765    99.750  801
blaVIM-43_1_KP096412    99.750  801
blaVIM-37_1_JX982636    99.750  801
blaVIM-28_1_JF900599    99.750  801
blaVIM-19_1_FJ499397    99.750  801
blaVIM-14_1_FJ445404    99.750  801
blaVIM-1_1_Y18050       99.750  801
blaVIM-57_1_MH450217    99.625  801
blaVIM-52_1_KX349731    99.625  801
blaVIM-42_1_KP071470    99.625  801
blaVIM-35_1_JX982634    99.625  801
blaVIM-34_1_JX013656    99.625  801
blaVIM-33_1_JX258134    99.625  801
blaVIM-32_1_JN676230    99.625  801
blaVIM-27_1_HQ858608    99.625  801
blaVIM-26_1_FR748153    99.625  801
blaVIM-39_1_KF131539    99.501  801
blaVIM-29_1_JX311308    99.501  801
blaVIM-5_1_DQ023222     98.002  801
blaVIM-49_1_KU663374    97.878  801
blaVIM-12_1_DQ143913    97.878  801
blaVIM-38_1_KC469971    97.503  801

The report does not include them though, since the scraping depends on the contig naming scheme used in Spades, so that logic needs to be updated.

But based on all I can see, they are now properly detected.