Clinical-Genomics / microSALT

Microbial Sequence Analysis and Loci-based Typing pipeline for use on NGS WGS data.
GNU General Public License v3.0

assembly issues in arcC loci for Staphylococcus Aureus #164

Open talnor opened 8 months ago

talnor commented 8 months ago

Describe the bug

Since the switch from NovaSeq 6000 to NovaSeq X, Staphylococcus aureus samples have started failing in the analysis. The issue is systematic and affects a relatively large percentage of the Staphylococcus aureus samples in the same way: the contig covering the arcC locus is split in the middle of the region, so no MLST type can be reliably assigned, because none of the contigs spanning the locus covers enough of it. See more info in the deviation here.
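To illustrate why a split contig breaks typing, here is a minimal sketch of how the span of a hit relative to the reference allele could be computed. The function name `hsp_span_percent`, the allele length of 456 bp, and the break position are assumptions chosen for illustration; they are not taken from microSALT's code or from the affected samples.

```python
def hsp_span_percent(hsp_start: int, hsp_end: int, allele_length: int) -> float:
    """Percent of the reference allele covered by one HSP.

    Coordinates are 1-based and inclusive, expressed on the reference allele.
    """
    if allele_length <= 0:
        raise ValueError("allele_length must be positive")
    covered = abs(hsp_end - hsp_start) + 1
    return 100.0 * covered / allele_length


# Assumed allele length, for illustration only.
ALLELE_LENGTH = 456

# One contig covering the whole allele: full span.
full = hsp_span_percent(1, 456, ALLELE_LENGTH)

# Same locus, but the assembly broke inside it (hypothetical break point):
# each contig now yields only a partial HSP.
left = hsp_span_percent(1, 320, ALLELE_LENGTH)
right = hsp_span_percent(321, 456, ALLELE_LENGTH)

print(f"full: {full:.1f}%  left: {left:.1f}%  right: {right:.1f}%")
```

With a break like this, neither partial hit comes close to full span, which matches the symptom described above: no single contig covers the locus well enough for a reliable allele call.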

The issue needs to be fixed so that these samples can be typed in microSALT.

To Reproduce

Steps to reproduce the behavior:

  1. Run microSALT on a Staphylococcus aureus sample sequenced on the NovaSeq X
  2. Check the loci results in the "MLST" table for the sample
    • The field Längd (HSP) % will show a low span of around 70%.
  3. Check position 2631741 in AP017922.1 coordinates for an A->G minority SNP.
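The check in step 3 can be sketched roughly as follows. This assumes the base calls at the position have already been extracted from the read alignment into a plain string; the function `minor_allele_fraction` and the example counts are hypothetical and only show the arithmetic, not how the deviation was actually investigated.

```python
from collections import Counter


def minor_allele_fraction(bases: str, ref: str, alt: str) -> float:
    """Fraction of reads supporting `alt` among reads showing `ref` or `alt`.

    `bases` is a string of base calls observed at one reference position
    (hypothetical input, extracted from the alignment beforehand).
    """
    counts = Counter(bases.upper())
    depth = counts[ref] + counts[alt]
    if depth == 0:
        return 0.0
    return counts[alt] / depth


# Made-up example: 70 reads support A, 30 reads support G.
observed = "A" * 70 + "G" * 30
fraction = minor_allele_fraction(observed, ref="A", alt="G")

# A minority SNP is present when the alternative base is seen,
# but in fewer than half of the reads.
is_minority_snp = 0.0 < fraction < 0.5
print(f"G fraction: {fraction:.2f}, minority SNP: {is_minority_snp}")
```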

Expected behavior

To circumvent the issues discussed, there are a number of options:

With the data we have to work with, we think it is better to skip the SPAdes --careful flag. Given that we get the same results as before with option C, this can be done for all samples, which keeps it clear how the analysis is performed and makes microbial samples easier to handle.
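As a rough sketch of what the change amounts to, the assembly command could be built with `--careful` as an opt-in switch that defaults to off. The helper `build_spades_command` and the file names are hypothetical; how microSALT actually constructs its SPAdes call is not shown in this issue. Only `spades.py`, `-1`, `-2`, `-o` and `--careful` are real SPAdes options.

```python
from typing import List


def build_spades_command(
    forward_reads: str,
    reverse_reads: str,
    output_dir: str,
    careful: bool = False,
) -> List[str]:
    """Build a paired-end SPAdes command line as an argument list.

    `careful` defaults to False, so --careful is omitted unless requested.
    """
    cmd = ["spades.py", "-1", forward_reads, "-2", reverse_reads, "-o", output_dir]
    if careful:
        cmd.append("--careful")
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_spades_command("sample_R1.fastq.gz", "sample_R2.fastq.gz", "assembly")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a system where SPAdes is installed. Applying the same setting to every sample, rather than per organism, keeps the analysis uniform as argued above.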

Test with e.g. ticket 121778.

Software version

Additional context

As a side note, microSALT still gives an estimate of the locus allele for samples that fail typing QC, but because of the limited data, this estimate can be expected to vary when the sample is resequenced.