CDCgov / phoenix

🔥🐦🔥PHoeNIx: A short-read pipeline for healthcare-associated and antimicrobial resistant pathogens
Apache License 2.0

[BUG] - CHECK_MLST failure #117

Closed ewalde1 closed 1 year ago

ewalde1 commented 1 year ago

Describe the bug The CHECK_MLST step is failing for a Yersinia enterocolitica sample

Impact Pipeline errors out and fails

To Reproduce Steps to reproduce the behavior:

Caused by: Process PHOENIX:PHOENIX_EXTERNAL:DO_MLST:CHECK_MLST (ISOLATE_230803) terminated with an error exit status (1)

Command executed:

wget --secure-protocol=TLSv1_3 "https://pubmlst.org/data/dbases.xml"

check_and_fix_MLST2_new2.py --input ISOLATE_230803.tsv --taxonomy ISOLATE_230803.tax --docfile dbases.xml

cat <<-END_VERSIONS > versions.yml
"PHOENIX:PHOENIX_EXTERNAL:DO_MLST:CHECK_MLST":
    check_mlst: 1.1
    pubMLST_db_download_date:
END_VERSIONS

Command exit status: 1

Command output: Parsing MLST file ... reg:0 source_file Database ST locus_1 locus_2 locus_3 locus_4 locus_5 locus_6 locus_7 locus_8 lous_9 locus_10 reg:1 ISOLATE_230803.filtered.scaffolds.fa ypseudotuberculosis_achtman_3 - adk(12) argA(22) aroA(21) glnA(22) thrA(25) tmk(28) trpE(16) appending - ISOLATE_230803.filtered.scaffolds.fa ypseudotuberculosis_achtman_3 - adk(12) argA(22) aroA(21) glnA(22) thrA(25) tmk(28) trpE(16) No srst2 input file provided Taxonomy: Yersinia enterocolitica [['ISOLATE_230803.filtered.scaffolds.fa\typseudotuberculosis_achtman_3\t-\tadk(12)\targA(22)\taroA(21)\tglnA(22)\tthrA(25)\ttmk(28)\ttrpE(16)', 'mlst', 'ISOLATE_230803.tsv']] ISOLATE_230803.filtered.scaffolds.fa ypseudotuberculosis_achtman_3 - adk(12) argA(22) aroA(21) glnA(22) thrA(25) tmk(28) trpE(16) Array of original itmes ISOLATE_230803.filtered.scaffolds.fa ypseudotuberculosis_achtman_3

adk(12) argA(22) aroA(21) glnA(22) thrA(25) tmk(28) trpE(16) ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], [], 'standard', '2023-08-07'] ['12'] 12 ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12'], 'standard', '2023-08-07'] ['22'] 22 ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22'], 'standard', '2023-08-07'] ['21'] 21 ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22', '21'], 'standard', '2023-08-07'] ['22'] 22 ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22', '21', '22'], 'standard', '2023-08-07'] ['25'] 25 ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22', '21', '22', '25'], 'standard', '2023-08-07'] ['28'] 28 ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22', '21', '22', '25', '28'], 'standard', '2023-08-07'] ['16'] 16 Schemes found: 1 [['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22', '21', '22', '25', '28', '16'], 'standard', '2023-08-07']]

of catted schemes found: 1

0 ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22', '21', '22', '25', '28', '16'], 'standard', '2023-08-07'] Trimmed catted: [['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22', '21', '22', '25', '28', '16'], 'standard', '2023-08-07']]

Command error:

HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://pubmlst.org/static/data/dbases.xml [following]
--2023-08-07 09:34:35-- https://pubmlst.org/static/data/dbases.xml
Reusing existing connection to pubmlst.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 156011 (152K) [application/xml]
Saving to: 'dbases.xml'

   0K .......... .......... .......... .......... .......... 32%  380K 0s
  50K .......... .......... .......... .......... .......... 65%  467K 0s
 100K .......... .......... .......... .......... .......... 98% 2.98M 0s
 150K ..                                                    100% 4.39T=0.3s

2023-08-07 09:34:36 (597 KB/s) - 'dbases.xml' saved [156011/156011]

Parsing MLST file ... reg:0 source_file Database ST locus_1 locus_2 locus_3 locus_4 locus_5 locus_6 locus_7 locus_8 lous_9 locus_10 reg:1 ISOLATE_230803.filtered.scaffolds.fa ypseudotuberculosis_achtman_3 - adk(12) argA(22) aroA(21) glnA(22) thrA(25) tmk(28) trpE(16) appending - ISOLATE_230803.filtered.scaffolds.fa ypseudotuberculosis_achtman_3 - adk(12) argA(22) aroA(21) glnA(22) thrA(25) tmk(28) trpE(16) No srst2 input file provided Taxonomy: Yersinia enterocolitica [['ISOLATE_230803.filtered.scaffolds.fa\typseudotuberculosis_achtman_3\t-\tadk(12)\targA(22)\taroA(21)\tglnA(22)\tthrA(25)\ttmk(28)\ttrpE(16)', 'mlst', 'ISOLATE_230803.tsv']] ISOLATE_230803.filtered.scaffolds.fa ypseudotuberculosis_achtman_3 - adk(12) argA(22) aroA(21) glnA(22) thrA(25) tmk(28) trpE(16) Array of original itmes ISOLATE_230803.filtered.scaffolds.fa ypseudotuberculosis_achtman_3

adk(12) argA(22) aroA(21) glnA(22) thrA(25) tmk(28) trpE(16) ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], [], 'standard', '2023-08-07'] ['12'] 12 ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12'], 'standard', '2023-08-07'] ['22'] 22 ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22'], 'standard', '2023-08-07'] ['21'] 21 ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22', '21'], 'standard', '2023-08-07'] ['22'] 22 ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22', '21', '22'], 'standard', '2023-08-07'] ['25'] 25 ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22', '21', '22', '25'], 'standard', '2023-08-07'] ['28'] 28 ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22', '21', '22', '25', '28'], 'standard', '2023-08-07'] ['16'] 16 Schemes found: 1 [['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22', '21', '22', '25', '28', '16'], 'standard', '2023-08-07']]

of catted schemes found: 1

0 ['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22', '21', '22', '25', '28', '16'], 'standard', '2023-08-07'] Trimmed catted: [['ISOLATE_230803.filtered.scaffolds.fa', 'ypseudotuberculosis_achtman_3', '-', 7, ['adk', 'argA', 'aroA', 'glnA', 'thrA', 'tmk', 'trpE'], ['12', '22', '21', '22', '25', '28', '16'], 'standard', '2023-08-07']]

Expected behavior No error, as previous isolates of the same species ran through fine.

Additional context Seems identical to https://github.com/CDCgov/phoenix/issues/91 except with Yersinia instead of Klebsiella
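For readers following the log in this report, the step that prints the growing allele lists appears to split each tab-separated mlst row into a source file, a scheme name, an ST, and per-locus calls such as adk(12). The snippet below is a minimal sketch of that kind of parsing, not the code of check_and_fix_MLST2_new2.py; the names `MlstRow`, `parse_mlst_row`, and `ALLELE_CALL` are hypothetical, and the row layout is assumed from the log output only.

```python
import re
from typing import NamedTuple

# Matches one allele call as it appears in the log, e.g. "adk(12)".
# Assumption: every call has the form locus(allele) with no nested parentheses.
ALLELE_CALL = re.compile(r"^(?P<locus>[^()]+)\((?P<allele>[^()]*)\)$")


class MlstRow(NamedTuple):
    """Hypothetical container for one parsed mlst row."""
    source_file: str
    scheme: str
    sequence_type: str
    loci: list[str]
    alleles: list[str]


def parse_mlst_row(line: str) -> MlstRow:
    """Split one tab-separated mlst row into its named parts."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}")
    source_file, scheme, sequence_type, *calls = fields
    loci, alleles = [], []
    for call in calls:
        match = ALLELE_CALL.match(call)
        if match is None:
            raise ValueError(f"unrecognised allele call: {call!r}")
        loci.append(match["locus"])
        alleles.append(match["allele"])
    return MlstRow(source_file, scheme, sequence_type, loci, alleles)


# The row reported in this issue, copied from the log.
row = parse_mlst_row(
    "ISOLATE_230803.filtered.scaffolds.fa\typseudotuberculosis_achtman_3\t-"
    "\tadk(12)\targA(22)\taroA(21)\tglnA(22)\tthrA(25)\ttmk(28)\ttrpE(16)"
)
print(row.scheme, row.sequence_type)
print(row.loci)
print(row.alleles)
```

Parsed this way, the row has seven loci with allele numbers assigned but an ST of "-", which matches the state shown in the log: a full set of calls under the ypseudotuberculosis_achtman_3 scheme with no sequence type assigned.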

nvlachos commented 1 year ago

@ewalde1 Thanks for the report! After looking into it, the problem comes from the 'old' method we used to confirm 'novel' MLSTs, and we have since updated how that step works. The best course of action is to use v2.0.2, which also includes an updated MLST DB in which your exact profile is defined. I tried a different Y. enterocolitica sample and it worked fine; I also edited the profile to be novel to make sure that step works as well. However, I would like to make sure yours works too before I close this ticket, so please keep us updated!

ewalde1 commented 1 year ago

Thanks for the quick response! We are currently in the process of updating to v2.0.2 so we will test it out as soon as we have the new version up and running!

jvhagey commented 1 year ago

Hi @ewalde1, just checking in: were you able to get the new version running? I'll close the issue if everything looks good.

ewalde1 commented 1 year ago

Hi @jvhagey, I just confirmed that we were able to get the new version up and running and it did solve the issue. Thank you!