CDCgov / phoenix

🔥🐦🔥PHoeNIx: A short-read pipeline for healthcare-associated and antimicrobial resistant pathogens
Apache License 2.0
52 stars 19 forks source link

[BUG] - Missing AMR genes in final output #99

Closed johnsonj161 closed 1 year ago

johnsonj161 commented 1 year ago

Describe the bug AMR genes reported in the _srst2.gamma output are missing from the _summaryline.tsv report, despite meeting all of the filtering requirements listed in the v1.1.0 update.

Below is the output of the _summaryline.tsv file: <html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

ID | Auto_QC_Outcome | Warning_Count | Estimated_Coverage | Genome_Length | Assembly_Ratio_(STDev) | #_of_Scaffolds_>500bp | GC_% | Species | Taxa_Confidence | Taxa_Source | Kraken2_Trimd | Kraken2_Weighted | MLST_Scheme_1 | MLST_1 | MLST_Scheme_2 | MLST_2 | GAMMA_Beta_Lactam_Resistance_Genes | GAMMA_Other_AR_Genes | AMRFinder_Point_Mutations | Hypervirulence_Genes | Plasmid_Incompatibility_Replicons | Auto_QC_Failure_Reason -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- isolate1 | PASS | 2 | 49.3 | 2815991 | 0.9944 (.0847) | 13 | 32.73 | Staphylococcus aureus | 94.31 ANI_match | ANI_REFSEQ | Staphylococcus(94.31%) aureus(41.63%) | Staphylococcus(100.00%) aureus(100.00%) | saureus | ST5 | - | - | blaI_of_Z_NG_047499.1,blaZ_29_JDXN01000019 | apH-Stph_HE579073,dha1_BA000018,fexA_JQ041372,fosB-Saur_NG_065844.1,lmrS_CP000046.1,mepA_BA000018.3,norA_D90119,tet(38)_NG_048133.1,tet(M)_NG_048223.1 | repUS43_1_CDS12738(DOp1),rep20_7_repA(SAP074A)

And here is the output of GAMMA: <html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

Gene | Contig | Start | Stop | Match_Type | Description | Codon_Changes | BP_Changes | Transversions | Codon_Percent | BP_Percent | Percent_Length | Match_Length | Target_Length | Strand -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- 30__mecA__mecA_6_BX571856__05709__beta-lactam__ResFinder | isolate1_1_length_1400484 | 1216204 | 1218211 | Native | No coding mutations | 0 | 0 | 0 | 1 | 1 | 1 | 2007 | 2007 | + 154__fexA__fexA_JQ041372__01286__phenicol__ARGANNOT | isolate1_1_length_1400484 | 544618 | 546046 | Mutant | G33A,A37V,P321T, | 3 | 3 | 2 | 0.9937 | 0.9979 | 1 | 1428 | 1428 | + 44__tet(M)__tet(M)_NG_048223.1__06853__tetracycline__NCBI | isolate1_1_length_1400484 | 357448 | 359368 | Native | No coding mutations | 0 | 1 | 0 | 1 | 0.9995 | 1 | 1920 | 1920 | + 993__apH__apH-Stph_HE579073__06056__aminoglycoside__ARGANNOT | isolate1_1_length_1400484 | 657972 | 658773 | Native | No coding mutations | 0 | 0 | 0 | 1 | 1 | 1 | 801 | 801 | - 175__mepA__mepA_BA000018.3__00454__efflux__NCBI | isolate1_1_length_1400484 | 901692 | 903048 | Native | No coding mutations | 0 | 0 | 0 | 1 | 1 | 1 | 1356 | 1356 | - 176__tet(38)__tet(38)_NG_048133.1__00452__tetracycline__NCBI | isolate1_1_length_1400484 | 1130597 | 1131950 | Native | No coding mutations | 0 | 0 | 0 | 1 | 1 | 1 | 1353 | 1353 | - 69__mecR1__mecR1_NG_051163.1__02469__beta-lactam__NCBI | isolate1_1_length_1400484 | 1214347 | 1216105 | Truncation (partial match) | 0 SNPs in 1-975,truncation at codon 329 (of 586 codons),243 coding mutations | 243 | 568 | 404 | 0.5853 | 0.6769 | 1 | 1758 | 1758 | - 1546__fosB__fosB-Saur_NG_065844.1__05639__fosfomycin__NCBI | isolate1_2_length_424739 | 153414 | 153834 | Native | No coding mutations | 0 | 1 | 0 | 1 | 0.9976 | 1 | 420 | 420 | + 149__lmrS__lmrS_CP000046.1__02018__macrolide-phenicol__NCBI | isolate1_2_length_424739 | 1687 | 3130 | Mutant | I178L,L179V,L237I,A255V, | 4 | 14 | 6 | 0.9917 | 0.9903 | 1 | 1443 | 1443 | - 264__dha1__dha1_BA000018__06602__phenicol__ARGANNOT | isolate1_2_length_424739 | 282096 | 283284 | Native | No coding mutations | 0 | 0 | 0 | 1 | 1 | 1 | 1188 | 1188 | - 294__norA__norA_D90119__00040__flouroquinonlone__ARGANNOT | isolate1_4_length_207373 | 142294 | 143461 | Native | No coding mutations | 0 | 0 | 0 | 1 | 1 | 1 | 1167 | 1167 | - 68__blaR1__blaR1-2_NG_047537.1__02467__beta-lactam__NCBI | isolate1_10_length_27308 | 0 | 1023 | Contig Edge | 1 SNPs in 736-1758, | 245 | 736 | 0 | 0.5819 | 0.5813 | 0.5819 | 1023 | 1758 | + 1577__blaI_of_Z__blaI_of_Z_NG_047499.1__02364__beta-lactam__NCBI | isolate1_10_length_27308 | 1012 | 1393 | Mutant | G21D, | 1 | 1 | 0 | 0.9921 | 0.9974 | 1 | 381 | 381 | + 625__blaZ__blaZ_29_JDXN01000019__02251__beta-lactam__ResFinder | isolate1_10_length_27308 | 25494 | 26394 | Native | No coding mutations | 0 | 0 | 0 | 1 | 1 | 1 | 900 | 900 | -

Notice that mecA is present in the GAMMA output but not the summary line output.

To Reproduce This was run using v1.1.0 on a local Linux machine using the command below:

nextflow run phoenix_v1.1.0/ -profile singularity -entry PHOENIX --input manifest.csv --kraken2db phoenix_v1.1.0/assets/databases/ --outdir phoenix --max_cpus 40 --max_memory 250.GB

johnsonj161 commented 1 year ago

I believe I have identified the issue. The Bla_Genes() function in Phoenix_summary_line.py appears to be skipping the first 2 columns, rather than just the first column, as intended using next(). This is because the first line is already called by "String1 = f.readline()", so removing "String1 = f.readline()" or "next(f)" fixes the issue.

jvhagey commented 1 year ago

Hi @johnsonj161, thanks for flagging this... def is wrong. I got the fix in, but in classic fashion I am getting a different unrelated error from Nick's script and he is out today. Let me get Nick to look at the problem Monday and we will get the patch in for this and your other issue.

johnsonj161 commented 1 year ago

@jvhagey, thanks for the quick fix! I also noticed the MLST error when I tried to run it yesterday. I will keep my eye out for the patch!

jvhagey commented 1 year ago

Addressed with release v1.1.1. Closing, please reopen if the issue is unresolved.