CDCgov / phoenix

🔥🐦🔥PHoeNIx: A short-read pipeline for healthcare-associated and antimicrobial resistant pathogens
Apache License 2.0

*_summaryline.tsv columns do not match the descriptions in the wiki #80

Closed johnsonj161 closed 1 year ago

johnsonj161 commented 1 year ago

Hi,

I am running PHoeNIx v1.0.0 and the columns in the *_summaryline.tsv file do not appear to match the column descriptions on the wiki page (https://github.com/CDCgov/phoenix/wiki/Running-PHoeNIx#outputs). Specifically, there appear to be more columns in the output than are listed on the wiki.

According to the wiki the file should contain columns: ID, QC_Outcome, Coverage, Genome_Length, AssemblyRatio(STDev), number_of_Scaffolds, Species, MLST_Scheme_1, MLST_1, MLST_Scheme_2, MLST2, GC%, Beta_Lactam_Resistance_Genes, Other_AR_Genes, Hypervirulence_Genes, AMRFinder_Point_Mutations, QC_Reason.

Below is an example of the output I got:

Isolate1 PASS 1 86.26 2814526 1.0409 (N/A) 24 Staphylococcus sp.T93 99.92% ANI_match ANI_REFSEQ saureus ST8 NA - 32.69 Staphylococcus(96.14%) aureus(14.19%) Staphylococcus(100.00%) aureus(99.66%) blaI_of_Z_NG_047499.1,blaR1_NG_051774.1,blaZ_NG_055997.1,mecA_6_BX571856,mecR1_NG_051163.1 ant(6)-Ia_NG_047395.1,apH-Stph_HE579073,aph(3')-IIIa_NG_047418.1,dha1_BA000018,fosB-Saur_NG_065844.1,lmrS_CP000046.1,mepA_AY661734.1,mph(C)_NG_047991.1,msr(A)_NG_055998.1,norA_D90119,sat4_NG_048072.1,tet(38)_NG_048134.1 No hypervirulence genes found No point mutations found

Does this look normal? Also, do you know if these columns will stay consistent?

Thanks!

jvhagey commented 1 year ago

Hi @johnsonj161, you are correct that we are behind on documentation. Thanks for bringing that to our attention. I just updated the summary line column details. Just to make sure it's all correct, can you please reply with the command you used to run this and upload the Phoenix_Output_Report.tsv file that comes out of PHoeNIx? I need to be able to see the spacing as it is in the file to see where something might have gone wrong.

In terms of it staying consistent, we will do our best to not change things too much as I realize people will probably write code based on the output. However, I do think there is room for improvement in how we present the AR data so I would anticipate we might change that a bit in the future. We are happy to take suggestions or requests for improvement if you have them. Let me know if this answers your question.
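For anyone writing code against the output, one way to reduce breakage if columns are added or reordered is to look fields up by header name rather than by position. The following is only a minimal sketch: it assumes the report is tab-separated with a header row, and the three-column sample is made up for illustration (real reports have many more columns, and the header names may differ between versions).

```python
import csv
import io


def read_report(handle):
    """Return one dict per isolate, keyed by the header names in the report."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Made-up stand-in for a report file; in practice, pass an open file handle.
sample = io.StringIO(
    "ID\tQC_Outcome\tSpecies\n"
    "Isolate1\tPASS\tStaphylococcus aureus\n"
)

rows = read_report(sample)
for row in rows:
    # .get() returns None instead of raising if a column is renamed or removed.
    print(row["ID"], row.get("QC_Outcome"), row.get("Species"))
```

Looking columns up by name means a new column inserted in the middle of the file does not shift every index after it.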

johnsonj161 commented 1 year ago

Thank you for the quick reply, @jvhagey! Below is the command I used, along with the requested attached file:

nextflow run absolute/path/to/phoenix/ -profile singularity -entry PHOENIX --input manifest.csv --kraken2db absolute/path/to/kraken2db --outdir absolute/path/to/directory

Phoenix_Output_Report.txt Note: I had to change the file extension to .txt to upload to github.

I see now that the columns in the isolate-specific report (i.e., *_summaryline.tsv) are different from the consolidated report (i.e., Phoenix_Output_Report.tsv). The main differences appear to be the ordering of the "GC%", "Kraken2_Trimd", and "Kraken2_Weighted" columns. The isolate-specific report also does not contain column names, and it appears that my consolidated report does not contain the "Plasmid_Incompatibility_Replicons" column described in the wiki.

Thanks again!

jvhagey commented 1 year ago

@johnsonj161, the *_summaryline.tsv should have the same columns in the same order as Phoenix_Output_Report.tsv. The Phoenix_Output_Report.tsv basically just concatenates the *_summaryline.tsv files together, so there shouldn't be a difference. I see the Plasmid_Incompatibility_Replicons column in the code for version 1.0.0 (which is what you should be running). I am not sure what version of the pipeline you are running (when was it pulled?). Can you run nextflow run cdcgov/phoenix -r main -profile singularity -entry PHOENIX --input manifest.csv --kraken2db absolute/path/to/kraken2db/ --outdir absolute/path/to/directory and let me know the output? Also, if you want to email me the sample (HAISeq@cdc.gov), I can try it on my end to figure out the issue.

I can add a header to the *_summaryline.tsv in the next release for ease in interpreting it.
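Until a header is available in the per-isolate file, a quick check is to pair a header line with the single row of a *_summaryline.tsv and fail loudly when the column counts differ. This is a rough sketch, not part of PHoeNIx: the function name `label_summaryline` is hypothetical, both inputs are assumed to be tab-separated, and the example values are made up.

```python
def label_summaryline(header_line, summary_line):
    """Pair each header name with the value at the same position in the row."""
    names = header_line.rstrip("\n").split("\t")
    values = summary_line.rstrip("\n").split("\t")
    if len(names) != len(values):
        # A mismatch suggests the header and the row come from different versions.
        raise ValueError(
            f"header has {len(names)} columns but row has {len(values)}"
        )
    return dict(zip(names, values))


# Made-up example lines; real files contain many more columns.
header = "ID\tQC_Outcome\tCoverage\n"
row = "Isolate1\tPASS\t86.26\n"

labeled = label_summaryline(header, row)
print(labeled)
```

A count mismatch like the one described in this thread would surface as a `ValueError` instead of silently shifting values under the wrong column names.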

johnsonj161 commented 1 year ago

I am new to Nextflow and nf-core, so I am not sure where to find my version number. Would you be able to guide me on this? I am also not the person who installed the version I am working with, but I know it was installed 9/20/2022. Hope that is informative.

I tried running the command you have above and am now running into an error with spades (see below).

NOTE: Process PHOENIX:PHOENIX_EXTERNAL:SPADES_WF:SPADES (isolate1) terminated with an error exit status (1) -- Error is ignored

I don't know if this would be better discussed in a new issue thread?

jvhagey commented 1 year ago

@johnsonj161, based on that date you are running the dev version and not an official release (the first official release was 10/12/2022). Try running the following command: nextflow run cdcgov/phoenix -r v1.0.0 -profile singularity -entry PHOENIX --input manifest.csv --kraken2db absolute/path/to/kraken2db/ --outdir absolute/path/to/directory. The -r parameter pulls a specific version; by default it pulls the latest version. The version is printed out every time the pipeline runs. It will look like this:

[image: screenshot of the version printed when the pipeline runs]

The second question would be best discussed through a different thread. Please include whatever information there is in the .command.out and .command.err files for the SPAdes step. To find these files, follow the full Nextflow work path: work will be in the directory where you ran the command, and there will be something like [g7/7gdt3l] next to the SPADES step in the Nextflow output, which is the rest of the path. So those .command files will be in work/g7/7gdt3l...... Note that 7gdt3l is just the start of that folder name; it's a long string of random characters, so just use the Tab key to autocomplete. Hopefully that makes sense.
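As an alternative to tab completion, the lookup described above can be scripted. The sketch below is only illustrative: `find_command_logs` is a hypothetical helper, and it assumes the layout described in this comment, where the bracketed value is a two-part prefix of a directory under work. The demo builds a throwaway directory tree with a made-up hash instead of touching a real run.

```python
import tempfile
from pathlib import Path


def find_command_logs(work_dir, short_hash):
    """Return .command.out/.command.err paths for task dirs matching the hash prefix."""
    # "[g7/7gdt3l]" -> first path component "g7", start of the second "7gdt3l".
    prefix, _, rest = short_hash.strip("[]").partition("/")
    logs = []
    for task_dir in sorted(Path(work_dir).glob(f"{prefix}/{rest}*")):
        for name in (".command.out", ".command.err"):
            candidate = task_dir / name
            if candidate.is_file():
                logs.append(candidate)
    return logs


# Demo against a temporary directory tree (the hash below is made up).
with tempfile.TemporaryDirectory() as tmp:
    task_dir = Path(tmp) / "work" / "ab" / "cdef12-rest-of-long-name"
    task_dir.mkdir(parents=True)
    (task_dir / ".command.err").write_text("example stderr\n")

    found = find_command_logs(Path(tmp) / "work", "[ab/cdef12]")
    found_names = [p.name for p in found]
    print(found_names)
```

For a real run, the first argument would be the work directory in the folder where the pipeline was launched, and the second the bracketed value shown next to the failing step.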

johnsonj161 commented 1 year ago

Thank you, @jvhagey. It makes sense that my original question was likely due to me using an older version. I will create a new thread regarding the new issue.