Hi @johnsonj161, you are correct that we are behind on documentation. Thanks for bringing that to our attention. I just updated the summary line column details. Just to make sure it's all correct, can you please reply with the command you used to run this and upload the Phoenix_Output_Report.tsv file that comes out of PHoeNIx? I need to be able to see the spacing as it is in the file to see where something might have gone wrong.
In terms of it staying consistent, we will do our best to not change things too much as I realize people will probably write code based on the output. However, I do think there is room for improvement in how we present the AR data so I would anticipate we might change that a bit in the future. We are happy to take suggestions or requests for improvement if you have them. Let me know if this answers your question.
Thank you for the quick reply, @jvhagey! Below is the command I used, along with the requested attached file:
nextflow run absolute/path/to/phoenix/ -profile singularity -entry PHOENIX --input manifest.csv --kraken2db absolute/path/to/kraken2db --outdir absolute/path/to/directory
Phoenix_Output_Report.txt Note: I had to change the file extension to .txt to upload to github.
I see now that the columns in the isolate-specific report (i.e., *_summaryline.tsv) are different from the consolidated report (i.e., Phoenix_Output_Report.tsv). The main differences appear to be the ordering of the "GC%", "Kraken2_Trimd", and "Kraken2_Weighted" columns. The isolate-specific report also does not contain column names, and it appears that my consolidated report does not contain the "Plasmid_Incompatibility_Replicons" column described in the wiki.
Thanks again!
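For anyone comparing the two reports by eye, a minimal sketch that prints each tab-separated field of a file's first line with its position may help. `list_fields` is a hypothetical helper name, not part of PHoeNIx, and the file names in the usage note are placeholders.

```shell
# Hypothetical helper (not part of PHoeNIx): print every tab-separated field
# of the first line of a TSV file as "<position>: <value>".
list_fields() {
    awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i }' "$1"
}
```

Running `list_fields Phoenix_Output_Report.tsv` should list the header names by position, and `list_fields isolate1_summaryline.tsv` should list the values for one isolate, so the two listings can be read side by side.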
@johnsonj161, the *_summaryline.tsv should have the same columns in the same order as Phoenix_Output_Report.tsv. The Phoenix_Output_Report.tsv basically just concatenates the *_summaryline.tsv files together, so there shouldn't be a difference between it and the isolate-specific report. I see the Plasmid_Incompatibility_Replicons column in the code for version 1.0.0 (which is what you should be running). I am not sure what version of the pipeline you are running (when was it pulled?). Can you use
nextflow run cdcgov/phoenix -r main -profile singularity -entry PHOENIX --input manifest.csv --kraken2db absolute/path/to/kraken2db/ --outdir absolute/path/to/directory
and let me know the output? Also, if you want to email me the sample (HAISeq@cdc.gov), I can try it on my end to figure out the issue.
I can add a header to the *_summaryline.tsv in the next release for ease in interpreting it.
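Since the consolidated report is described above as a concatenation of the per-isolate files, one quick check is whether every file has the same number of tab-separated columns. The sketch below assumes plain tab-delimited files; `count_columns` is a hypothetical helper name and the paths in the usage note are placeholders.

```shell
# Hypothetical helper (not part of PHoeNIx): for each file given, print
# "<file name><TAB><number of tab-separated fields on its first line>".
count_columns() {
    awk -F'\t' 'FNR == 1 { print FILENAME "\t" NF }' "$@"
}
```

For example, `count_columns *_summaryline.tsv Phoenix_Output_Report.tsv` prints one line per file. If the counts differ, the files were probably not produced by the same version of the pipeline.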
I am new to Nextflow and nf-core, so I am not sure where to find my version number. Would you be able to guide me on this? I am also not the person who installed the version I am working with, but I know it was installed 9/20/2022. Hope that is informative.
I tried running the command you have above and am now running into an error with spades (see below).
NOTE: Process PHOENIX:PHOENIX_EXTERNAL:SPADES_WF:SPADES (isolate1) terminated with an error exit status (1) -- Error is ignored
I don't know if this would be better discussed in a new issue thread?
@johnsonj161, based on that date you are running the dev version and not an official release (the first official release was 10/12/2022). In the following command, the -r parameter pulls a specific version:
nextflow run cdcgov/phoenix -r v1.0.0 -profile singularity -entry PHOENIX --input manifest.csv --kraken2db absolute/path/to/kraken2db/ --outdir absolute/path/to/directory
By default it pulls the latest version. The version is printed out every time the pipeline runs. It will look like this:
The second question would be best discussed in a different thread. Please include whatever information is in the .command.out and .command.err files for the SPAdes step. To find these files, follow the full Nextflow work path: the work directory will be in the directory where you ran the command, and Nextflow prints something like [g7/7gdt3l] next to the SPADES step, which is the rest of the path. So those .command files will be in work/g7/7gdt3l...... Note that 7gdt3l is just the start of that folder name. The full name is a long string of random characters, so just use the tab key to autocomplete it. Hopefully that makes sense.
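The work directory lookup described above can also be done with a glob instead of tab completion. This is a minimal sketch, assuming the default `work` directory and the bracketed short hash from the Nextflow run log; `show_task_logs` is a hypothetical helper name, and `g7/7gdt3l` is only the example prefix from this thread.

```shell
# Hypothetical helper (not part of Nextflow or PHoeNIx): print .command.out and
# .command.err for a task, given the short hash shown in brackets in the run log.
# Usage: show_task_logs g7/7gdt3l [work_dir]
show_task_logs() {
    short_hash="$1"
    work_dir="${2:-work}"
    # The short hash is only a prefix of the real folder name, so expand it with a glob.
    for task_dir in "$work_dir"/"$short_hash"*/; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$task_dir" ] || continue
        for f in .command.out .command.err; do
            echo "==> ${task_dir}${f}"
            if [ -f "${task_dir}${f}" ]; then
                cat "${task_dir}${f}"
            fi
        done
    done
}
```

Called from the directory where the pipeline was launched, `show_task_logs g7/7gdt3l` prints both files for the matching task, which can then be pasted into a new issue.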
Thank you, @jvhagey. It makes sense that my original question was likely due to me using an older version. I will create a new thread regarding the new issue.
Hi,
I am running PHoeNIX v1.0.0 and the columns in the *_summaryline.tsv file do not appear to match the column name descriptions on the wiki page (https://github.com/CDCgov/phoenix/wiki/Running-PHoeNIx#outputs). Specifically, there appear to be more columns in the output than what are listed on the wiki.
According to the wiki the file should contain columns: ID, QC_Outcome, Coverage, Genome_Length, AssemblyRatio(STDev), number_of_Scaffolds, Species, MLST_Scheme_1, MLST_1, MLST_Scheme_2, MLST2, GC%, Beta_Lactam_Resistance_Genes, Other_AR_Genes, Hypervirulence_Genes, AMRFinder_Point_Mutations, QC_Reason.
Below is an example of the output I got:
Isolate1 PASS 1 86.26 2814526 1.0409 (N/A) 24 Staphylococcus sp.T93 99.92% ANI_match ANI_REFSEQ saureus ST8 NA - 32.69 Staphylococcus(96.14%) aureus(14.19%) Staphylococcus(100.00%) aureus(99.66%) blaI_of_Z_NG_047499.1,blaR1_NG_051774.1,blaZ_NG_055997.1,mecA_6_BX571856,mecR1_NG_051163.1 ant(6)-Ia_NG_047395.1,apH-Stph_HE579073,aph(3')-IIIa_NG_047418.1,dha1_BA000018,fosB-Saur_NG_065844.1,lmrS_CP000046.1,mepA_AY661734.1,mph(C)_NG_047991.1,msr(A)_NG_055998.1,norA_D90119,sat4_NG_048072.1,tet(38)_NG_048134.1 No hypervirulence genes found No point mutations found
Does this look normal? Also, do you know if these columns will stay consistent?
Thanks!