CDCgov / phoenix

🔥🐦🔥PHoeNIx: A short-read pipeline for healthcare-associated and antimicrobial resistant pathogens
Apache License 2.0

GRiPHin step fails while running #127

Closed shovanmoon closed 7 months ago

shovanmoon commented 7 months ago

-[cdcgov/phoenix] Pipeline completed with errors-
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'PHOENIX:PHOENIX_EXTERNAL:GRIPHIN (1)'

Caused by: Process PHOENIX:PHOENIX_EXTERNAL:GRIPHIN (1) terminated with an error exit status (1)

Command executed:

full_path=$(readlink -f results)

GRiPHin.py -d $full_path -a ResGANNCBI_20230517_srst2.fasta --output results --coverage 30 --phoenix

cat <<-END_VERSIONS > versions.yml
"PHOENIX:PHOENIX_EXTERNAL:GRIPHIN":
    python: $(python --version | sed 's/Python //g')
    phoenix_base_container: base_v2.0.2
END_VERSIONS

Can anyone help me solve this issue? I used to run the pipeline successfully. However, after editing some RAM values, this error appears every time I run the pipeline.

jvhagey commented 7 months ago

What RAM value did you edit? How are you running this (HPC, laptop, cloud, etc.)?

shovanmoon commented 7 months ago

What ram value did you edit? How are you running this (HPC, laptop, cloud etc)?

I am using a laptop running the latest version of Ubuntu. My laptop originally had 16 GB of RAM. At that time I changed every value higher than 16 GB in base.config and cdcsge.config to 14 GB, and the pipeline worked fine for me: one sample took around 18 minutes to finish. Today I upgraded my laptop to 32 GB of RAM, so I updated the values I had previously modified to 28 GB. Since then, the GRiPHin stage fails each time I run the sample.

[Screenshot from 2023-12-07 20-35-10]

jvhagey commented 7 months ago

The cdcsge.config file only gets used if you pass it in your run command with -profile cdcsge. However, this config file is only useful when running on an SGE HPC, so don't worry about it in the future.

Have you run this sample successfully before? This looks like a Python error, not a RAM issue. If a step needs more RAM, it retries with more memory one time before failing (so you would see it say "retry" next to the GRiPHin step), and it looks like GRiPHin is failing outright. At the beginning of the error message, where it says "data_df = pd.read_csv(trim_stats, sep="\t", header=0)", it is saying it has an issue with the NG_S32_trimmed_read_counts.txt file.

  1. Can you show the end of the python error that is being printed?
  2. Was the file NG_S32_trimmed_read_counts.txt created successfully, and do the contents look normal? It's not blank or showing 0 reads or something like that?
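The second check can be scripted roughly as follows. This is only a sketch: `check_read_counts` is a hypothetical helper, not part of PHoeNIx, and it assumes nothing about the file beyond what the traceback shows (tab-separated with a header row).

```python
import csv
import os


def check_read_counts(path):
    """Return a list of problems found in a tab-separated stats file.

    Hypothetical helper: it only assumes a header row followed by data rows.
    """
    if not os.path.isfile(path):
        return [f"{path} is missing or is not a regular file"]
    if os.path.getsize(path) == 0:
        return [f"{path} is empty"]

    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))

    problems = []
    if len(rows) < 2:
        problems.append("header found but no data row")
    else:
        header, first = rows[0], rows[1]
        if len(header) != len(first):
            problems.append("data row and header have different column counts")
        if all(value.strip() in ("", "0") for value in first):
            problems.append("every value in the first data row is blank or 0")
    return problems
```

Calling `check_read_counts("NG_S32_trimmed_read_counts.txt")` from the directory that holds the file returns an empty list when none of these problems is found.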

shovanmoon commented 7 months ago

Thanks for your valuable suggestions regarding RAM and cdcsge.config.

Yes, I ran this sample successfully two times before getting this error.

[Screenshot from 2023-12-07 22-13-43] [Screenshot from 2023-12-07 22-14-18]

I have compared the trimmed_read_counts.txt file with the one from a successful run, and it seems okay to me.

jvhagey commented 7 months ago

So the error is "Not a directory". There seems to be a lock on the file that is preventing it from opening the file for some reason. I would try deleting that folder and rerunning with output to a different directory. Perhaps the pipeline died halfway through in that directory and it didn't unlock the file and then rerunning hit the error.
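On Linux, "Not a directory" (Python's NotADirectoryError) is usually raised when some component of the path exists but is a regular file rather than a folder, which would fit leftover output from an interrupted run. A rough way to look for that, assuming nothing about the PHoeNIx output layout (`first_non_directory_component` is a hypothetical helper):

```python
import os


def first_non_directory_component(path):
    """Return the first ancestor of `path` that exists but is not a directory.

    Returns None when every existing ancestor is a directory.
    """
    path = os.path.abspath(path)

    # Collect every ancestor, from the immediate parent up to the root.
    ancestors = []
    head = path
    while True:
        parent = os.path.dirname(head)
        if parent == head:  # reached the filesystem root
            break
        ancestors.append(parent)
        head = parent

    # Walk back down from the root so the outermost offender is reported.
    for ancestor in reversed(ancestors):
        if os.path.lexists(ancestor) and not os.path.isdir(ancestor):
            return ancestor
    return None
```

Passing the full path of the file named in the traceback shows which part of the path, if any, is in the way. Deleting the old output folder and rerunning into a fresh directory avoids the problem either way.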

Two other things:

  1. Are you with a state public health lab? If so, there might be a more suitable pipeline to run for gonorrhoeae; if you email HAISeq@cdc.gov, we can connect you.
  2. Crop the bottom of the image off so you don't show your computer name :)

shovanmoon commented 7 months ago

Great Jvhagey....

Thank you very much. After reading your last comment, I deleted all the previous files and ran the sample again, and it completed successfully. Thanks a lot for staying with me. I work in a genomics lab at icddr,b in Bangladesh, on the CDC-funded ARCH project, where we mostly deal with AMR organisms. One of the bioinformaticians from CDC helped us run this pipeline for the project samples. I am currently installing this pipeline to run COlre organisms from this project, and this sample is just a test sample to check that the pipeline is working well. However, can I use any other suitable pipeline for COlre bacteria? If so, how should I get connected?

Besides setting up this pipeline on my laptop, I am also trying to install it on my office desktop. But there the test run always fails halfway through. Can you help me solve that issue on the office desktop?

Sure, I am editing those pictures that show my laptop name :)

jvhagey commented 7 months ago

OK, as the install is a separate issue, I am going to close this issue. Kara will reach out to you about appropriate pipelines, and if the install remains a problem, open a new issue for it. Thanks for using PHX and being part of ARCH :)

shovanmoon commented 7 months ago

Thanks Jill..