CDCgov / phoenix

🔥🐦🔥PHoeNIx: A short-read pipeline for healthcare-associated and antimicrobial resistant pathogens
https://www.cdc.gov/ncezid/divisions-offices/about-dhqp.html
Apache License 2.0

[BUG] - fastANI exit status 139 Segmentation fault #113

Closed fnindo closed 1 year ago

fnindo commented 1 year ago

[33/7af312] NOTE: Process PHOENIX:PHOENIX_EXTERNAL:FASTANI (SampleID) terminated with an error exit status (139) -- Execution is retried (1)
Error executing process > 'PHOENIX:PHOENIX_EXTERNAL:FASTANI (SampleID)'

Caused by: Process PHOENIX:PHOENIX_EXTERNAL:FASTANI (SampleID) terminated with an error exit status (139)

Command executed:

fastANI \
    -q SampleID.filtered.scaffolds.fa.gz \
    --rl SampleID_best_MASH_hits.txt \
    -o SampleID.ani.txt

cat <<-END_VERSIONS > versions.yml
"PHOENIX:PHOENIX_EXTERNAL:FASTANI":
    fastani: $(fastANI --version 2>&1 | sed 's/version//;')
END_VERSIONS

Command exit status: 139

Command output: (empty)

Command error:

Reference = [Klebsiella_pneumoniae_GCF_021228815.1_ASM2122881v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_003597695.1_ASM359769v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_003597715.1_ASM359771v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_003597755.1_ASM359775v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_019797985.1_ASM1979798v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_003036645.2_ASM303664v2_genomic.fna.gz, Klebsiella_pneumoniae_GCF_003597735.1_ASM359773v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_024917555.1_ASM2491755v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_024918095.1_ASM2491809v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_024918055.1_ASM2491805v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_000465975.2_ASM46597v2_genomic.fna.gz, Klebsiella_pneumoniae_GCF_023657735.1_ASM2365773v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_021460155.1_ASM2146015v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_024917935.1_ASM2491793v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_013282295.1_ASM1328229v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_016903295.1_ASM1690329v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_023518315.1_ASM2351831v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_001699105.2_ASM169910v2_genomic.fna.gz, Klebsiella_pneumoniae_GCF_023657795.1_ASM2365779v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_023657815.1_ASM2365781v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_023657835.1_ASM2365783v1_genomic.fna.gz, Klebsiella_pneumoniae_GCF_023657855.1_ASM2365785v1_genomic.fna.gz]
Query = [SampleID.filtered.scaffolds.fa.gz]
Kmer size = 16
Fragment length = 3000
Threads = 1
ANI output file = SampleID.ani.txt

INFO [thread 0], skch::main, Count of threads executing parallel_for : 1
INFO [thread 0], skch::Sketch::build, window size for minimizer sampling = 24
INFO [thread 0], skch::Sketch::build, minimizers picked from reference = 0
INFO [thread 0], skch::Sketch::index, unique minimizers = 0
INFO [thread 0], skch::Sketch::computeFreqHist, Frequency histogram of minimizers = (0, 0) ...
.command.sh: line 5: 1520029 Segmentation fault (core dumped) fastANI -q SampleID.filtered.scaffolds.fa.gz --rl SampleID_best_MASH_hits.txt -o SampleID.ani.txt

Work dir: /mnt/Bacteria/mdhhs-bact_results/work/bb/1ccf6fc06bc3b7bab3508104bedb5f

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

jvhagey commented 1 year ago

Hi @fnindo, can you please provide the following information.

FastANI issues like this usually originate a couple of steps upstream, so we will have to track that down.

fnindo commented 1 year ago

Hi Jill, the directory has the files below. I just removed the patient identifier and changed it to Sample_ID:

.../work/bb/1ccf6fc06bc3b7bab3508104bedb5f$ ls
sample_ID_best_MASH_hits.txt
Klebsiella_pneumoniae_GCF_003597755.1_ASM359775v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_023657795.1_ASM2365779v1_genomic.fna.gz
Sample_ID.filtered.scaffolds.fa.gz
Klebsiella_pneumoniae_GCF_013282295.1_ASM1328229v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_023657815.1_ASM2365781v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_000465975.2_ASM46597v2_genomic.fna.gz
Klebsiella_pneumoniae_GCF_016903295.1_ASM1690329v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_023657835.1_ASM2365783v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_001699105.2_ASM169910v2_genomic.fna.gz
Klebsiella_pneumoniae_GCF_019797985.1_ASM1979798v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_023657855.1_ASM2365785v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_003036645.2_ASM303664v2_genomic.fna.gz
Klebsiella_pneumoniae_GCF_021228815.1_ASM2122881v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_024917555.1_ASM2491755v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_003597695.1_ASM359769v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_021460155.1_ASM2146015v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_024917935.1_ASM2491793v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_003597715.1_ASM359771v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_023518315.1_ASM2351831v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_024918055.1_ASM2491805v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_003597735.1_ASM359773v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_023657735.1_ASM2365773v1_genomic.fna.gz
Klebsiella_pneumoniae_GCF_024918095.1_ASM2491809v1_genomic.fna.gz

fnindo commented 1 year ago

The command at that process is:

Command executed:

fastANI \
    -q Sample_ID.filtered.scaffolds.fa.gz \
    --rl Sample_ID_best_MASH_hits.txt \
    -o Sample_ID.ani.txt

cat <<-END_VERSIONS > versions.yml
"PHOENIX:PHOENIX_EXTERNAL:FASTANI":
    fastani: $(fastANI --version 2>&1 | sed 's/version//;')
END_VERSIONS

Command exit status: 139

Command output: (empty)

jvhagey commented 1 year ago

Here are the next steps for tracking down the problem:

  1. We have seen cases where the sample name is not parsed correctly and fastANI then fails. Can you confirm that the file names passed to fastANI in the command match the names of the files in the folder?

  2. Rather than just running ls, can you use ls -lh and confirm that the files aren't empty? Reading each line of that output from right to left, you should see the file name, time, date, and then the size of the file. Just confirm that the size isn't zero for any of them.

  3. Have a look at the .command.out and .command.err files (using cat or more) in the work dir /work/bb/1ccf6fc06bc3b7bab3508104bedb5f. These are "hidden files", so you need to run ls -la to see them. They will likely just repeat the error from the Nextflow exit message, but it is worth confirming that they don't contain an additional error.

  4. If the above things look correct, we will need to check upstream. To do this, open the .nextflow.log file in the directory where you ran PHoeNIx from. Again, this is a "hidden file", so you need to run ls -la to see it. Search that file for the DETERMINE_TOP_TAXA and MASH_DIST steps. You can either open it in an editor or use cat .nextflow.log | grep "] Submitted process > PHOENIX:PHOENIX_EXTERNAL:DETERMINE_TOP_TAXA". You are looking for a line like this:

Jul-14 12:06:41.069 [Task submitter] INFO nextflow.Session - [80/bbeb64] Submitted process > PHOENIX:PHOENIX_EXTERNAL:DETERMINE_TOP_TAXA (SampleID)

This part [80/bbeb64] of the line tells us where to find the work directory for that step. So go into that folder .../work/80/bbeb64... (you will need to press tab after bbeb64 to auto complete the rest of the folder name) and have a look in the .command.out and .command.err (using cat or more).
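
The lookup described above can be sketched as a small shell function. This is only an illustration: the name work_hashes_for is hypothetical, and the pattern assumes log lines shaped like the example above, with a two-character and a six-character hexadecimal prefix in square brackets directly before "Submitted process".

```shell
# Print the work-directory prefix (for example 80/bbeb64) for every
# submission of a given process recorded in a Nextflow log.
# work_hashes_for is a hypothetical helper name, not part of Nextflow or PHoeNIx.
work_hashes_for() {
    process=$1
    log=${2:-.nextflow.log}   # defaults to the log in the current directory
    # Keep only the submission lines for this process, then extract the
    # bracketed prefix that sits right before "Submitted process".
    grep -F "Submitted process > $process" "$log" |
        sed -n 's|.*\[\([0-9a-f]\{2\}/[0-9a-f]\{6\}\)\] Submitted process.*|\1|p'
}
```

Called as work_hashes_for PHOENIX:PHOENIX_EXTERNAL:DETERMINE_TOP_TAXA from the directory where PHoeNIx was launched, it should print one prefix per submission. Each prefix can then be expanded inside the work directory, for example with ls -d work/80/bbeb64*, to reach the .command.out and .command.err files.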

  5. Can you provide the full command that was used to run PHoeNIx?

jvhagey commented 1 year ago

It was confirmed that the reference genomes from the MASH_DIST step that were being passed down to FastANI were empty. The system admin for @fnindo identified proxy issues that caused the retrieval of the reference genomes to fail. Once the proxy issue was resolved, the analysis ran successfully.
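
For anyone hitting the same error, a quick check for empty inputs is sketched below. It is an illustration rather than part of PHoeNIx: the function name list_empty_inputs is hypothetical, and it assumes the inputs sit directly in the task's work directory. The test [ -s ] follows symlinks, which matters because Nextflow by default stages inputs into the work directory as symlinks, so a plain listing may show the size of the link rather than the size of the file it points to.

```shell
# Print the name of every non-directory entry in a directory that is empty,
# or whose symlink target is empty or missing.
# list_empty_inputs is a hypothetical helper name.
list_empty_inputs() {
    dir=$1
    for f in "$dir"/*; do
        # Skip the unexpanded pattern when the directory has no entries,
        # but keep broken symlinks so that they are reported.
        [ -e "$f" ] || [ -L "$f" ] || continue
        # Subdirectories are not inputs to check.
        [ -d "$f" ] && continue
        # -s is true only when the file (after following links) has size > 0.
        if [ ! -s "$f" ]; then
            printf '%s\n' "${f##*/}"
        fi
    done
}
```

Running list_empty_inputs on the failing task's work directory prints nothing when every staged file has content. Any reference genome it lists would be a candidate for the kind of failed download described above.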