CDCgov / phoenix

🔥🐦🔥PHoeNIx: A short-read pipeline for healthcare-associated and antimicrobial resistant pathogens
https://www.cdc.gov/ncezid/divisions-offices/about-dhqp.html
Apache License 2.0

pigz error in PHOENIX:PHOENIX_EXTERNAL:ASSET_CHECK process on GCP VM #120

Open kprus opened 1 year ago

kprus commented 1 year ago

There is a "pigz: abort: internal threads error" during the ASSET_CHECK process.

Impact The pipeline crashes.

To Reproduce Environment: GCP VM (n2-standard-8). I'm not using the GLS API or batch, just trying to run the pipeline in the linux environment on the VM. I was able to run phoenix v1.1.0 on this VM with no problems, and I can also run v2.0.2 with no errors on my MacBook Pro.

Pipeline version: v2.0.2

Command run: nextflow run main.nf -profile docker -entry PHOENIX --input input/samplesheet.csv --kraken2db k2_standard_08gb_20230605.tar.gz

Screenshots

[Screenshot attached: 2023-08-18 at 8:38:51 AM]

Logs: .command.err, .command.out, and .command.sh are attached (command.sh.txt, command.out.txt, command.err.txt).

jvhagey commented 1 year ago

I can't seem to replicate this error. Can you give me more detail on the VM? I spun up a Google VM (n2-standard-8) with the following parameters:

Image: ubuntu-2004-focal-v20230817 (I also tried debian-11-bullseye-v20230814)
Architecture: x86/64
Size (GB): 50
Type: Balanced persistent disk
Mode: Boot, read/write

After setting up the environment with the basic things needed to run PHX:

sudo apt-get update
sudo apt-get install -y --no-install-recommends bzip2 libxml2-dev

wget -O micromamba_1.4.9 https://micromamba.snakepit.net/api/micromamba/linux-64/1.4.9
tar -xvjf micromamba_1.4.9

eval "$(bin/micromamba shell hook --shell bash)"

micromamba create -n nextflow -c defaults -c bioconda -c conda-forge conda-forge::singularity bioconda::nextflow=22.04.5 && \
    micromamba clean -a -y

wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20230605.tar.gz

I ran nextflow run cdcgov/phoenix -r v2.0.2 -profile singularity,test -entry PHOENIX --kraken2db https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20230605.tar.gz. A couple of Singularity pulls failed, so I pulled those images outside the pipeline with:

singularity pull docker://staphb/gamma:2.0.2

The pipeline then ran to completion.

Can you confirm that the k2_standard_08gb_20230605.tar.gz file is ~5.5GB so we can tell if it downloaded completely? You can run ls -lh to see this.
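Beyond eyeballing the size, gzip itself can verify the download (a quick sketch: gzip -t reads the whole compressed stream and exits nonzero on a truncated file):

```shell
# Size check plus stream test: a partially downloaded .tar.gz fails `gzip -t`.
ls -lh k2_standard_08gb_20230605.tar.gz            # expect ~5.5G
gzip -t k2_standard_08gb_20230605.tar.gz && echo "gzip stream intact"
```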

kprus commented 1 year ago

Here are some more details on the VM:

Image: I'm not sure; the VM was set up for me 2 years ago and I don't see this information anywhere (I don't have permissions to create VMs in our GCP project), but it is Ubuntu 20.04.2
Architecture: Not sure; there's just a "-" next to Architecture on the instance details page
Size (GB): 750
Type: Standard persistent disk
Mode: Read/write

I get the same error when I run the test profile with this command:

nextflow run cdcgov/phoenix -r v2.0.2 -profile docker,test -entry PHOENIX --kraken2db k2_standard_08gb_20230605.tar.gz
/mnt/projdata/phoenix/work/53/b4a283bc01d982af9f45e1a70f5505$ cat .command.err 
pigz: already unlocked (pigz.c:2562:create)
pigz: abort: internal threads error

The kraken database looks like it's the right size; I also re-downloaded it to make sure:

-rw-rw-r--  1 catharine_prussing_health_ny_gov catharine_prussing_health_ny_gov 5.5G Jun  7 13:59 k2_standard_08gb_20230605.tar.gz
jvhagey commented 1 year ago

Oh, this might be a permissions error. You can see it says -rw-rw-r--, which means you can read and write, but not execute. So try running chmod 777 k2_standard_08gb_20230605.tar.gz and then retry.

kprus commented 1 year ago

Nope, still doesn't work unfortunately:

chmod 777 k2_standard_08gb_20230605.tar.gz
ls -lh
-rwxrwxrwx  1 catharine_prussing_health_ny_gov catharine_prussing_health_ny_gov 5.5G Jun  7 13:59 k2_standard_08gb_20230605.tar.gz
nextflow run cdcgov/phoenix -r v2.0.2 -profile docker,test -entry PHOENIX --kraken2db k2_standard_08gb_20230605.tar.gz
cat /mnt/projdata/phoenix/work/e7/9f71ea7e5b6514541f84ea8ae7daad/.command.err
pigz: already unlocked (pigz.c:2562:create)
pigz: abort: internal threads error
jvhagey commented 1 year ago

Are you able to run pigz outside of the phx pipeline? For example: tar --use-compress-program="pigz -vdf" -xf k2_standard_08gb_20230605.tar.gz. You might need to install it with apt-get install pigz -y, or in a conda environment: conda create -n pigz -c conda-forge pigz, then activate the environment with conda activate pigz and see if the command works there. Basically, I'm trying to figure out if it's the pipeline or your system. Here are details if you are familiar with conda.

kprus commented 1 year ago

I can run the version of pigz installed on the OS fine - tar --use-compress-program="pigz -vdf" -xf k2_standard_08gb_20230605.tar.gz runs with no errors. When I run the same command inside the quay.io/jvhagey/phoenix:base_v2.0.2 docker container, I get the same error:

pigz: already unlocked (pigz.c:2562:create)
pigz: abort: internal threads error
tar: Child returned status 1
tar: Error is not recoverable: exiting now
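One further isolation step worth trying (a hedged sketch, not from the thread): pin pigz to a single thread with -p 1. If the error disappears, the multi-threaded code path inside the container is the trigger rather than the archive itself.

```shell
# Hypothetical workaround: -p 1 caps pigz at one thread, sidestepping the
# code path that aborts with "internal threads error" (-d decompress, -f force).
tar --use-compress-program="pigz -p 1 -df" -xf k2_standard_08gb_20230605.tar.gz
# or, slower but pigz-free, let tar fall back to plain gzip:
tar -xzf k2_standard_08gb_20230605.tar.gz
```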