CFIA-NCFAD / nf-flu

Influenza genome analysis Nextflow workflow
MIT License

[BUG]: Error executing process > 'NF_FLU:ILLUMINA:SUBTYPING_REPORT (1)' #46

Open PierreLyons opened 1 year ago

PierreLyons commented 1 year ago

Is there an existing issue for this?

Description of the Bug/Issue

Hi all,

First, thanks for all the work being done on this pipeline.

I'm having an issue running the pipeline at the SUBTYPING REPORT step where it throws an "Illegal instruction" error.

The same issue happens when using either Docker (24.0.6) or Podman (3.4.4); I haven't tried other container engines yet.

I suspect it may be related to a hardware compatibility issue, but I thought I'd post here to see if anyone else has come across it. The server I am running on is an older Dell T7500 with a 6-core Intel Xeon X5650. (Note: this processor does not have AVX support, which I think may be causing the issue.)
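For reference, a quick way to check whether the host CPU advertises AVX on a Linux box (empty output means no AVX support):

  # list any AVX-related flags reported by the kernel; no output = no AVX
  grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u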

This issue happens with every sample I've run so far, both FluA and FluB.

Thanks in advance!

Nextflow command-line

nextflow run CFIA-NCFAD/nf-flu --input test_samplesheet_ab.csv --platform illumina --outdir testruns/test_a -profile podman

Error Message

ERROR ~ Error executing process > 'NF_FLU:ILLUMINA:SUBTYPING_REPORT (1)'

Caused by:
  Process `NF_FLU:ILLUMINA:SUBTYPING_REPORT (1)` terminated with an error exit status (132)

Command executed:

  parse_influenza_blast_results.py \
   --flu-metadata 41415333-influenza.csv \
   --top 3 \
   --excel-report nf-flu-subtyping-report.xlsx \
   --pident-threshold 0.85 \
   --samplesheet samplesheet.fixed.csv \
   FluB-pB-040523-MM00001U-Qc.blastn.txt

  ln -s .command.log parse_influenza_blast_results.log

  cat <<-END_VERSIONS > versions.yml
  "NF_FLU:ILLUMINA:SUBTYPING_REPORT":
     python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  132

Command output:
  (empty)

Command error:
  .command.sh: line 8:    23 Illegal instruction     (core dumped) parse_influenza_blast_results.py --flu-metadata 41415333-influenza.csv --top 3 --excel-report nf-flu-subtyping-report.xlsx --pident-threshold 0.85 --samplesheet samplesheet.fixed.csv FluB-pB-040523-MM00001U-Qc.blastn.txt

Work dir:
  /home/vitalite-dev/nf-flu/work/ea/b95127acda6a0a5e626cc44dc3b0a2

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details

Workflow Version

Workflow 3.3.4, revision: bda4dc7d14

Nextflow Executor

local

Nextflow Version

23.04.3

Java Version

openjdk version "11.0.20.1" 2023-08-24
OpenJDK Runtime Environment (build 11.0.20.1+1-post-Ubuntu-0ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.20.1+1-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)

Hardware

Dell T7500

Operating System (OS)

Ubuntu 22.04

Conda/Container Engine

Podman

Additional context

nextflow.log

peterk87 commented 1 year ago

Hi @PierreLyons, thanks for taking the time to report your issue!

Would you happen to be able to attach/copy-paste the contents of parse_influenza_blast_results.log for this analysis?

The parse_influenza_blast_results.py Python script makes use of Polars, Pandas and NumPy, which may be trying to use functions that rely on certain CPU instructions. Or there could simply be a bug in the script, in which case the full stack trace would be helpful.
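It should be sitting in the task work directory shown in the error message, as a symlink to .command.log; something like the following should show it (if the script crashed before the symlink was created, .command.log itself contains the same output):

  cd /home/vitalite-dev/nf-flu/work/ea/b95127acda6a0a5e626cc44dc3b0a2
  cat parse_influenza_blast_results.log   # or cat .command.log if the symlink is missing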

Have you tried Conda/Mamba instead of Podman/Docker?

PierreLyons commented 1 year ago

Hi @peterk87

Thanks for the quick response.

I haven't tried Conda yet; it's next on my list to try. I'll update once I do.

I'm also struggling to find the parse_influenza_blast_results.log file; any ideas where it is stored? Thanks.

PierreLyons commented 1 year ago

Update regarding this issue:

Note: the same issue happens using either Docker or Conda/Mamba.

I've managed to isolate the issue to the use of Polars, which appears to have been added to parse_influenza_blast_results.py in release 3.2.0; that is the version at which the pipeline starts breaking on my system. Polars uses AVX instructions, which my CPU doesn't support. The Polars team provides a legacy build of Polars (polars-lts-cpu) compiled without AVX, which can be installed via pip (it is not available on conda-forge).
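For reference, swapping the package inside an existing Python environment looks roughly like this (a sketch; the environment has to be the one the module actually uses):

  pip uninstall -y polars      # drop the regular AVX build if it is present
  pip install polars-lts-cpu   # same polars API, compiled without AVX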

As a quick patch, I've manually created the conda env used by the subtyping_report.nf module with all dependencies except for polars, then pip-installed polars-lts-cpu within that environment. I then simply point the subtyping_report.nf module to the custom environment.
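Roughly, pointing the process at the custom environment can also be done from a small custom config instead of editing the module directly (a sketch only; the environment path below is an example, not my real one):

  // custom.config -- point the subtyping process at a pre-built env
  // (the module's conda dependencies minus polars, plus `pip install polars-lts-cpu` inside it)
  process {
      withName: 'SUBTYPING_REPORT' {
          conda = '/home/vitalite-dev/envs/nf-flu-subtyping-lts'   // example path
      }
  }

and then running with the conda profile plus -c custom.config so the conda directive is picked up.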

This is now working well as a patch (almost; see below). I've also requested newer hardware, which I think is the most sensible long-term solution to this issue.

The SUBTYPING_REPORT module completed successfully, but then the SOFTWARE_VERSIONS module threw an error.

I will describe it briefly here, along with my fix, but let me know if you'd like me to open a new issue. I'm not sure whether my fix above caused this new problem.

dumpsoftwareversions.py caused this error (truncated for readability):

  ERROR ~ Error executing process > 'NF_FLU:ILLUMINA:SOFTWARE_VERSIONS (1)'

  Caused by:
    Process `NF_FLU:ILLUMINA:SOFTWARE_VERSIONS (1)` terminated with an error exit status (1)

Command executed [/home/vitalite/.nextflow/assets/CFIA-NCFAD/nf-flu/./workflows/../modules/nf-core/modules/custom/dumpsoftwareversions/templates/dumpsoftwareversions.py]:

[ ... error truncated for readability ... ]

yaml.scanner.ScannerError: mapping values are not allowed here in "collated_versions.yml", line 2, column 13

For reference, here are the first two lines of collated_versions.yml (second line truncated for readability):

  "NF_FLU:ILLUMINA:CAT_ILLUMINA_FASTQ":
     cat: Usage: cat [OPTION]... [FILE]... Concatenate FILE(s) to standard output. W [...]

The issue seems to be the ":" after "Usage".
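A minimal illustration of what the YAML parser is objecting to (illustrative lines only, not the real file contents):

  # invalid: after the plain scalar "Usage" the second ": " looks like a nested mapping
  cat: Usage: cat [OPTION]...
  # valid once the extra colon is gone (quoting the whole value would also work)
  cat: Usage cat [OPTION]...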

My fix was to add a sed to remove colons from the echo commands at lines 103 and 104 in the nf-flu/modules/local/cat_illumina_fastq.nf file:

original:

  cat: \$(echo \$(cat --help 2>&1) | sed 's/ (.//')
  gzip: \$(echo \$(gzip --help 2>&1) | sed 's/ (.//')

modified:

  cat: \$(echo \$(cat --help 2>&1) | sed 's/ (.//' | sed 's/://')
  gzip: \$(echo \$(gzip --help 2>&1) | sed 's/ (.//' | sed 's/://')

This has solved the issue and now the pipeline runs (note: I can only run the pipeline with Conda).

Once again, let me know if I should open a new issue. Thanks, Pierre

MauriAndresMU1313 commented 1 month ago

Hi, were you able to solve this issue? Did you happen to find the .log file? Do you have any alternatives?