PGScatalog / pgsc_calc

The Polygenic Score Catalog Calculator is a nextflow pipeline for polygenic score calculation
https://pgsc-calc.readthedocs.io/en/latest/
Apache License 2.0

run test data #277

Closed by thedam 6 months ago

thedam commented 6 months ago

Description of the bug

Hi,

I get an error after executing the test command:

ERROR ~ Error executing process > 'PGSCATALOG_PGSCALC:PGSCALC:REPORT:SCORE_REPORT (cineca)'

Caused by:
  Process `PGSCATALOG_PGSCALC:PGSCALC:REPORT:SCORE_REPORT (cineca)` terminated with an error exit status (1)

Command executed:

  echo nextflow run pgscatalog/pgsc_calc -profile test,docker > command.txt
  echo "keep_multiallelic: false" > params.txt
  echo "keep_ambiguous   : false"    >> params.txt
  echo "min_overlap      : 0.75"       >> params.txt

  cp -r /home/damian/.nextflow/assets/pgscatalog/pgsc_calc/assets/report/* .
  # workaround for unhelpful filenotfound quarto errors in some HPCs
  mkdir temp && TMPDIR=temp

  quarto render report.qmd -M "self-contained:true"         -P score_path:aggregated_scores.txt.gz         -P sampleset:cineca         -P run_ancestry:false         -P reference_panel_name:NO_PANEL

  cat <<-END_VERSIONS > versions.yml
  SCORE_REPORT:
      R: $(echo $(R --version 2>&1) | head -n 1 | cut -f 3 -d ' ')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.

  processing file: report.qmd
  1/53
  2/53 [setup]
  3/53
  4/53 [setup_logs]
  5/53
  6/53 [unnamed-chunk-1]                                                                                                                                                                                                                                                                running: bash  -c "cat command.txt | fold -w 80 -s | awk -F ' ' 'NR==1 { print \"\$\", \$0} NR>1 { print \"    \" \$0}' | sed 's/\$/\\\\/' | sed '\$ s/.\$//' "
  7/53
  8/53 [load_scorefiles]

  Quitting from lines 64-133 [load_scorefiles] (report.qmd)
  Error in `mutate()`:
  ℹ In argument: `trait_display = extract_traits(.)`.
  Caused by error in `purrr::map2()`:
  ℹ In index: 1.
  ℹ With name: PGS001229_22.
  Caused by error in `purrr::map2_chr()`:
  ℹ In index: 1.
  Caused by error in `dyn.load()`:
  ! unable to load shared object '/home/damian/R/x86_64-pc-linux-gnu-library/4.3/stringi/libs/stringi.so':
    libicui18n.so.66: cannot open shared object file: No such file or directory
  Backtrace:
    1. ... %>% ...
   25. base::loadNamespace(x)
   27. base::loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]])
   28. base::library.dynam(lib, package, package.lib)
   29. base::dyn.load(file, DLLpath = DLLpath, ...)
  Execution halted

Work dir:
  /mnt/data/production/quick_pgs/work/3e/bb2fbd74940ef0bda6339c9565ee43

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details
ERROR ~ ERROR: No results report written!

 -- Check '.nextflow.log' file for details

Command used and terminal output

nextflow run pgscatalog/pgsc_calc -profile test,docker

Relevant files

No response

System information

  N E X T F L O W
  version 23.10.1 build 5891
  created 12-01-2024 22:01 UTC (23:01 CEST)
  cite doi:10.1038/nbt.3820
  http://nextflow.io

 java -version

openjdk version "18.0.1" 2022-04-19 OpenJDK Runtime Environment Zulu18.30+11-CA (build 18.0.1+10) OpenJDK 64-Bit Server VM Zulu18.30+11-CA (build 18.0.1+10, mixed mode, sharing)

thedam commented 6 months ago

I don't know why it's asking for an R file from my PC. Shouldn't it be in the Docker container? Anyway, I do have the file:

ll -h /home/damian/R/x86_64-pc-linux-gnu-library/4.3/stringi/libs/stringi.so
-rwxrwxr-x 1 damian damian 9.4M Apr 15 17:47 /home/damian/R/x86_64-pc-linux-gnu-library/4.3/stringi/libs/stringi.so*

nebfield commented 6 months ago

Yes, that's quite odd. The docker images are self-contained and shouldn't be interacting with your system like that.

What version of docker are you running? I noticed a warning:

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.

which is new to me.

Are you able to run docker containers normally?

$ docker run hello-world

Perhaps the settings here to prevent local libraries from interfering with containers aren't working on your system. Maybe the docker image can't find R_PROFILE_USER and R_ENVIRON_USER so it falls back to the system. I'll run some experiments 🤔
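One quick way to test the "local library leaking into the container" hypothesis is to check whether the shared object named in the R error lives under the host user's home directory. The following is a minimal sketch, not part of pgsc_calc: `is_under_home` is a hypothetical helper, and the path is copied from the error message above.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a path reported in an R error message sits under a
# host home directory instead of inside the container image.
# is_under_home is a hypothetical helper, not something pgsc_calc provides.

is_under_home() {
    local path="$1" home="${2:-$HOME}"
    case "$path" in
        "$home"/*) return 0 ;;  # path is inside the given home directory
        *)         return 1 ;;  # path is somewhere else
    esac
}

# Path taken from the dyn.load() error in the original report.
so_path='/home/damian/R/x86_64-pc-linux-gnu-library/4.3/stringi/libs/stringi.so'

if is_under_home "$so_path" /home/damian; then
    echo "host library visible inside the container: $so_path"
fi
```

If the check matches, running `Rscript -e '.libPaths()'` inside the container should show which library directories R is actually searching.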

thedam commented 6 months ago

OK, after manually removing all "pgscatalog" Docker images and re-running the command, it works.
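The cleanup described above could look roughly like the sketch below. It is a dry run: it only prints the `docker rmi` commands so they can be reviewed first. `print_rmi_commands` is a hypothetical helper, and the image names in the demo call are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: turn a list of "repository:tag" lines into docker rmi commands for
# every image whose name contains "pgscatalog". Nothing is deleted here.
# print_rmi_commands is a hypothetical helper name.

print_rmi_commands() {
    # grep exits non-zero when nothing matches; "|| true" keeps that quiet.
    { grep 'pgscatalog' || true; } | while IFS= read -r image; do
        printf 'docker rmi %s\n' "$image"
    done
}

# Demo with made-up image names (assumption, not taken from the thread).
printf '%s\n' 'hello-world:latest' 'example/pgscatalog-image:1.0' | print_rmi_commands
```

On a real system the input would come from `docker images --format '{{.Repository}}:{{.Tag}}'` piped into the function. After removing the images, the next pipeline run pulls fresh copies.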

carbocation commented 5 months ago

Update at the top to prevent confusion: I was getting the same error on Ubuntu 20.04, and it was resolved by upgrading Docker from v20 to v26.

Original: I have what appears to be the same issue, but removing all pgscatalog Docker images does not resolve it. From a brief scan of the output: does the pipeline expect me to have something called quarto installed locally to build the report?

$ nextflow run pgscatalog/pgsc_calc -profile test,docker --max_memory=32GB
N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/pgscatalog/pgsc_calc` [focused_poitras] DSL2 - revision: 8bdf287d55 [main]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`

------------------------------------------------------
  pgscatalog/pgsc_calc v2.0.0-alpha.5-g8bdf287
------------------------------------------------------
Core Nextflow options
  revision                  : main
  runName                   : focused_poitras
  containerEngine           : docker
  launchDir                 : /tmp/nextflow
  workDir                   : /tmp/nextflow/work
  projectDir                : /home/james/.nextflow/assets/pgscatalog/pgsc_calc
  userName                  : james
  profile                   : test,docker
  configFiles               : 

Input/output options
  input                     : /home/james/.nextflow/assets/pgscatalog/pgsc_calc/assets/examples/samplesheet.csv
  scorefile                 : /home/james/.nextflow/assets/pgscatalog/pgsc_calc/assets/examples/scorefiles/PGS001229_22.txt
  outdir                    : /home/james/.nextflow/assets/pgscatalog/pgsc_calc/results

Reference options
  ref_samplesheet           : /home/james/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/reference.csv
  ld_grch37                 : /home/james/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/high-LD-regions-hg19-GRCh37.txt
  ld_grch38                 : /home/james/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/high-LD-regions-hg38-GRCh38.txt
  ancestry_checksums        : /home/james/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/checksums.txt

Compatibility options
  target_build              : GRCh37

Max job request options
  max_cpus                  : 2
  max_memory                : 32GB
  max_time                  : 6.h

Other parameters
  config_profile_name       : Test profile
  config_profile_description: Minimal test dataset to check pipeline function

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use pgscatalog/pgsc_calc for your analysis please cite:

* The Polygenic Score Catalog
  https://doi.org/10.1038/s41588-021-00783-5

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/pgscatalog/pgsc_calc/blob/master/CITATIONS.md

executor >  local (8)
[94/3c1567] process > PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1)                                     [100%] 1 of 1 ✔
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELBIM                                      -
[skipped  ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR (cineca chromosome 22)              [100%] 1 of 1, stored: 1 ✔
[-        ] process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_VCF                                             -
[37/3c245e] process > PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_VARIANTS (cineca chromosome 22)                            [100%] 1 of 1 ✔
[3a/9d87a1] process > PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_COMBINE (cineca)                                           [100%] 1 of 1 ✔
[fa/714e7d] process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:PLINK2_SCORE (cineca chromosome 22 effect type additive 0) [100%] 1 of 1 ✔
[c1/05e602] process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:SCORE_AGGREGATE (cineca)                                   [100%] 1 of 1 ✔
[6f/f972af] process > PGSCATALOG_PGSCCALC:PGSCCALC:REPORT:SCORE_REPORT (cineca)                                           [100%] 2 of 2, failed: 2, retries: 1 ✘
[f2/70d5a7] process > PGSCATALOG_PGSCCALC:PGSCCALC:DUMPSOFTWAREVERSIONS (1)                                               [100%] 1 of 1 ✔
[skipping] Stored process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR (1)
[5f/cbaaf0] NOTE: Process `PGSCATALOG_PGSCCALC:PGSCCALC:REPORT:SCORE_REPORT (cineca)` terminated with an error exit status (134) -- Execution is retried (1)
ERROR ~ Error executing process > 'PGSCATALOG_PGSCCALC:PGSCCALC:REPORT:SCORE_REPORT (cineca)'

Caused by:
  Process `PGSCATALOG_PGSCCALC:PGSCCALC:REPORT:SCORE_REPORT (cineca)` terminated with an error exit status (134)

Command executed:

  export TMPDIR=$(mktemp -d --tmpdir=.) # tmpdir must always be writable for quarto
  echo nextflow run pgscatalog/pgsc_calc -profile test,docker --max_memory=32GB > command.txt

  echo "keep_multiallelic: false" > params.txt
  echo "keep_ambiguous   : false"    >> params.txt
  echo "min_overlap      : 0.75"       >> params.txt

  quarto render report.qmd -M "self-contained:true"         -P score_path:aggregated_scores.txt.gz         -P sampleset:cineca         -P run_ancestry:false         -P reference_panel_name:NO_PANEL         -P version:2.0.0-alpha.5         -o report.html

  cat <<-END_VERSIONS > versions.yml
  SCORE_REPORT:
      R: $(echo $(R --version 2>&1) | head -n 1 | cut -f 3 -d ' ')
  END_VERSIONS

Command exit status:
  134

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.

  #
  # Fatal error in , line 0
  # Check failed: Start().
  #
  #
  #
  #FailureMessage Object: 0x7ffd34ed0ce0/usr/local/bin/quarto: line 177:    62 Aborted                 (core dumped) "${QUARTO_DENO}" ${QUARTO_ACTION} ${QUARTO_DENO_OPTIONS} ${QUARTO_DENO_EXTRA_OPTIONS} ${QUARTO_IMPORT_ARGMAP} "${QUARTO_TARGET}" "$@"

Work dir:
  /tmp/nextflow/work/6f/f972afb8e37edad6d2077efd86b37c

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
ERROR ~ ERROR: No results report written!

 -- Check '.nextflow.log' file for details

Docker hello-world runs fine:

$ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/engine/userguide/

Conclusion

After upgrading from Docker 20 to Docker 26, this now works. So pgsc_calc may not be compatible with Docker 20.x.
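A pre-flight check along these lines could flag an old Docker engine before the pipeline runs. This is a sketch under assumptions: the threshold of 26 reflects only the one report in this thread and is not a documented minimum. The helper names and the full version strings in the demo calls are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare a Docker version string against a minimum major version.
# The default threshold (26) is an assumption based on a single report.

docker_major() {
    # Keep only the part before the first dot: "20.10.7" -> "20"
    printf '%s\n' "${1%%.*}"
}

check_docker_version() {
    local version="$1" min_major="${2:-26}"
    if [ "$(docker_major "$version")" -ge "$min_major" ]; then
        echo "ok: Docker $version"
    else
        echo "warn: Docker $version is older than major version $min_major"
        return 1
    fi
}

# Demo with example version strings (hypothetical values).
check_docker_version 26.0.0
check_docker_version 20.10.7 || true
```

The real version string can be read with `docker version --format '{{.Server.Version}}'` and passed in as the first argument.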

nebfield commented 5 months ago

All of the software the workflow needs is installed in the containers, so you don't need anything else. The host system only needs nextflow and a container engine (docker, singularity) or anaconda.

I'm not sure what's causing the cgroup warning, but it suggests a problem with the way docker is set up on your OS. When the containers are run by nextflow they're set up with some complex configuration, like memory limits and automatic mounting of local directories. These problems might not show up on docker run hello-world.
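To get closer to what a Nextflow task does than a bare `docker run hello-world`, one could add a memory limit and a bind mount to the test. The sketch below only prints such a command. The option set is an assumption about the general shape of the call, not the exact command Nextflow generates; the real one is in the `.command.run` file in the task work directory. `build_docker_cmd` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: assemble a docker run command with a memory limit and a bind-mounted
# working directory, the two features mentioned above. The command is printed,
# not executed.

build_docker_cmd() {
    local image="$1" memory="$2" workdir="$3"
    printf 'docker run --rm --memory %s -v %s:%s -w %s %s\n' \
        "$memory" "$workdir" "$workdir" "$workdir" "$image"
}

# Demo values: the hello-world image and work directory appear in the thread;
# the 2g limit is an arbitrary example.
build_docker_cmd hello-world 2g /tmp/nextflow/work
```

If the printed command triggers the same swap-limit warning when run by hand, the problem probably lies in the Docker or kernel setup and not in the pipeline.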

Unfortunately I'm not sure how to debug this problem because I can't reproduce it. I'd suggest trying a different execution profile like singularity or conda if docker is being stubborn.

Edit: Ah, I just saw your update re: updating docker 🚀 thanks!
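For anyone who cannot upgrade Docker, the fallback suggested above (a different execution profile) might be tried as follows. The sketch only prints the commands. The profile names come from the comment above, and `print_fallback_commands` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print the test command once per alternative execution profile.

print_fallback_commands() {
    local profile
    for profile in "$@"; do
        printf 'nextflow run pgscatalog/pgsc_calc -profile test,%s\n' "$profile"
    done
}

print_fallback_commands singularity conda
```

Each printed line can be run as-is on a host that has the corresponding tool installed.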