PGScatalog / pgsc_calc

The Polygenic Score Catalog Calculator is a nextflow pipeline for polygenic score calculation
https://pgsc-calc.readthedocs.io/en/latest/
Apache License 2.0

Issue with PyYAML in Nextflow workflow in a custom score #206

Closed: ElixBaSe closed this issue 6 months ago

ElixBaSe commented 10 months ago

Hi,

I trust you are well. I am reaching out to report an issue I have encountered while running the Nextflow workflow for custom score calculation.

Problem Overview: During the execution of the Nextflow workflow within a virtual environment, I have encountered a problem towards the end of the process that is related to the PyYAML library. Despite having PyYAML installed within my virtual environment, the workflow appears to be unable to access it.

Steps Taken to Create the Virtual Environment:

conda create -n my_pgscatalog_utils_cloned_env python=3.10
conda activate my_pgscatalog_utils_cloned_env
pip install PyYAML
pip install pgscatalog-utils

Here is my script:

`#!/bin/bash
echo "Conda Version: $(conda --version)"
module load PLINK/2.00a2.3_x86_64 Java/11.0.2 R/4.2.1-foss-2022a

echo "Start"
cd /well/emberson/users/rgu572/GWAS_Elix/GWAS_Regenie_NoBMI/PRS
echo $PWD

pgs_name="DIA_TA_T2D"
echo "Starting $pgs_name computation"
./nextflow run pgscatalog/pgsc_calc -profile conda \
    --input sample_sheet.csv \
    --target_build GRCh38 \
    --parallel \
    --outdir PRS_calculated \
    --scorefile scorefile_338_DIA_TA.txt

echo "$pgs_name finished"`

Error Message: ModuleNotFoundError: No module named 'yaml'

[7a/45085f] process > PGSCATALOG_PGSCALC:PGSCALC:INPUT_CHECK:SAMPLESHEET_JSON (sample_sheet.csv) [100%] 1 of 1 ✔
[54/27a04a] process > PGSCATALOG_PGSCALC:PGSCALC:INPUT_CHECK:COMBINE_SCOREFILES (1) [100%] 1 of 1 ✔
[- ] process > PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:PLINK2_RELABELBIM -
[skipped ] process > PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR (MCPS chromosome 11) [100%] 23 of 23, stored: 23 ✔
[- ] process > PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:PLINK2_VCF -
[25/73f4d1] process > PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_VARIANTS (MCPS chromosome 1) [100%] 23 of 23 ✔
[51/335891] process > PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_COMBINE (MCPS) [100%] 1 of 1 ✔
[da/65e3f2] process > PGSCATALOG_PGSCALC:PGSCALC:APPLY_SCORE:PLINK2_SCORE (MCPS chromosome 11 effect type additive 0) [100%] 21 of 21 ✔
[c3/bf2be1] process > PGSCATALOG_PGSCALC:PGSCALC:APPLY_SCORE:SCORE_AGGREGATE (MCPS) [100%] 1 of 1 ✔
[f8/b891ba] process > PGSCATALOG_PGSCALC:PGSCALC:REPORT:SCORE_REPORT (MCPS) [ 0%] 0 of 1
[f0/fc22e1] process > PGSCATALOG_PGSCALC:PGSCALC:DUMPSOFTWAREVERSIONS (1) [100%] 1 of 1, failed: 1 ✘
Execution cancelled -- Finishing pending tasks before exit
ERROR ~ Error executing process > 'PGSCATALOG_PGSCALC:PGSCALC:DUMPSOFTWAREVERSIONS (1)'

Caused by: Process PGSCATALOG_PGSCALC:PGSCALC:DUMPSOFTWAREVERSIONS (1) terminated with an error exit status (1)

Command executed:

#!/usr/bin/env python

import yaml
import platform
from textwrap import dedent

def _make_versions_html(versions):
  html = [
      dedent(
          '''\
          <table class="table" style="width:100%" id="nf-core-versions">
              <thead>
                  <tr>
                      <th> Process Name </th>
                      <th> Software </th>
                      <th> Version  </th>
                  </tr>
              </thead>
          '''
      )
  ]
  for process, tmp_versions in sorted(versions.items()):
      html.append("<tbody>")
      for i, (tool, version) in enumerate(sorted(tmp_versions.items())):
          html.append(
              dedent(
                  f'''\
                  <tr>
                      <td><samp>{process if (i == 0) else ''}</samp></td>
                      <td><samp>{tool}</samp></td>
                      <td><samp>{version}</samp></td>
                  </tr>
                  '''
              )
          )
      html.append("</tbody>")
  html.append("</table>")
  return "\n".join(html)

module_versions = {}
module_versions["DUMPSOFTWAREVERSIONS"] = {
    'python': platform.python_version(),
    'yaml': yaml.__version__
}

with open("collated_versions.yml") as f:
    workflow_versions = yaml.load(f, Loader=yaml.BaseLoader) | module_versions

workflow_versions["Workflow"] = {
    "Nextflow": "23.10.0",
    "pgscatalog/pgsc_calc": "2.0.0-alpha.2"
}

versions_mqc = {
    'id': 'software_versions',
    'section_name': 'pgscatalog/pgsc_calc Software Versions',
    'section_href': 'https://github.com/pgscatalog/pgsc_calc',
    'plot_type': 'html',
    'description': 'are collected at run time from the software output.',
    'data': _make_versions_html(workflow_versions)
}

with open("software_versions.yml", 'w') as f:
    yaml.dump(workflow_versions, f, default_flow_style=False)
with open("software_versions_mqc.yml", 'w') as f:
    yaml.dump(versions_mqc, f, default_flow_style=False)

with open('versions.yml', 'w') as f:
    yaml.dump(module_versions, f, default_flow_style=False)

Command exit status: 1

Command output: (empty)

Command error:
  Traceback (most recent call last):
    File ".command.sh", line 2, in <module>
      import yaml
  ModuleNotFoundError: No module named 'yaml'

Work dir: /gpfs3/well/emberson/users/rgu572/GWAS_Elix/GWAS_Regenie_NoBMI/PRS/work/f0/fc22e192b3ae7644d2e4c87b8037c3

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check '.nextflow.log' file for details
ERROR ~ ERROR: No results report written!

-- Check '.nextflow.log' file for details

DIA_TA_T2D finished

Additional Information: I have noticed that Nextflow automatically configures a conda environment for the workflow. This suggests that Nextflow manages its own conda environments for workflow dependencies and may not use my conda virtual environment at all. I also tried running the workflow without the -profile conda option so that it would use the packages in my virtual environment, but the error persists.
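One way to check this is to ask the interpreter that the failing task runs which Python it is and whether it can see PyYAML. The sketch below is only illustrative: `describe_module` is a hypothetical helper, not part of pgsc_calc or Nextflow, and it assumes it is run with the same `python` that `.command.sh` resolves to, for example from inside the work dir reported in the error.

```python
import importlib.util
import sys


def describe_module(name):
    """Report which interpreter is running and whether `name` is importable from it.

    `name` is expected to be a top-level module name such as "yaml".
    """
    spec = importlib.util.find_spec(name)  # None when the module cannot be found
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # "yaml" is the import name of the PyYAML package.
    for key, value in describe_module("yaml").items():
        print(f"{key}: {value}")
```

If `interpreter` points somewhere other than the environment where PyYAML was installed, the package is present but simply not visible to that process.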

I appreciate your prompt attention to this matter and look forward to receiving guidance on resolving this issue.

Thank you in advance for your assistance.

Sincerely, Elizabeth

nebfield commented 10 months ago

You're manually installing dependencies for the calculator, but Nextflow is supposed to do that automatically when you use -profile conda. I think things might be getting mixed up because of this. You shouldn't have to load anything (apart from having Anaconda installed).
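If a manually created environment is suspected of leaking into the pipeline's tasks, it can help to print the settings that decide which packages Python sees. This is a minimal sketch, assuming the relevant settings are `CONDA_PREFIX`, `CONDA_DEFAULT_ENV`, `PYTHONPATH` and the per-user site directory; `environment_summary` is a made-up name used only for illustration.

```python
import os
import site


def environment_summary(environ=None):
    """Collect settings that influence which packages a Python process can import."""
    environ = os.environ if environ is None else environ
    return {
        # Set by `conda activate`; absent when no conda environment is active.
        "CONDA_PREFIX": environ.get("CONDA_PREFIX"),
        "CONDA_DEFAULT_ENV": environ.get("CONDA_DEFAULT_ENV"),
        # Extra directories prepended to the module search path, if any.
        "PYTHONPATH": environ.get("PYTHONPATH"),
        # Per-user site-packages can shadow or supplement an environment.
        "user_site_enabled": site.ENABLE_USER_SITE,
        "user_site_dir": site.getusersitepackages(),
    }


if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
```

Comparing the output from the login shell with the output from inside a task's work dir shows whether both are using the same environment.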

I ran some conda tests and it works OK. Perhaps you could try: