cov-lineages / pangolin

Software package for assigning SARS-CoV-2 genome sequences to global lineages.
GNU General Public License v3.0
427 stars, 107 forks

Error in rule scorpio: with pangolin 4.3 #537

Open gadepallivs opened 7 months ago

gadepallivs commented 7 months ago

This post discusses three issues:

I’ve been using Pangolin 4.3 within a WDL workflow successfully. However, after updating Pangolin from 4.3 to 4.3.1 using pangolin --update, the pipeline broke. Reinstalling 4.3 was challenging, with Conda taking hours to solve the environment and mamba defaulting to an older Pangolin version.

```
bash-4.2$ module load miniconda3
bash-4.2$ python -V
Python 3.7.10
bash-4.2$ conda -V
conda 4.10.3
bash-4.2$ conda create -n pangolin-mamba
bash-4.2$ source activate pangolin-mamba
bash-4.2$ conda install mamba
bash-4.2$ mamba install pangolin
(pangolin-mamba) bash-4.2$ mamba -V
mamba 1.5.6
conda 23.11.0
(pangolin-mamba) bash-4.2$ pangolin -v
pangolin 1.1.5
(pangolin-mamba) bash-4.2$ python -V
Python 3.12.1
```

Installing Python 3.8 in the environment with mamba install python=3.8 failed with a Snakemake dependency conflict:

```
(pangolin-mamba) bash-4.2$ mamba update pangolin
Looking for: ['pangolin']

conda-forge/linux-64                Using cache
conda-forge/noarch                  Using cache
bioconda/linux-64                   Using cache
bioconda/noarch                     Using cache
pkgs/main/linux-64 (check zst)      Checked  0.1s
pkgs/main/noarch (check zst)        Checked  0.0s
pkgs/r/linux-64 (check zst)         Checked  0.0s
pkgs/r/noarch (check zst)           Checked  0.0s
pkgs/main/noarch                    702.8kB @   4.1MB/s  0.2s
pkgs/r/linux-64                       1.6MB @   7.0MB/s  0.3s
pkgs/r/noarch                         2.1MB @   8.7MB/s  0.3s
pkgs/main/linux-64                    5.8MB @  19.8MB/s  0.4s

Pinned packages:
  - python 3.12.*

Transaction

  Prefix: /users/PAS1203/osu8903/.conda/envs/pangolin-mamba

  All requested packages already installed

(pangolin-mamba) bash-4.2$ mamba install python=3.8
The following packages are incompatible
├─ python 3.8** is requested and can be installed;
└─ snakemake-interface-storage-plugins is not installable because it requires
   └─ python >=3.11.0,<4.0.0 but there are no viable options
      ├─ python [3.11.0|3.11.1|...|3.12.1] conflicts with any installable versions previously reported;
      └─ python 3.12.0rc3 would require
         └─ _python_rc, which does not exist (perhaps a missing channel).
```

After several attempts, I managed to reinstall Pangolin 4.3 with conda, but the WDL workflow still fails with the same error I saw with 4.3.1.

```bash
# Our HPC (uname): Linux x86_64
bash-4.2$ module load miniconda3
Lmod is automatically replacing "python/3.6-conda5.2" with "miniconda3/4.10.3-py37".
bash-4.2$ python -V
Python 3.7.10
bash-4.2$ conda -V
conda 4.10.3

# The command below got stuck solving the environment for a long time and worked only once.
# Either way, it takes about 45 minutes to succeed or fail.
time conda create -n pangolin -c bioconda -c conda-forge pangolin=4.3 -y

# Activate the Pangolin env
source activate pangolin

# Check Python and conda versions
(pangolin) bash-4.2$ python -V
Python 3.8.18
(pangolin) bash-4.2$ conda -V
conda 4.10.3

# Check the versions of all pangolin components
(pangolin) bash-4.2$ pangolin --all-versions
pangolin: 4.3
pangolin-data: 1.24
constellations: v0.1.12
scorpio: 0.3.19
usher 0.6.3
gofasta 1.2.1
minimap2 2.26-r1175
faToVcf: 448
```

The error below occurs only when Pangolin runs inside the WDL workflow, and it appears to come from Pangolin's scorpio rule. Any suggestions for resolving it are welcome.

```
call Pangolin {
  input:
    library_name = 'test_name',
    pangolin_input = fasta_file
}

task Pangolin {
  input {
    String library_name
    File pangolin_input
  }
  command {
    module load python
    source activate pangolin
    pangolin ${pangolin_input} \
      --outdir ${library_name}.pangolin-out
  }
  output {
    File pangolin_output_files = "${library_name}.pangolin-out"
  }
}
```
```
Error in rule scorpio:
    jobid: 0
    input: /fs/ess/PAS2145/covid19_pipeline/covid19_wdl_workflow/cromwell/test/cromwell-executions/Covid19Workflow/6d8a5fdb-dce8-4483-8efa-464c12abb039/call-Pangolin/tmp.fb7435a8/tmpzj2_93ox/hashed.aln.fasta
    output: /fs/ess/PAS2145/covid19_pipeline/covid19_wdl_workflow/cromwell/test/cromwell-executions/Covid19Workflow/6d8a5fdb-dce8-4483-8efa-464c12abb039/call-Pangolin/tmp.fb7435a8/tmpzj2_93ox/VOC_report.scorpio.csv
    log: /fs/ess/PAS2145/covid19_pipeline/covid19_wdl_workflow/cromwell/test/cromwell-executions/Covid19Workflow/6d8a5fdb-dce8-4483-8efa-464c12abb039/call-Pangolin/tmp.fb7435a8/tmpzj2_93ox/logs/scorpio.log (check log file(s) for error details)

RuleException:
CalledProcessError in file /users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/site-packages/pangolin/scripts/preprocessing.smk, line 92:
Command 'set -euo pipefail; scorpio classify -i /fs/ess/PAS2145/covid19_pipeline/covid19_wdl_workflow/cromwell/test/cromwell-executions/Covid19Workflow/6d8a5fdb-dce8-4483-8efa-464c12abb039/call-Pangolin/tmp.fb7435a8
[2024-02-11 16:25:48,47] [info] WorkflowManagerActor: Workflow actor for 6d8a5fdb-dce8-4483-8efa-464c12abb039 completed with status 'Failed'. The workflow will be removed from the workflow store.
[2024-02-11 16:25:49,07] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
```
AngieHinrichs commented 7 months ago

Sorry to hear about the troubles. First, to prevent the log files from being automatically removed on exit, run pangolin with the --no-temp option. (If you want to specify the directory instead of getting a randomly generated one like tmpzj2_93ox, you can add --tempdir /path/to/output/dir.)
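This suggestion can be wrapped in two small shell helpers. The following is a sketch, not part of pangolin: `run_pangolin_debug` and `print_scorpio_logs` are hypothetical names, and only the `--outdir`, `--tempdir` and `--no-temp` flags come from this thread. Where pangolin places the files it keeps may differ between versions and options, so the second helper searches for `scorpio.log` instead of assuming a fixed path.

```shell
#!/usr/bin/env bash
# Sketch: keep pangolin's intermediate files so the Snakemake logs survive,
# then locate and print every scorpio.log under the given directories.

# Hypothetical wrapper around the flags suggested above.
run_pangolin_debug() {
  local fasta=$1 outdir=$2 tempdir=$3
  mkdir -p "$tempdir"
  # --no-temp keeps intermediate files; --tempdir chooses where temp files go
  pangolin "$fasta" --outdir "$outdir" --tempdir "$tempdir" --no-temp
}

# Print each scorpio.log found; return 1 if there is none.
# Searching avoids hard-coding a directory layout that may vary.
print_scorpio_logs() {
  local logs
  logs=$(find "$@" -type f -name scorpio.log 2>/dev/null)
  if [ -z "$logs" ]; then
    echo "no scorpio.log found under: $*" >&2
    return 1
  fi
  # one path per line (assumes paths contain no newlines)
  printf '%s\n' "$logs" | while IFS= read -r log; do
    echo "== $log"
    cat "$log"
  done
}
```

For example, `run_pangolin_debug sequences.fasta pangolin-out /tmp/pangolin-debug || print_scorpio_logs pangolin-out /tmp/pangolin-debug` prints any scorpio log left behind by a failed run (the file and directory names are placeholders).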

> When I use mamba, it pins Python (3.12) but downloads the old pangolin version 1.1.5, and it does not update on the update call. Still trying to troubleshoot that.

I don't know why, but I recently found that I needed to configure channels in this order:

Otherwise there were unresolvable conflicts between dependencies of dependencies.

gadepallivs commented 7 months ago

The channels have been set up in the correct order.

bash-4.2$ conda config --show channels
channels:
  - conda-forge
  - bioconda
  - defaults

Ideally, I’d like to use your environment.yml file and run conda env create -f environment.yml to create the environment. Unfortunately, this fails with the error below:

CondaValueError: Malformed version string '~': invalid character(s)

I tried the --no-temp option, which let me access the scorpio.log file. However, I’m not sure what is causing the errors it shows:

```
INFO: Found file /users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/site-packages/constellations/definitions/cXE.json for constellation Omicron (XE-like) containing 6 defining mutations
INFO: Rules {'default': {'min_alt': 3, 'max_ref': 0}}
Process SyncManager-1:
Traceback (most recent call last):
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/multiprocessing/managers.py", line 608, in _run_server
    server = cls._Server(registry, address, authkey, serializer)
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/multiprocessing/managers.py", line 154, in __init__
    self.listener = Listener(address=address, backlog=16)
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/multiprocessing/connection.py", line 448, in __init__
    self._listener = SocketListener(address, family, backlog)
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/multiprocessing/connection.py", line 591, in __init__
    self._socket.bind(address)
OSError: AF_UNIX path too long
Traceback (most recent call last):
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/bin/scorpio", line 10, in <module>
    sys.exit(main())
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/site-packages/scorpio/__main__.py", line 272, in main
    args.func(args)
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/site-packages/scorpio/subcommands/classify.py", line 7, in run
    classify_constellations(options.input,
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/site-packages/scorpio/scripts/type_constellations.py", line 763, in classify_constellations
    manager = mp.Manager()
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/multiprocessing/managers.py", line 583, in start
    self._address = reader.recv()
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/users/PAS1203/osu8903/.conda/envs/pangolin/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
```
AngieHinrichs commented 7 months ago

> OSError: AF_UNIX path too long

A quick Google search indicates that a long TMPDIR environment variable may cause that. Is there a way for you to find out what TMPDIR is set to in the environment where you run pangolin (or to set it to something short)?
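One way to act on this inside a workflow task is to print TMPDIR and, when it is long, point it at a short private directory before pangolin starts. This is a minimal sketch, assuming /tmp exists and is writable on the compute node; `use_short_tmpdir` is a hypothetical helper, and its default 60-character cutoff is an arbitrary safety margin, not a limit documented by pangolin or Python.

```shell
#!/usr/bin/env bash
# Sketch: report TMPDIR and, if it is longer than a cutoff, switch to a
# short, freshly created directory under /tmp.
# Assumption: the 60-character default is an arbitrary safety margin.
use_short_tmpdir() {
  local max_len=${1:-60}
  local current=${TMPDIR:-/tmp}
  echo "TMPDIR is set to: $current (${#current} characters)" >&2
  if [ "${#current}" -gt "$max_len" ]; then
    # mktemp -d creates a new, uniquely named directory from the template
    TMPDIR=$(mktemp -d /tmp/pango.XXXXXX) || return 1
    export TMPDIR
    echo "Switched TMPDIR to: $TMPDIR" >&2
  fi
}
```

In a WDL `command` section it would be called after `source activate pangolin` and before a pangolin call such as `pangolin ${pangolin_input} --outdir ${library_name}.pangolin-out --tempdir "$TMPDIR"`. Creating a fresh directory with `mktemp -d` gives each task its own space instead of sharing /tmp directly.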

> CondaValueError: Malformed version string '~': invalid character(s)

How are you obtaining environment.yml? I don't see a '~' character in it. Instead, why not install pangolin directly (from the bioconda channel) like this:

conda create -n pangolin
conda activate pangolin
conda install pangolin
gadepallivs commented 7 months ago

I copied the environment.yml file from your repository. The default Python version on our HPC is 3.6, so I tried direct installations with Python 3.7, 3.8, and 3.9 using the following commands:

conda create -n pangolin python=3.7
conda activate pangolin
conda install pangolin=4.3  # pangolin=4.3.1 gave an error because it was not found in the channel

The OSError was not present before. I’m currently investigating whether the WDL workflow is the source of the issue, since I don’t control how the temp directory is created. I will follow your suggestion. Pangolin fails only within the WDL workflow, not when run directly on the command line.

AngieHinrichs commented 7 months ago

Just to make sure I understand, is this correct?:

If that is correct, can you find out what the environment variable TMPDIR is set to in your WDL workflow? For example, is there a way to add "echo $TMPDIR" to your WDL workflow just before running pangolin?

gadepallivs commented 7 months ago

@AngieHinrichs Yes, to all three. Here is the echo output of the temp directory just before running pangolin:

TMPDIR is set to: /fs/ess/PAS2145/covid19_pipeline/covid19_wdl_workflow/cromwell/test/cromwell-executions/Covid19Workflow/c2693a65-146f-435c-8475-3b47037f1eff/call-Pangolin/tmp.9352bd1d
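Why this path is a problem can be checked with a little arithmetic. The sketch below rests on two assumptions. First, in Python 3.8, the socket behind scorpio's `mp.Manager()` call is created at `<TMPDIR>/pymp-XXXXXXXX/listener-XXXXXXXX`, which is 32 characters longer than TMPDIR itself. Second, Linux Unix domain socket paths are limited by a 108-byte `sun_path` field. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Rough estimate of the Unix socket path Python's multiprocessing binds
# when scorpio calls mp.Manager().
# Assumption (Python 3.8 behaviour): the listener socket is created at
#   <TMPDIR>/pymp-XXXXXXXX/listener-XXXXXXXX
# i.e. 32 characters more than the temp directory itself.
socket_path_len() {
  local dir=${1%/}   # Python normalises away a trailing slash
  echo $(( ${#dir} + 32 ))
}

# Linux reserves 108 bytes for sun_path; CPython raises
# "OSError: AF_UNIX path too long" for anything longer.
fits_af_unix() {
  [ "$(socket_path_len "$1")" -le 108 ]
}

cromwell_tmp=/fs/ess/PAS2145/covid19_pipeline/covid19_wdl_workflow/cromwell/test/cromwell-executions/Covid19Workflow/c2693a65-146f-435c-8475-3b47037f1eff/call-Pangolin/tmp.9352bd1d

for dir in "$cromwell_tmp" /tmp; do
  if fits_af_unix "$dir"; then verdict=ok; else verdict="too long"; fi
  echo "$(socket_path_len "$dir") chars: $verdict ($dir)"
done
```

For the Cromwell directory above, the estimate is 199 characters, well over 108. Trimming about 30 characters still leaves roughly 169, while /tmp gives 36.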

Update 1: I shortened my folder path by two folders (about 30 characters fewer), but Cromwell still creates a long hashed path. There is a Cromwell GitHub issue about this, since it interferes with Python multiprocessing:
https://github.com/broadinstitute/cromwell/issues/3647

I added these two steps before calling pangolin, with no luck yet:

export TMPDIR=/tmp/
mkdir -p $TMPDIR

Update 2: Thank you for your support. I had incorrectly set the temp directory, but I was able to rectify it. Here is the corrected code:

  command {
    module load python
    source activate pangolin
    echo "TMPDIR is set to: $TMPDIR"  # outputs a long Cromwell path
    export TMPDIR=/tmp  # point TMPDIR at a short path

    # also set pangolin's temp directory to the short path
    pangolin ${pangolin_input} \
      --outdir ${library_name}.pangolin-out \
      --tempdir /tmp
  }
kapsakcj commented 6 months ago

@gadepallivs we hit a similar issue with our WDLs and overly long paths (one difference: we primarily use Terra on GCP). Here's how we tried to cope with the long pathnames: https://github.com/theiagen/public_health_bioinformatics/pull/327

It's very similar to the code you showed above.

When I looked into it, I found that Python's multiprocessing module, which scorpio uses, struggles with these long paths, leading to failures we had not seen before. If I recall correctly, it has to do with binding Unix domain sockets, whose paths have a length limit. If you search for "AF_UNIX path too long" you will find lots of threads about this.