gadepallivs opened 7 months ago
Sorry to hear about the troubles. First, to prevent the log files from being automatically removed on exit, run pangolin with the --no-temp option. (If you want to specify the directory instead of getting a randomly generated directory like tmpzj2_93ox, you can add --tempdir /path/to/output/dir.)
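A minimal sketch of that run, wrapped in a shell function. The function name run_pangolin_keep_logs, its arguments, and the default directory /tmp/pangolin_debug are hypothetical. Only --outdir, --tempdir and --no-temp are pangolin options.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run pangolin so that its intermediate files
# (including scorpio.log) stay in a short directory of your choosing.
run_pangolin_keep_logs() {
  local input_fasta="$1"                       # placeholder, e.g. sequences.fasta
  local outdir="$2"                            # placeholder, e.g. pangolin_out
  local tempdir="${3:-/tmp/pangolin_debug}"    # assumed short and writable

  # Fail early with the conventional "command not found" status.
  if ! command -v pangolin >/dev/null 2>&1; then
    echo "pangolin not found on PATH" >&2
    return 127
  fi

  mkdir -p "$tempdir"
  # --tempdir: where intermediate files go; --no-temp: keep them after the run
  pangolin "$input_fasta" \
    --outdir "$outdir" \
    --tempdir "$tempdir" \
    --no-temp
}
```

For example, run_pangolin_keep_logs sequences.fasta pangolin_out /tmp/pangolin_debug should leave the intermediate files in /tmp/pangolin_debug once the run finishes. This assumes pangolin writes them directly into the --tempdir directory.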
When I use mamba, it pins Python (3.12) but downloads the old pangolin version 1.1.5, and it does not upgrade pangolin when I run the update. Still trying to troubleshoot that.
I don't know why, but I recently found that I needed to configure channels in this order: conda-forge, bioconda, defaults.
Otherwise there were unresolvable conflicts between dependencies of dependencies.
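One way to get that order follows the commands in Bioconda's setup instructions. That these are the exact commands meant here is an assumption. Because conda config --add puts each channel at the top of the list, the channels are added in reverse order.

```bash
# Each --add prepends, so adding in reverse gives: conda-forge, bioconda, defaults.
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
# Bioconda's setup guide also recommends strict channel priority.
conda config --set channel_priority strict

# Confirm the resulting order.
conda config --show channels
```

These commands change the user's global conda configuration (~/.condarc), so they only need to be run once per account.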
The channels have been set up in the correct order.
bash-4.2$ conda config --show channels
channels:
- conda-forge
- bioconda
- defaults
Ideally, I’d like to use your environment.yml file and run conda env create -f environment.yml to set up the appropriate environment. Unfortunately, this results in the error below:
CondaValueError: Malformed version string '~': invalid character(s)
I tried the --no-temp option, which gave me access to the scorpio.log file. However, I’m unsure what is causing the errors it shows:
OSError: AF_UNIX path too long
A quick Google search indicates that a long TMPDIR environment variable may cause that. Is there a way for you to find out what TMPDIR is set to in the environment in which you run pangolin (or to set it to something short)?
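To put a number on "too long", here is a rough shell check built on two assumptions. First, it assumes Linux, where an AF_UNIX socket path may be at most 107 characters because sun_path is 108 bytes including the terminating NUL. Second, it assumes Python's multiprocessing creates its sockets as <tmpdir>/pymp-XXXXXXXX/listener-XXXXXXXX. Under those assumptions, TMPDIR itself must be at most 75 characters. The function name check_tmpdir_length is hypothetical.

```bash
#!/usr/bin/env bash
# Heuristic pre-flight check for AF_UNIX path length under a temp directory.
# Assumptions: Linux sun_path limit (107 usable characters) and the
# multiprocessing socket layout "<tmpdir>/pymp-XXXXXXXX/listener-XXXXXXXX".
check_tmpdir_length() {
  local dir="${1:-${TMPDIR:-/tmp}}"                # default: current TMPDIR, else /tmp
  dir="${dir%/}"                                   # ignore one trailing slash
  local suffix="/pymp-XXXXXXXX/listener-XXXXXXXX"  # assumed layout, 32 characters
  local max=107
  # Character count; equal to the byte count for ASCII paths.
  local total=$(( ${#dir} + ${#suffix} ))
  if (( total > max )); then
    echo "too long: ${total} > ${max}"
    return 1
  fi
  echo "ok: ${total} <= ${max}"
}
```

Calling check_tmpdir_length with no argument tests the current $TMPDIR, falling back to /tmp if TMPDIR is unset. For example, check_tmpdir_length /tmp prints "ok: 36 <= 107".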
CondaValueError: Malformed version string '~': invalid character(s)
How are you obtaining environment.yml? I don't see a '~' character in it. Instead, why not install pangolin directly (from the bioconda channel) like this:
conda create -n pangolin
conda activate pangolin
conda install pangolin
I copied the env.yml file from your repository. The default Python version on our HPC is 3.6, so I attempted direct installations with Python 3.7, 3.8, and 3.9 using the following commands:
conda create -n pangolin python=3.7
conda activate pangolin
conda install pangolin=4.3 # Encountered an error when using 4.3.1, as it was not found in the channel
The OSError was not present before. I’m currently investigating whether the WDL workflow is the source of the issue, since I’m not controlling the temp directory creation. I will follow your suggestion. Pangolin only fails within the WDL workflow, not when run directly on the command line.
Just to make sure I understand, is this correct?:
If that is correct, can you find out what the environment variable TMPDIR is set to in your WDL workflow? For example, is there a way to add "echo $TMPDIR" to your WDL workflow just before running pangolin?
@AngieHinrichs Yes, for all 3. Here is the echo output of the temp directory before running pangolin:
TMPDIR is set to: /fs/ess/PAS2145/covid19_pipeline/covid19_wdl_workflow/cromwell/test/cromwell-executions/Covid19Workflow/c2693a65-146f-435c-8475-3b47037f1eff/call-Pangolin/tmp.9352bd1d
Update 1:
I have shortened my folder path by 2 folders (~30 characters shorter).
However, Cromwell still creates a long hash in the path. There is a Cromwell GitHub issue about this, as it interferes with Python multiprocessing:
https://github.com/broadinstitute/cromwell/issues/3647
I added these 2 steps before calling pangolin, with no luck yet:
export TMPDIR=/tmp/
mkdir -p $TMPDIR
Update 2: Thank you for your support. I had set the temp directory incorrectly, but I was able to fix it. Here is the corrected code:
command {
  module load python
  source activate pangolin
  echo "TMPDIR is set to: $TMPDIR"   # outputs a long Cromwell path
  export TMPDIR=/tmp                 # short temp dir for Python multiprocessing sockets
  pangolin ${pangolin_input} \
    --outdir ${library_name}.pangolin-out \
    --tempdir /tmp                   # short temp dir for pangolin's intermediate files
}
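The command above points every task at the same /tmp. If several pangolin tasks can run on one node at the same time, a per-task subdirectory that is still short may be safer. This is an untested suggestion, and make_short_tmpdir and the "pango." prefix are hypothetical names.

```bash
#!/usr/bin/env bash
# Create a short, unique temp directory (e.g. /tmp/pango.a1B2c3) for one task,
# so parallel runs don't share a directory while paths stay well under the
# AF_UNIX length limit.
make_short_tmpdir() {
  local base="${1:-/tmp}"
  mktemp -d "${base%/}/pango.XXXXXX"   # mktemp replaces the X's with random characters
}
```

In a task's shell script this could be used as export TMPDIR=$(make_short_tmpdir /tmp), followed by passing --tempdir "$TMPDIR" to pangolin and removing the directory when the task ends, for example with trap 'rm -rf "$TMPDIR"' EXIT.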
@gadepallivs we hit a similar issue with our WDLs and paths that were too long (we primarily use Terra on GCP, which is one difference here), and here's how we tried to cope with the long pathnames: https://github.com/theiagen/public_health_bioinformatics/pull/327
It's very similar to the code you have shown above.
When I looked into it, I found that the Python multiprocessing package, which scorpio uses, struggles with these long paths, leading to failures that we had not seen before. If I recall correctly, it has something to do with "binding sockets"(?). If you search for "AF_UNIX path too long", you'll find lots of threads about this.
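The "binding sockets" part can be checked without scorpio. The sketch below asks Python to bind a Unix-domain socket to an overly long path. try_bind_unix_socket is a hypothetical helper that uses only Python's standard socket module and assumes python3 is on PATH.

```bash
#!/usr/bin/env bash
# Try to bind an AF_UNIX socket at the given path and report the outcome.
# Python rejects paths longer than the platform's sun_path limit itself,
# raising OSError("AF_UNIX path too long") before any bind system call.
try_bind_unix_socket() {
  python3 - "$1" <<'EOF'
import socket
import sys

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    sock.bind(sys.argv[1])
    print("bind succeeded")
except OSError as err:
    print(f"OSError: {err}")
finally:
    sock.close()
EOF
}

# A 200-character directory name pushes the path far past the limit.
try_bind_unix_socket "/tmp/$(printf 'x%.0s' {1..200})/listener"
```

On Linux the last line should print "OSError: AF_UNIX path too long", the same message as in scorpio.log, while a short path such as one directly under /tmp binds normally.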
This post discusses three issues:
Running conda install pangolin=4.3 -y works, but not always. Running mamba install pangolin=4.3 leads to package conflicts; by default, mamba downloads an older Pangolin version (1.1.5).
I’ve been using Pangolin 4.3 within a WDL workflow successfully. However, after updating Pangolin from 4.3 to 4.3.1 using pangolin --update, the pipeline broke. Reinstalling 4.3 was challenging, with conda taking hours to solve the environment and mamba defaulting to an older Pangolin version.
Installing Python 3.8 within the environment using mamba install python=3.8 led to a Snakemake incompatibility. After several attempts, I managed to reinstall Pangolin 4.3 using conda, but the same error as with 4.3.1 persists in the WDL workflow.
The error below only occurs when Pangolin is part of the WDL workflow and seems to be related to the Pangolin rule scorpio. Any suggestions for resolving this issue are welcome.