ENCODE-DCC / atac-seq-pipeline

ENCODE ATAC-seq pipeline
MIT License

Workflow failed: python version error in tss_enrich step #306

Closed kmattioli closed 3 years ago

kmattioli commented 3 years ago

I am attempting to run this pipeline for the first time on my institution's compute cluster, but am getting a weird Python error. I've installed the encode-atac-seq-pipeline successfully (as far as I can tell), and have configured caper to run using a local backend.

I am now testing it on the sample JSON file. While most of the workflow succeeds AFAICT (metadata.json and peak files are created), the workflow eventually fails at the end with this message:

* Started troubleshooting workflow: id=99bb63da-e9bf-4ad2-9dfc-757ac31bf6f1, status=Failed
* Found failures JSON object.
[
    {
        "message": "Workflow failed",
        "causedBy": [
            {
                "causedBy": [],
                "message": "Job atac.tss_enrich:0:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            },
            {
                "causedBy": [],
                "message": "Job atac.tss_enrich:1:2 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details."
            }
        ]
    }
]
* Recursively finding failures in calls (tasks)...

==== NAME=atac.tss_enrich, STATUS=RetryableFailure, PARENT=
SHARD_IDX=0, RC=1, JOB_ID=47155
START=2021-02-05T20:40:59.801Z, END=2021-02-05T20:41:11.508Z
STDOUT=/PHShome/kz659/kaia/favors/01__amy/bin/atac-seq-pipeline/atac/99bb63da-e9bf-4ad2-9dfc-757ac31bf6f1/call-tss_enrich/shard-0/execution/stdout
STDERR=/PHShome/kz659/kaia/favors/01__amy/bin/atac-seq-pipeline/atac/99bb63da-e9bf-4ad2-9dfc-757ac31bf6f1/call-tss_enrich/shard-0/execution/stderr
STDERR_CONTENTS=
Traceback (most recent call last):
  File "/PHShome/kz659/.conda/envs/encode-atac-seq-pipeline/bin/encode_task_tss_enrich.py", line 6, in <module>
    import matplotlib as mpl
  File "/apps/lib/anaconda/2020.02/lib/python3.7/site-packages/matplotlib/__init__.py", line 272
    nonlocal called, ret
                  ^
SyntaxError: invalid syntax

The atac.tss_enrich error repeats a few times with STATUS=Failed as the final one, and the last 2 lines of the caper output are:

2021-02-05 16:00:12,154|caper.nb_subproc_thread|ERROR| Subprocess failed. returncode=1
2021-02-05 16:00:12,154|caper.cli|ERROR| Check stdout/stderr in /PHShome/kz659/kaia/favors/01__amy/bin/atac-seq-pipeline/cromwell.out

This looks like a Python compatibility issue, but I cannot figure out how to fix it. The environment is activated, and which python points to the expected encode-atac-seq-pipeline directory. I also can't tell how important this failure is, since it seems to occur downstream of creating most of the important analysis files.

Any guidance would be appreciated. Thanks in advance!
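For readers who hit the same traceback: nonlocal is a Python 3 keyword, so a SyntaxError on that line suggests that a Python 2 interpreter was parsing the Python 3.7 copy of matplotlib under /apps/lib/anaconda/2020.02. The sketch below is one hedged way to spot that kind of mismatch, by comparing the pythonX.Y directory in a module path against an interpreter version. The function names are hypothetical and not part of the pipeline, and the (2, 7) version is an assumption about which Python 2 was in use.

```python
import re
import sys


def version_from_path(module_path):
    """Return (major, minor) parsed from a 'pythonX.Y' path segment, or None."""
    match = re.search(r"python(\d+)\.(\d+)", module_path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_mismatched(module_path, interpreter_version=None):
    """True if the module sits under a different Python's lib directory."""
    if interpreter_version is None:
        # Default to the interpreter running this script.
        interpreter_version = sys.version_info[:2]
    found = version_from_path(module_path)
    return found is not None and found != tuple(interpreter_version[:2])


# Path copied from the traceback in this report.
path = "/apps/lib/anaconda/2020.02/lib/python3.7/site-packages/matplotlib/__init__.py"
print("module built for Python:", version_from_path(path))
# Assumption: the tss_enrich task was executed by a Python 2.7 interpreter.
print("mismatch with Python 2.7:", is_mismatched(path, (2, 7)))
```

A mismatch like this usually means the interpreter's module search path includes another installation's site-packages, which is worth checking before reinstalling anything.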

leepc12 commented 3 years ago

TSS enrichment is based on metaseq, which is only available for python2, so the pipeline uses python2 for that specific task.

Try which python2 to check whether it points to the pipeline's secondary environment, encode-atac-seq-pipeline-python2. If it doesn't, run the uninstaller and reinstall.
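The check described above can be sketched in a few lines of Python. This is a minimal illustration, not pipeline code: conda_env_of and check_python2 are hypothetical helper names, and the sketch assumes the usual conda layout of .../envs/<name>/bin/<executable>.

```python
import os
import shutil


def conda_env_of(interpreter_path):
    """Return the env name if the path looks like .../envs/<name>/..., else None."""
    parts = os.path.normpath(interpreter_path).split(os.sep)
    if "envs" in parts:
        idx = parts.index("envs")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return None


def check_python2(expected_env="encode-atac-seq-pipeline-python2"):
    """Resolve python2 on PATH and report whether it belongs to expected_env."""
    resolved = shutil.which("python2")
    if resolved is None:
        return None, False
    return resolved, conda_env_of(resolved) == expected_env


resolved, ok = check_python2()
print("python2 resolves to:", resolved)
print("inside the expected py2 environment:", ok)
```

Run it from the same shell session that launches the pipeline, since PATH can differ between login shells and job environments.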

kmattioli commented 3 years ago

Thanks for your response!

which python2 points to ~/.conda/envs/encode-atac-seq-pipeline/bin/python2, not to the secondary environment ~/.conda/envs/encode-atac-seq-pipeline-python2. I uninstalled and reinstalled the environments and still see the same paths; the pipeline still fails at the same spot.

kmattioli commented 3 years ago

Hmm update. I was previously running it on a CentOS 7 cluster and it was failing. I am now running it on a CentOS 6 cluster and it appears to be working. Not sure what's going on, likely something on our end as the CentOS 7 cluster is in testing, but for now I will just run on CentOS 6.

leepc12 commented 3 years ago

It's weird. Please run conda env list and check if encode-atac-seq-pipeline-python2 is there. python2 is installed on encode-atac-seq-pipeline-python2 (not encode-atac-seq-pipeline).

$ conda env list
encode-atac-seq-pipeline     /software/miniconda3/envs/encode-atac-seq-pipeline
encode-atac-seq-pipeline-python2     /software/miniconda3/envs/encode-atac-seq-pipeline-python2
encode-chip-seq-pipeline     /software/miniconda3/envs/encode-chip-seq-pipeline
encode-chip-seq-pipeline-python2     /software/miniconda3/envs/encode-chip-seq-pipeline-python2

Run which python2 to check if it points to python in that py2 environment.
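As a rough aid for the first check, the sketch below parses the text printed by conda env list and reports which of the two expected environments are missing. The helper names are hypothetical, the parsing assumes the simple name-then-path layout shown in the listing, and the sample text is copied from that listing rather than produced by running conda.

```python
REQUIRED = [
    "encode-atac-seq-pipeline",
    "encode-atac-seq-pipeline-python2",
]


def parse_env_list(text):
    """Map env name -> path from the text printed by 'conda env list'."""
    envs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '# conda environments:' header
        tokens = line.split()
        if len(tokens) < 2:
            continue  # unnamed env (path only); ignored in this sketch
        # The path is the last token; an active env has a '*' in between.
        envs[tokens[0]] = tokens[-1]
    return envs


def missing_envs(text, required=REQUIRED):
    """Return the required env names that do not appear in the listing."""
    envs = parse_env_list(text)
    return [name for name in required if name not in envs]


listing = """\
encode-atac-seq-pipeline     /software/miniconda3/envs/encode-atac-seq-pipeline
encode-atac-seq-pipeline-python2     /software/miniconda3/envs/encode-atac-seq-pipeline-python2
encode-chip-seq-pipeline     /software/miniconda3/envs/encode-chip-seq-pipeline
encode-chip-seq-pipeline-python2     /software/miniconda3/envs/encode-chip-seq-pipeline-python2
"""
print("missing environments:", missing_envs(listing))
```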

kmattioli commented 3 years ago

Should which python and which python2 point to the encode-atac-seq-pipeline and encode-atac-seq-pipeline-python2 respectively even when the environments are not activated?

I confirmed that they are installed:

[kz659@hn007 ~]$ conda env list
# conda environments:
#
encode-atac-seq-pipeline     /PHShome/kz659/.conda/envs/encode-atac-seq-pipeline
encode-atac-seq-pipeline-python2     /PHShome/kz659/.conda/envs/encode-atac-seq-pipeline-python2

But without activating an environment, which python2 points to /apps/lib/anaconda2/bin/python2.

Here's what happens after activating either environment:

(encode-atac-seq-pipeline-python2) [kz659@hn007 ~]$ conda activate encode-atac-seq-pipeline
(encode-atac-seq-pipeline) [kz659@hn007 ~]$ which python
~/.conda/envs/encode-atac-seq-pipeline/bin/python
(encode-atac-seq-pipeline) [kz659@hn007 ~]$ which python2
~/.conda/envs/encode-atac-seq-pipeline/bin/python2
[kz659@hn007 ~]$ conda activate encode-atac-seq-pipeline-python2
(encode-atac-seq-pipeline-python2) [kz659@hn007 ~]$ which python
~/.conda/envs/encode-atac-seq-pipeline-python2/bin/python
(encode-atac-seq-pipeline-python2) [kz659@hn007 ~]$ which python2
~/.conda/envs/encode-atac-seq-pipeline-python2/bin/python2

I just uninstalled the environments again and re-installed (using Anaconda3), and confirmed that all install steps appeared to execute successfully.

kmattioli commented 3 years ago

Update: I found some old lines in my ~/.bashrc file that appeared to be pointing to an old installation of Miniconda. I deleted those, logged out & back in, uninstalled & reinstalled the environments using the cluster's Anaconda3 install, and added the following to my ~/.bashrc to be able to use the conda activate command (as prompted when I tried to call it):

. /apps/software-2.12/Anaconda3/5.2.0/etc/profile.d/conda.sh

No idea which one of those things made it work, but now it works. Pipeline completes successfully on both CentOS 6 machine and CentOS 7 machine. Closing, as it was clearly some weird conda issue!
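For anyone retracing this fix, one hedged way to look for leftover conda setup is to list every active line in ~/.bashrc that mentions conda and then review the hits by hand. The sketch below does only that. find_conda_lines is a hypothetical helper, and matching on the substring "conda" is an assumption that will also catch harmless lines.

```python
import os


def find_conda_lines(text):
    """Return (line_number, line) for non-comment lines that mention conda."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out lines are not executed by the shell
        if "conda" in stripped.lower():
            hits.append((number, stripped))
    return hits


bashrc = os.path.expanduser("~/.bashrc")
if os.path.isfile(bashrc):
    with open(bashrc, errors="replace") as handle:
        for number, line in find_conda_lines(handle.read()):
            print(f"{bashrc}:{number}: {line}")
```

If the hits reference more than one conda installation, that is a candidate for the kind of conflict described above. The sketch only reports lines and leaves the file unchanged.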