ENCODE-DCC / atac-seq-pipeline

ENCODE ATAC-seq pipeline
MIT License

Running without external network connection #403

Open FrancoisMifsud opened 1 year ago

FrancoisMifsud commented 1 year ago

Hi, the compute nodes on my institution's HPC cluster have no route to external networks. When I try to run the pipeline, I get OSError: Tunnel connection failed: 403 Forbidden, raised at "/usr/lib64/python3.6/http/client.py", line 929.

Is there a way to supply the required files locally so that the pipeline can run with no external internet access?

Thanks for your answer

leepc12 commented 1 year ago

Yes, our pipeline with Caper can run offline. First of all I need to see a full log or any helpful output of the pipeline/caper.

Please post your ~/.caper/default.conf. I need to check if cromwell and womtool are defined there correctly. These two should have local paths accessible by compute nodes.

You need to download the pipeline's Singularity image first. https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/atac.wdl#L23

Make sure the WDL version matches the image version.

Define the Singularity image in your input JSON.

{
    "atac.singularity": "/LOCAL/PATH/TO/atac-seq-pipeline_v2.2.1.sif"
}

Also define it on Caper's command line.

$ caper hpc submit atac.wdl -i your_input.json --singularity "/LOCAL/PATH/TO/atac-seq-pipeline_v2.2.1.sif"
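
Before submitting, it may help to confirm that every local file mentioned above is actually readable from the node that will run the job. The following is a minimal sketch, not part of Caper: the function name check_offline_paths is hypothetical, and it assumes the Caper config uses plain key=value lines for cromwell and womtool.

```shell
# Hypothetical helper (not part of Caper): verify that the local files
# an offline run depends on exist and are readable.
# Usage: check_offline_paths /path/to/default.conf /path/to/image.sif
check_offline_paths() {
    local conf="$1" sif="$2" status=0 key path
    for key in cromwell womtool; do
        # Take everything after the first '=' on the first "key=" line.
        path="$(grep -E "^${key}=" "$conf" | head -n 1 | cut -d= -f2- || true)"
        if [ -z "$path" ]; then
            echo "MISSING: ${key} is not defined in ${conf}"
            status=1
        elif [ ! -r "$path" ]; then
            echo "UNREADABLE: ${key}=${path}"
            status=1
        else
            echo "OK: ${key}=${path}"
        fi
    done
    if [ -r "$sif" ]; then
        echo "OK: singularity image ${sif}"
    else
        echo "UNREADABLE: singularity image ${sif}"
        status=1
    fi
    return "$status"
}
```

Running it on a compute node (for example inside a short interactive job) rather than on the login node checks the paths from where the pipeline will actually see them.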

FrancoisMifsud commented 1 year ago

Thank you very much for your reply. After downloading the singularity image locally and pointing to it in the json file and in the command line, the above connection error no longer appears.

However, I now get the following Cromwell WomtoolValidationFailed error:

/wynton/home/vaisse/fmifsud/encode_atac_env/lib/python3.6/site-packages/google/auth/crypt/_cryptography_rsa.py:22: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography. The next release of cryptography (40.0) will be the last to support Python 3.6.
  import cryptography.exceptions
2023-02-01 20:21:15,132|caper.cli|INFO| Cromwell stdout: /wynton/home/vaisse/fmifsud/cromwell.out
2023-02-01 20:21:15,137|caper.caper_base|INFO| Creating a timestamped temporary directory. /wynton/home/vaisse/fmifsud/caper_temp/atac/20230201_202115_135802
2023-02-01 20:21:15,138|caper.caper_runner|INFO| Localizing files on work_dir. /wynton/home/vaisse/fmifsud/caper_temp/atac/20230201_202115_135802
2023-02-01 20:21:16,616|caper.cromwell|INFO| Validating WDL/inputs/imports with Womtool...
2023-02-01 20:21:16,751|caper.nb_subproc_thread|ERROR| Subprocess failed. returncode=1
Traceback (most recent call last):
  File "/wynton/home/vaisse/fmifsud/encode_atac_env/bin/caper", line 13, in <module>
    main()
  File "/wynton/home/vaisse/fmifsud/encode_atac_env/lib/python3.6/site-packages/caper/cli.py", line 713, in main
    return runner(parsed_args, nonblocking_server=nonblocking_server)
  File "/wynton/home/vaisse/fmifsud/encode_atac_env/lib/python3.6/site-packages/caper/cli.py", line 255, in runner
    subcmd_run(c, args)
  File "/wynton/home/vaisse/fmifsud/encode_atac_env/lib/python3.6/site-packages/caper/cli.py", line 408, in subcmd_run
    dry_run=args.dry_run,
  File "/wynton/home/vaisse/fmifsud/encode_atac_env/lib/python3.6/site-packages/caper/caper_runner.py", line 462, in run
    self._cromwell.validate(wdl=wdl, inputs=inputs, imports=imports)
  File "/wynton/home/vaisse/fmifsud/encode_atac_env/lib/python3.6/site-packages/caper/cromwell.py", line 161, in validate
    'RC={rc}\nSTDERR={stderr}'.format(rc=th.returncode, stderr=stderr)
caper.cromwell.WomtoolValidationFailed: RC=1
STDERR=

My caper config file is:

backend=sge
local-loc-dir=/wynton/home/vaisse/fmifsud/caper_temp
sge-pe=smp
cromwell=/wynton/home/vaisse/fmifsud/.caper/cromwell_jar/cromwell-82.jar
womtool=/wynton/home/vaisse/fmifsud/.caper/womtool_jar/womtool-82.jar

I launched the main job with the following command:

$ caper hpc submit /wynton/home/vaisse/fmifsud/encode_atac_env/atac-seq-pipeline/atac.wdl -i /wynton/home/vaisse/fmifsud/encode_test1.json --singularity "/wynton/home/vaisse/fmifsud/encode_atac_env/atac-seq-pipeline/atac-seq-pipeline_v2.2.1.sif" --leader-job-name ENCODE_leader

And my JSON file is:

{
"atac.title" : "Test_experiment_ATAC1",
"atac.description" : "Trying to run ENCODE pipeline on Sample A1 and A2",

"atac.pipeline_type" : "atac",
"atac.align_only" : false,
"atac.true_rep_only" : false,

"atac.genome_tsv" : "/wynton/home/vaisse/fmifsud/encode_atac_env/atac-seq-pipeline/mm10.tsv",

"atac.paired_end" : true,

"atac.fastqs_rep1_R1" : [ "/wynton/scratch/Francois/Trimmed_fastq/SampleA1_R1_trimmed.fastq.gz"],
"atac.fastqs_rep1_R2" : [ "/wynton/scratch/Francois/Trimmed_fastq/SampleA1_R2_trimmed.fastq.gz"],
"atac.fastqs_rep2_R1" : [ "/wynton/scratch/Francois/Trimmed_fastq/SampleA2_R1_trimmed.fastq.gz"],
"atac.fastqs_rep2_R2" : [ "/wynton/scratch/Francois/Trimmed_fastq/SampleA2_R2_trimmed.fastq.gz"],

"atac.auto_detect_adapter" : false,
"atac.adapter" : "CTGTCTCTTATACACATCT",

"atac.multimapping" : 4,

"atac.singularity": "/wynton/home/vaisse/fmifsud/encode_atac_env/atac-seq-pipeline/atac-seq-pipeline_v2.2.1.sif"

}

The mm10.tsv file lists and points to the different genome files that I have downloaded locally.

Thanks again for your time and your help on this.
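
Since STDERR is empty in the traceback above, one way to surface the underlying error might be to run Womtool by hand with the same jar, WDL, and input JSON. This is a rough sketch only: validate_with_womtool is a hypothetical helper name, and the argument order is an assumption based on Womtool's validate subcommand and its -i inputs option.

```shell
# Hypothetical helper: run Womtool directly so that any Java or WDL
# error is printed to the terminal instead of being swallowed.
# Usage: validate_with_womtool /path/to/womtool.jar atac.wdl input.json
validate_with_womtool() {
    local womtool_jar="$1" wdl="$2" inputs="$3"
    # Show which Java is first on PATH; an old Java is one possible cause.
    java -version 2>&1 | head -n 1
    # Validate the workflow together with its inputs.
    java -jar "$womtool_jar" validate "$wdl" -i "$inputs"
}
```

If the java -jar step itself fails, the problem is likely in the Java setup rather than in the WDL or the input JSON.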

FrancoisMifsud commented 1 year ago

It seems the issue is similar to this one and could be related to the Java version: https://github.com/ENCODE-DCC/atac-seq-pipeline/issues/395

There are several versions of OpenJDK available on my HPC. Is there a way to export a specific JAVA_HOME with Caper? (My personal .bashrc is not taken into account by jobs submitted to the cluster through SGE.)

leepc12 commented 1 year ago

I think you can set a different environment variable for a specific command line.

$ JAVA_HOME="/path/to/java/home" caper hpc submit...
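
The per-command form above can be wrapped in a small helper that also puts the chosen Java first on PATH, in case the tool looks up java through PATH rather than JAVA_HOME. This is a sketch under assumptions: run_with_java is a hypothetical name, and whether the variables reach jobs launched by the scheduler depends on how the cluster forwards the environment.

```shell
# Hypothetical wrapper: run one command with a specific Java, leaving
# the calling shell's environment untouched afterwards.
# Usage: run_with_java /path/to/java/home caper hpc submit ...
run_with_java() {
    local java_home="$1"
    shift
    # Prefix assignments apply only to the command being launched.
    JAVA_HOME="$java_home" PATH="$java_home/bin:$PATH" "$@"
}
```

For instance, run_with_java /path/to/java/home java -version should report the selected JDK, while a plain java -version afterwards still reports the default one.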

lixin4306ren commented 6 months ago

Hi @leepc12, I downloaded all the required files and tried to run the pipeline completely locally, but I still get the following connection error:

2024-03-16 16:58:52,224 cromwell-system-akka.dispatchers.engine-dispatcher-76 INFO  - Not triggering log of restart checking token queue status. Effective log interval = None
2024-03-16 16:58:52,263 cromwell-system-akka.dispatchers.engine-dispatcher-76 INFO  - Not triggering log of execution token queue status. Effective log interval = None
2024-03-16 16:58:54,418 cromwell-system-akka.dispatchers.engine-dispatcher-80 INFO  - WorkflowExecutionActor-bb3e9e88-9640-4091-8728-bd112ddbc437 [UUID(bb3e9e88)]: Starting atac.read_genome_tsv
2024-03-16 16:58:55,278 cromwell-system-akka.dispatchers.engine-dispatcher-80 INFO  - Assigned new job execution tokens to the following groups: bb3e9e88: 1
2024-03-16 16:59:15,774  INFO  - Request threw an exception on attempt #1. Retrying after 535 milliseconds
org.http4s.client.ConnectionFailure: Error connecting to https://auth.docker.io using address auth.docker.io:443 (unresolved: true)

I don't understand why the pipeline needs to connect to https://auth.docker.io. Are there any other files I need to download? Thank you in advance.

Xin