FrancoisMifsud opened this issue 1 year ago
Yes, our pipeline can run offline with Caper. First of all, I need to see the full log or any helpful output from the pipeline/Caper.
Please post your ~/.caper/default.conf.
I need to check whether cromwell and womtool are defined there correctly.
Both should be local paths accessible by the compute nodes.
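For reference, a default.conf with both tools defined would look roughly like this (a sketch only; the backend, version numbers, and paths are placeholders, not a prescription):

```
backend=sge
sge-pe=smp
local-loc-dir=/path/to/caper_temp
# Both jars must live on a filesystem the compute nodes can read:
cromwell=/path/to/.caper/cromwell_jar/cromwell-82.jar
womtool=/path/to/.caper/womtool_jar/womtool-82.jar
```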
You need to download the pipeline's Singularity image first: https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/atac.wdl#L23
Make sure the versions of the WDL and the image match.
Then define the Singularity image in your input JSON:
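If the image is not already on your cluster, it can typically be pulled from the pipeline's Docker Hub image into a local .sif file (a sketch; the v2.2.1 tag is taken from this thread, so check atac.wdl for the exact image tag your WDL version expects):

```shell
# Pull the pipeline's Docker image as a local Singularity .sif file.
# Match the tag to the image URI defined in your copy of atac.wdl.
singularity pull atac-seq-pipeline_v2.2.1.sif \
    docker://encodedcc/atac-seq-pipeline:v2.2.1
```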
{
"atac.singularity": "/LOCAL/PATH/TO/atac-seq-pipeline_v2.2.1.sif"
}
Also pass it on Caper's command line:
$ caper hpc submit atac.wdl -i your_input.json --singularity "/LOCAL/PATH/TO/atac-seq-pipeline_v2.2.1.sif"
Thank you very much for your reply. After downloading the Singularity image locally and pointing to it in the JSON file and on the command line, the connection error above no longer appears.
However, I now get the following Cromwell WomtoolValidation error:
/wynton/home/vaisse/fmifsud/encode_atac_env/lib/python3.6/site-packages/google/auth/crypt/_cryptography_rsa.py:22: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography. The next release of cryptography (40.0) will be the last to support Python 3.6.
import cryptography.exceptions
2023-02-01 20:21:15,132|caper.cli|INFO| Cromwell stdout: /wynton/home/vaisse/fmifsud/cromwell.out
2023-02-01 20:21:15,137|caper.caper_base|INFO| Creating a timestamped temporary directory. /wynton/home/vaisse/fmifsud/caper_temp/atac/20230201_202115_135802
2023-02-01 20:21:15,138|caper.caper_runner|INFO| Localizing files on work_dir. /wynton/home/vaisse/fmifsud/caper_temp/atac/20230201_202115_135802
2023-02-01 20:21:16,616|caper.cromwell|INFO| Validating WDL/inputs/imports with Womtool...
2023-02-01 20:21:16,751|caper.nb_subproc_thread|ERROR| Subprocess failed. returncode=1
Traceback (most recent call last):
File "/wynton/home/vaisse/fmifsud/encode_atac_env/bin/caper", line 13, in <module>
main()
File "/wynton/home/vaisse/fmifsud/encode_atac_env/lib/python3.6/site-packages/caper/cli.py", line 713, in main
return runner(parsed_args, nonblocking_server=nonblocking_server)
File "/wynton/home/vaisse/fmifsud/encode_atac_env/lib/python3.6/site-packages/caper/cli.py", line 255, in runner
subcmd_run(c, args)
File "/wynton/home/vaisse/fmifsud/encode_atac_env/lib/python3.6/site-packages/caper/cli.py", line 408, in subcmd_run
dry_run=args.dry_run,
File "/wynton/home/vaisse/fmifsud/encode_atac_env/lib/python3.6/site-packages/caper/caper_runner.py", line 462, in run
self._cromwell.validate(wdl=wdl, inputs=inputs, imports=imports)
File "/wynton/home/vaisse/fmifsud/encode_atac_env/lib/python3.6/site-packages/caper/cromwell.py", line 161, in validate
'RC={rc}\nSTDERR={stderr}'.format(rc=th.returncode, stderr=stderr)
caper.cromwell.WomtoolValidationFailed: RC=1
STDERR=
My caper config file is:
backend=sge
local-loc-dir=/wynton/home/vaisse/fmifsud/caper_temp
sge-pe=smp
cromwell=/wynton/home/vaisse/fmifsud/.caper/cromwell_jar/cromwell-82.jar
womtool=/wynton/home/vaisse/fmifsud/.caper/womtool_jar/womtool-82.jar
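When Caper reports WomtoolValidationFailed with an empty STDERR like the error above, one way to surface the actual validation message (a debugging sketch, reusing the womtool jar and file paths from this thread) is to run Womtool by hand:

```shell
# Validate the WDL and inputs directly; Womtool prints the real error
# that Caper's wrapper swallowed.
java -jar /wynton/home/vaisse/fmifsud/.caper/womtool_jar/womtool-82.jar \
    validate /wynton/home/vaisse/fmifsud/encode_atac_env/atac-seq-pipeline/atac.wdl \
    -i /wynton/home/vaisse/fmifsud/encode_test1.json
```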
I launched the main job with the following command:
caper hpc submit /wynton/home/vaisse/fmifsud/encode_atac_env/atac-seq-pipeline/atac.wdl -i /wynton/home/vaisse/fmifsud/encode_test1.json --singularity "/wynton/home/vaisse/fmifsud/encode_atac_env/atac-seq-pipeline/atac-seq-pipeline_v2.2.1.sif" --leader-job-name ENCODE_leader
And my JSON file is:
{
"atac.title" : "Test_experiment_ATAC1",
"atac.description" : "Trying to run ENCODE pipeline on Sample A1 and A2",
"atac.pipeline_type" : "atac",
"atac.align_only" : false,
"atac.true_rep_only" : false,
"atac.genome_tsv" : "/wynton/home/vaisse/fmifsud/encode_atac_env/atac-seq-pipeline/mm10.tsv",
"atac.paired_end" : true,
"atac.fastqs_rep1_R1" : [ "/wynton/scratch/Francois/Trimmed_fastq/SampleA1_R1_trimmed.fastq.gz"],
"atac.fastqs_rep1_R2" : [ "/wynton/scratch/Francois/Trimmed_fastq/SampleA1_R2_trimmed.fastq.gz"],
"atac.fastqs_rep2_R1" : [ "/wynton/scratch/Francois/Trimmed_fastq/SampleA2_R1_trimmed.fastq.gz"],
"atac.fastqs_rep2_R2" : [ "/wynton/scratch/Francois/Trimmed_fastq/SampleA2_R2_trimmed.fastq.gz"],
"atac.auto_detect_adapter" : false,
"atac.adapter" : "CTGTCTCTTATACACATCT",
"atac.multimapping" : 4,
"atac.singularity": "/wynton/home/vaisse/fmifsud/encode_atac_env/atac-seq-pipeline/atac-seq-pipeline_v2.2.1.sif"
}
The mm10.tsv file lists and points to the different genome files that I have downloaded locally.
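For a fully offline run, every path inside that TSV must itself be local and readable from the compute nodes. The file is tab-separated key/value pairs; a sketch of the shape (the key names and file names here are illustrative, modeled on the ENCODE genome database format, so verify them against your actual mm10.tsv):

```
genome_name	mm10
ref_fa	/local/path/mm10.fasta.gz
chrsz	/local/path/mm10.chrom.sizes
bowtie2_idx_tar	/local/path/mm10_bowtie2_index.tar.gz
blacklist	/local/path/mm10.blacklist.bed.gz
```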
Thanks again for your time and your help on this.
It seems the issue is similar to this one and could be related to the Java version: https://github.com/ENCODE-DCC/atac-seq-pipeline/issues/395
There are several versions of OpenJDK available on my HPC. Is there a way I can export a specific JAVA_HOME with Caper?
(My individual .bashrc is not taken into account by jobs submitted to the cluster by SGE.)
I think you can set an environment variable for a specific command line:
$ JAVA_HOME="/path/to/java/home" caper hpc submit...
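The prefix form applies the variable to that single command only, without touching your login environment; a quick way to verify the scoping with a throwaway variable (FOO is just a placeholder):

```shell
# The assignment before the command is visible inside that command...
FOO=bar sh -c 'echo "$FOO"'      # prints: bar
# ...but does not leak into the surrounding shell.
echo "${FOO:-unset}"             # prints: unset
```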
Hi @leepc12, I downloaded all the required files and tried to run the pipeline completely locally, but I still get a connection error:
2024-03-16 16:58:52,224 cromwell-system-akka.dispatchers.engine-dispatcher-76 INFO - Not triggering log of restart checking token queue status. Effective log interval = None
2024-03-16 16:58:52,263 cromwell-system-akka.dispatchers.engine-dispatcher-76 INFO - Not triggering log of execution token queue status. Effective log interval = None
2024-03-16 16:58:54,418 cromwell-system-akka.dispatchers.engine-dispatcher-80 INFO - WorkflowExecutionActor-bb3e9e88-9640-4091-8728-bd112ddbc437 [UUID(bb3e9e88)]: Starting atac.read_genome_tsv
2024-03-16 16:58:55,278 cromwell-system-akka.dispatchers.engine-dispatcher-80 INFO - Assigned new job execution tokens to the following groups: bb3e9e88: 1
2024-03-16 16:59:15,774 INFO - Request threw an exception on attempt #1. Retrying after 535 milliseconds
org.http4s.client.ConnectionFailure: Error connecting to https://auth.docker.io using address auth.docker.io:443 (unresolved: true)
I don't understand why the pipeline needs to connect to https://auth.docker.io. Are there any other files I need to download? Thank you in advance.
Xin
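The auth.docker.io traffic is most likely Cromwell resolving the task's Docker image tag to a digest via the registry API, which it attempts even when the image itself is supplied locally via Singularity. If that is the cause, the remote digest lookup can be disabled in a Cromwell configuration file (a sketch under that assumption; verify the setting against your Cromwell version's reference configuration before relying on it):

```
# Cromwell configuration snippet: skip remote registry digest lookups.
docker {
  hash-lookup {
    enabled = false
  }
}
```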
Hi, the compute nodes on my institution's HPC have no route to external networks. When trying to run the pipeline I get
OSError: Tunnel connection failed: 403 Forbidden
at the step where it executes "/usr/lib64/python3.6/http/client.py", line 929.
Is there a way to supply the required files locally so the pipeline can run with no external internet access?
Thanks for your answer.