Closed: mjafin closed this issue 5 years ago.
OK, answering myself: I followed the instructions to the letter and didn't specify the file locations. Specifying them explicitly worked:
bcbio_vm.py template --systemconfig bcbio_system-gcp.yaml ${TEMPLATE}-template.yaml $PNAME.csv gs://snp-calling-project/inputs/P09_S06_BRCA1_c1105delG_1.fastq.gz gs://snp-calling-project/inputs/P09_S06_BRCA1_c1105delG_2.fastq.gz gs://snp-calling-project/inputs/P09_S12_BRCA2_c100024GA_1.fastq.gz gs://snp-calling-project/inputs/P09_S12_BRCA2_c100024GA_2.fastq.gz
Looks like wildcards don't work:
bcbio_vm.py template --systemconfig bcbio_system-gcp.yaml ${TEMPLATE}-template.yaml $PNAME.csv gs://snp-calling-project/inputs/*.fastq.gz
WARNING: sample not found P09_S06_BRCA1_c1105delG
WARNING: sample not found P09_S12_BRCA2_c100024GA
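The wildcard failure above can be worked around by expanding the pattern before the template call. A minimal client-side sketch (hypothetical, not bcbio functionality; the listing would come from e.g. the output of gsutil ls, and the paths below are the ones from this thread):

```python
import fnmatch

# Since the template command treats a gs:// wildcard literally, expand the
# pattern yourself against a bucket listing and pass the individual URLs on
# the command line instead.
listing = [
    "gs://snp-calling-project/inputs/P09_S06_BRCA1_c1105delG_1.fastq.gz",
    "gs://snp-calling-project/inputs/P09_S06_BRCA1_c1105delG_2.fastq.gz",
    "gs://snp-calling-project/inputs/notes.txt",  # non-fastq file, filtered out
]
pattern = "gs://snp-calling-project/inputs/*.fastq.gz"
fastqs = [path for path in listing if fnmatch.fnmatch(path, pattern)]
# Splice this space-separated list into the bcbio_vm.py template call:
print(" ".join(fastqs))
```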
EDIT: Noticed a few other things:
Running
bcbio_vm.py cwl --systemconfig bcbio_system-gcp.yaml $PNAME/config/$PNAME.yaml
creates a new folder called ${PNAME}-workflow instead of placing the CWL files in $PNAME. This then causes the actual run command to error out, as it can't find the CWL files. The command in the docs could be changed to bcbio_vm.py cwlrun cromwell ${PNAME}-workflow ...
At https://bcbio-nextgen.readthedocs.io/en/latest/contents/cloud.html#docs-cloud-gcp there is a typo in gcloud iam service-accounts keys create ~/.config/glcoud/your-service-account.json: it says glcoud instead of gcloud.
I got a Docker missing error, I believe. Do I need Docker running on the local computer I'm launching the GCP processing from? Or somehow on GCP?
Miika; Great to hear from you, and thanks for trying out the GCP CWL support. This feedback is super helpful, and I appreciate you testing this out. Sorry about the poor documentation here; I've just updated it to make it clearer what to put in the first samplename column for CWL runs. While the way you did it will work, it's easier to just put the full file name; then you don't have to specify anything during the template command at all. It also makes it easier to swap back and forth between a local and a GCP run without needing to re-configure the template commands.
For the run, could you pass along the error messages you're seeing? You shouldn't need to have Docker locally for a GCP run; it should manage this all as part of the process there, so it should be transparent to you.
Thanks again for the help with improving the docs and documenting this.
Cheers, Brad. I'll try to reproduce. I'm making a local install at the moment, trying to patch together an hg38 genome for my GCP runs. I presume I just copy the hg38 folder over to a gs bucket and throw in the COSMIC VCF?
The other thing that is missing from the documentation is that it's necessary to enable the Genomics API for the newly generated project. I don't know if this is possible on the command line? I can try if you don't have access to GCP.
EDIT: could be gcloud services enable genomics.googleapis.com
So here are a few things that may be unrelated. I get some of these:
[2019-01-29 20:05:16,26] [warn] PipelinesApiAsyncBackendJobExecutionActor [609851bdprocess_alignment:0:1]: Unrecognized runtime attribute keys: memoryMax, cpuMax, tmpDirMax, outDirMax
Then later, I believe in process_alignment:
[2019-01-29 20:24:22,53] [info] PipelinesApiAsyncBackendJobExecutionActor [a5a7507bprocess_alignment:0:1]: Status change from Running to Success
[2019-01-29 20:24:24,43] [error] WorkflowManagerActor Workflow bf746f5a-66f4-4960-b9be-6b478fc6958c failed (during ExecutingWorkflowState): Job process_alignment:0:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: gs://snp-calling-project/work_cromwell/main-test_run.cwl/bf746f5a-66f4-4960-b9be-6b478fc6958c/call-alignment/shard-0/wf-alignment.cwl/609851bd-1855-435b-8294-85c11776a709/call-process_alignment/shard-0/stderr.
Traceback (most recent call last):
File "/usr/local/bin/bcbio_nextgen.py", line 223, in <module>
runfn.process(kwargs["args"])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 57, in process
out = fn(*fnargs)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 54, in wrapper
return f(*args, **kwargs)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 119, in process_alignment
return sample.process_alignment(*args)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/sample.py", line 128, in process_alignment
data = align_to_sort_bam(fastq1, fastq2, aligner, data)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/alignment.py", line 83, in align_to_sort_bam
names, align_dir, data)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/alignment.py", line 158, in _align_from_fastq
out = align_fn(fastq1, fastq2, align_ref, names, align_dir, data)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/ngsalign/bwa.py", line 170, in align_pipe
names, rg_info, data)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/ngsalign/bwa.py", line 181, in _align_mem
[do.file_nonempty(tx_out_file), do.file_reasonable_size(tx_out_file, fastq_file)])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 26, in run
_do_run(cmd, checks, log_stdout, env=env)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; unset JAVA_HOME && /usr/local/share/bcbio-nextgen/anaconda/bin/bwa mem -c 250 -M -t 2 -R '@RG\tID:P09_S06_BRCA1_c1105delG\tPL:illumina\tPU:P09_S06_BRCA1_c1105delG\tSM:P09_S06_BRCA1_c1105delG' -v 1 /cromwell_root/bcbiodata/collections/hg38/bwa/hg38.fa /cromwell_root/snp-calling-project/work_cromwell/main-test_run.cwl/bf746f5a-66f4-4960-b9be-6b478fc6958c/call-alignment/shard-0/wf-alignment.cwl/609851bd-1855-435b-8294-85c11776a709/call-prep_align_inputs/align_prep/P09_S06_BRCA1_c1105delG_1.fastq.gz /cromwell_root/snp-calling-project/work_cromwell/main-test_run.cwl/bf746f5a-66f4-4960-b9be-6b478fc6958c/call-alignment/shard-0/wf-alignment.cwl/609851bd-1855-435b-8294-85c11776a709/call-prep_align_inputs/align_prep/P09_S06_BRCA1_c1105delG_2.fastq.gz | /usr/local/share/bcbio-nextgen/anaconda/bin/bamsormadup inputformat=sam threads=2 tmpfile=/cromwell_root/bcbiotx/tmp0vNekY/P09_S06_BRCA1_c1105delG-sort-sorttmp-markdup SO=coordinate indexfilename=/cromwell_root/bcbiotx/tmp0vNekY/P09_S06_BRCA1_c1105delG-sort.bam.bai > /cromwell_root/bcbiotx/tmp0vNekY/P09_S06_BRCA1_c1105delG-sort.bam
[V] 0 01:08:27887400 MemUsage(size=803.516,rss=7.28516,peak=803.586) AutoArrayMemUsage(memusage=593.073,peakmemusage=593.073,maxmem=1.75922e+13) final
[V] flushing read ends lists...done.
[V] merging read ends lists/computing duplicates...done, time 01:05953300
[V] num dups 0
# bamsormadup
##METRICS
LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED UNMAPPED_READS UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES READ_PAIR_OPTICAL_DUPLICATES PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE
[V] blocks generated in time 01:10:63775300
[V] number of blocks to be merged is 1 using 8192 blocks per input with block size 1048576
[V] 0
[D] md5 3a41b8e423502cae9ef5bf4d03d77f96
[V] checksum ok
[V] blocks merged in time 01:06085999
[V] run time 01:11:70538999 (71.7054 s) MemUsage(size=8494.72,rss=59.1562,peak=9518.73)
/bin/bash: line 1: 74 Killed /usr/local/share/bcbio-nextgen/anaconda/bin/bwa mem -c 250 -M -t 2 -R '@RG\tID:P09_S06_BRCA1_c1105delG\tPL:illumina\tPU:P09_S06_BRCA1_c1105delG\tSM:P09_S06_BRCA1_c1105delG' -v 1 /cromwell_root/bcbiodata/collections/hg38/bwa/hg38.fa /cromwell_root/snp-calling-project/work_cromwell/main-test_run.cwl/bf746f5a-66f4-4960-b9be-6b478fc6958c/call-alignment/shard-0/wf-alignment.cwl/609851bd-1855-435b-8294-85c11776a709/call-prep_align_inputs/align_prep/P09_S06_BRCA1_c1105delG_1.fastq.gz /cromwell_root/snp-calling-project/work_cromwell/main-test_run.cwl/bf746f5a-66f4-4960-b9be-6b478fc6958c/call-alignment/shard-0/wf-alignment.cwl/609851bd-1855-435b-8294-85c11776a709/call-prep_align_inputs/align_prep/P09_S06_BRCA1_c1105delG_2.fastq.gz
75 Done | /usr/local/share/bcbio-nextgen/anaconda/bin/bamsormadup inputformat=sam threads=2 tmpfile=/cromwell_root/bcbiotx/tmp0vNekY/P09_S06_BRCA1_c1105delG-sort-sorttmp-markdup SO=coordinate indexfilename=/cromwell_root/bcbiotx/tmp0vNekY/P09_S06_BRCA1_c1105delG-sort.bam.bai > /cromwell_root/bcbiotx/tmp0vNekY/P09_S06_BRCA1_c1105delG-sort.bam
' returned non-zero exit status 137
...
Anything obvious in the above?
EDIT: Hmm, it seems to refer to hg38, although I'm pretty sure I chose GRCh37 (and tried to point to your public copy of it). Will look again.
EDIT2: Nope, I made a mistake myself; rerunning...
My test run on GRCh37 ran to completion, I think. It seems like every time the pipeline shuts down it produces this error (even if the run was successful):
[2019-01-30 04:40:27,41] [info] ServiceRegistryActor stopped
[2019-01-30 04:40:27,46] [info] Database closed
[2019-01-30 04:40:27,46] [info] Stream materializer shut down
[2019-01-30 04:40:27,47] [info] WDL HTTP import resolver closed
/bin/sh: 1: docker: not found
Traceback (most recent call last):
File "/home/miika/install/bcbio-vm/anaconda/bin/bcbio_vm.py", line 354, in <module>
args.func(args)
File "/home/miika/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio/cwl/tool.py", line 312, in run
_TOOLS[args.tool](args)
File "/home/miika/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio/cwl/tool.py", line 186, in _run_cromwell
_run_tool(cmd, not args.no_container, work_dir, log_file)
File "/home/miika/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio/cwl/tool.py", line 50, in _run_tool
_chown_workdir(work_dir)
File "/home/miika/install/bcbio-vm/anaconda/lib/python2.7/site-packages/bcbio/cwl/tool.py", line 67, in _chown_workdir
subprocess.check_call(cmd, shell=True)
File "/home/miika/install/bcbio-vm/anaconda/lib/python2.7/subprocess.py", line 190, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'docker run --rm -v /home/miika/install/cromwell_work:/home/miika/install/cromwell_work quay.io/bcbio/bcbio-base /bin/bash -c 'chown -R 1003 /home/miika/install/cromwell_work'' returned non-zero exit status 127
I also noticed that on Google Storage the resulting folder has links to root:
gs://snp-calling-project/work_cromwell/main-test_run.cwl/f23942fc-3ef3-499d-ac9b-024de539f92a/call-alignment_to_rec/gs://
This makes copying the files over a bit more difficult, as recursive copying now copies everything from the root folder on. I haven't fully checked whether other folders have this issue.
Edit: Also to the reference data location:
gs://snp-calling-project/work_cromwell/main-test_run.cwl/f23942fc-3ef3-499d-ac9b-024de539f92a/call-batch_for_variantcall/gs://bcbiodata/collections/GRCh37/rtg--GRCh37.sdf-wf.tar.gz
Edit2: By the looks of it, I should only focus on these folders (which shouldn't have gs:// links):
call-process_alignment
call-multiqc_summary
call-postprocess_alignment
call-summarize_vc
Miika; Thanks so much for working through this. It sounds like you've made great progress, and I appreciate all the feedback. I've been improving the documentation based on this and have also uploaded the hg38 genome alongside GRCh37, so you can use that for your tests:
$ gsutil ls gs://bcbiodata/collections/hg38/
gs://bcbiodata/collections/hg38/rtg--hg38.sdf-wf.tar.gz
gs://bcbiodata/collections/hg38/snpeff--GRCh38.86-wf.tar.gz
gs://bcbiodata/collections/hg38/versions.csv
gs://bcbiodata/collections/hg38/bwa/
gs://bcbiodata/collections/hg38/config/
gs://bcbiodata/collections/hg38/coverage/
gs://bcbiodata/collections/hg38/rnaseq/
gs://bcbiodata/collections/hg38/seq/
gs://bcbiodata/collections/hg38/ucsc/
gs://bcbiodata/collections/hg38/validation/
gs://bcbiodata/collections/hg38/variation/
gs://bcbiodata/collections/hg38/viral/
Thanks also for the heads up on the local Docker problem. I pushed a fix for that and will build a new bcbio conda package, but for now you can just ignore it. That's the last step inside bcbio, which tries to clean things up, but it shouldn't fail if you don't have a local Docker.
For the folder issues, I don't think you want to copy everything from those work directories, as they contain everything that got staged during running. We do need a clean way to copy just the final outputs into a separate directory, as I don't think Cromwell does that by default. I'll ask what best practices are with Cromwell and work on incorporating this.
Thank you again for all this feedback and progress.
Awesome, cheers Brad. Will I be able to inject my own COSMIC file into hg38 by having it, e.g., in my YAML template as
variation:
  cosmic: gs://snp-calling-project/biodata/cosmic.vcf.gz
I presume this is the only thing I need in order to be able to set vcfanno: somatic?
I made my own copy of your hg38 bucket and added the COSMIC VCF. However, I noticed that there is a check here: https://github.com/bcbio/bcbio-nextgen/blob/b14ee005c335f1e86162f2d203f591e3932f100d/bcbio/variation/vcfanno.py#L152 It looks like this is only for paired variant calling? The warning message is somewhat misleading:
[2019-01-30T13:22Z] WARNING: Skipping vcfanno configuration: somatic. Not all input files found.
Edit: I suppose this is a separate setting?
tools_on: [tumoronly_germline_filter]
Miika;
Thanks for working on this. The paired in that case just means that the sample is a somatic run, either tumor-only or tumor/normal. Does the sample you want to annotate with vcfanno have phenotype: tumor in the metadata? If you could share your configuration, we might be able to spot something else if that's not it. Thanks again.
Ahh, I see. Here's my metadata:
samplename,description,batch,phenotype
P09_S06_BRCA1_c1105delG_1.fastq.gz;P09_S06_BRCA1_c1105delG_2.fastq.gz,P09_S06_BRCA1_c1105delG,P09_S06_BRCA1_c1105delG-batch,tumor
P09_S12_BRCA2_c100024GA_1.fastq.gz;P09_S12_BRCA2_c100024GA_2.fastq.gz,P09_S12_BRCA2_c100024GA,P09_S12_BRCA2_c100024GA-batch,tumor
Hi Brad, I think I identified the issue. It's here:
https://github.com/bcbio/bcbio-nextgen/blob/b14ee005c335f1e86162f2d203f591e3932f100d/bcbio/variation/vcfanno.py#L154
The os.path.exists check won't work for Google buckets, I suspect. I'll hack it out for my testing purposes.
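A minimal sketch of the kind of remote-aware guard that check would need (a hypothetical helper, not bcbio's actual code; the prefix list is an assumption):

```python
import os

def maybe_exists(path):
    """os.path.exists always returns False for object-store URLs, so treat
    remote prefixes as assumed-present and only stat local filesystem paths."""
    if path.startswith(("gs://", "s3://", "http://", "https://")):
        return True
    return os.path.exists(path)

# os.path.exists treats the URL as a (nonexistent) local path:
print(os.path.exists("gs://snp-calling-project/biodata/cosmic.vcf.gz"))  # False
print(maybe_exists("gs://snp-calling-project/biodata/cosmic.vcf.gz"))    # True
```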
Edit: OK, so if I remove the os.path.exists check, then I get to another problem:
[2019-01-30T14:14Z] WARNING: The vcfanno configuration /home/miika/install/gs:/snp-calling-project/biodata/hg38/config/vcfanno/somatic.conf was not found for hg38, skipping.
I could try skipping this check too, but I'm not sure things will work down the line.
Edit2: It looks like the function find_annotations uses os.path.abspath, so it's limited to local runs for now?
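The mangled gs:/ path in the warning above is consistent with os.path.abspath being applied to a bucket URL: the URL gets treated as a relative path, joined onto the working directory, and the double slash collapses during normalization. A quick demonstration (the URL is the one from the warning):

```python
import os

url = "gs://snp-calling-project/biodata/hg38/config/vcfanno/somatic.conf"
# abspath = normpath(join(cwd, url)); normpath collapses the "//" in "gs://",
# producing exactly the "/home/.../gs:/..." shape seen in the warning.
mangled = os.path.abspath(url)
print(mangled)
```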
Tried running the same data on hg38:
Traceback (most recent call last):
File "/usr/local/bin/bcbio_nextgen.py", line 223, in <module>
runfn.process(kwargs["args"])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 57, in process
out = fn(*fnargs)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 54, in wrapper
return f(*args, **kwargs)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 119, in process_alignment
return sample.process_alignment(*args)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/sample.py", line 128, in process_alignment
data = align_to_sort_bam(fastq1, fastq2, aligner, data)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/alignment.py", line 83, in align_to_sort_bam
names, align_dir, data)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/alignment.py", line 158, in _align_from_fastq
out = align_fn(fastq1, fastq2, align_ref, names, align_dir, data)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/ngsalign/bwa.py", line 170, in align_pipe
names, rg_info, data)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/ngsalign/bwa.py", line 181, in _align_mem
[do.file_nonempty(tx_out_file), do.file_reasonable_size(tx_out_file, fastq_file)])
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 26, in run
_do_run(cmd, checks, log_stdout, env=env)
File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; unset JAVA_HOME && /usr/local/share/bcbio-nextgen/anaconda/bin/bwa mem -c 250 -M -t 2 -R '@RG\tID:P09_S06_BRCA1_c1105delG\tPL:illumina\tPU:P09_S06_BRCA1_c1105delG\tSM:P09_S06_BRCA1_c1105delG' -v 1 /cromwell_root/snp-calling-project/biodata/hg38/bwa/hg38.fa /cromwell_root/snp-calling-project/work_cromwell/main-test_run.cwl/5d6574d6-e53e-4b33-a85b-4e3d351537ee/call-alignment/shard-0/wf-alignment.cwl/fc8627a5-0e2d-4880-8618-d73f3ebf31f2/call-prep_align_inputs/align_prep/P09_S06_BRCA1_c1105delG_1.fastq.gz /cromwell_root/snp-calling-project/work_cromwell/main-test_run.cwl/5d6574d6-e53e-4b33-a85b-4e3d351537ee/call-alignment/shard-0/wf-alignment.cwl/fc8627a5-0e2d-4880-8618-d73f3ebf31f2/call-prep_align_inputs/align_prep/P09_S06_BRCA1_c1105delG_2.fastq.gz | /usr/local/share/bcbio-nextgen/anaconda/bin/bamsormadup inputformat=sam threads=2 tmpfile=/cromwell_root/bcbiotx/tmp23MM9G/P09_S06_BRCA1_c1105delG-sort-sorttmp-markdup SO=coordinate indexfilename=/cromwell_root/bcbiotx/tmp23MM9G/P09_S06_BRCA1_c1105delG-sort.bam.bai > /cromwell_root/bcbiotx/tmp23MM9G/P09_S06_BRCA1_c1105delG-sort.bam
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (98, 151, 523)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1373)
[M::mem_pestat] mean and std.dev: (275.08, 267.99)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1798)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (112, 129, 148)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (40, 220)
[M::mem_pestat] mean and std.dev: (130.17, 28.59)
[M::mem_pestat] low and high boundaries for proper pairs: (4, 256)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (164, 340, 814)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 2114)
[M::mem_pestat] mean and std.dev: (551.98, 580.06)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2872)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (170, 313, 706)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1778)
[M::mem_pestat] mean and std.dev: (439.51, 398.04)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2314)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[V] 0 14:18:19529299 MemUsage(size=806.055,rss=20.375,peak=806.934) AutoArrayMemUsage(memusage=594.325,peakmemusage=594.325,maxmem=1.75922e+13) final
[V] flushing read ends lists...done.
[V] merging read ends lists/computing duplicates...done, time 01:01644399
[V] num dups 0
# bamsormadup
##METRICS
LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED UNMAPPED_READS UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES READ_PAIR_OPTICAL_DUPLICATES PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE
[V] blocks generated in time 14:20:72166700
[V] number of blocks to be merged is 1 using 8192 blocks per input with block size 1048576
[V] 0
[D] md5 70221f140b7d373d2b5ccea6b62d9781
[V] checksum ok
[V] blocks merged in time 01:07780699
[V] run time 14:21:84713799 (861.847 s) MemUsage(size=238.41,rss=40.8047,peak=9457.16)
/bin/bash: line 1: 74 Killed /usr/local/share/bcbio-nextgen/anaconda/bin/bwa mem -c 250 -M -t 2 -R '@RG\tID:P09_S06_BRCA1_c1105delG\tPL:illumina\tPU:P09_S06_BRCA1_c1105delG\tSM:P09_S06_BRCA1_c1105delG' -v 1 /cromwell_root/snp-calling-project/biodata/hg38/bwa/hg38.fa /cromwell_root/snp-calling-project/work_cromwell/main-test_run.cwl/5d6574d6-e53e-4b33-a85b-4e3d351537ee/call-alignment/shard-0/wf-alignment.cwl/fc8627a5-0e2d-4880-8618-d73f3ebf31f2/call-prep_align_inputs/align_prep/P09_S06_BRCA1_c1105delG_1.fastq.gz /cromwell_root/snp-calling-project/work_cromwell/main-test_run.cwl/5d6574d6-e53e-4b33-a85b-4e3d351537ee/call-alignment/shard-0/wf-alignment.cwl/fc8627a5-0e2d-4880-8618-d73f3ebf31f2/call-prep_align_inputs/align_prep/P09_S06_BRCA1_c1105delG_2.fastq.gz
75 Done | /usr/local/share/bcbio-nextgen/anaconda/bin/bamsormadup inputformat=sam threads=2 tmpfile=/cromwell_root/bcbiotx/tmp23MM9G/P09_S06_BRCA1_c1105delG-sort-sorttmp-markdup SO=coordinate indexfilename=/cromwell_root/bcbiotx/tmp23MM9G/P09_S06_BRCA1_c1105delG-sort.bam.bai > /cromwell_root/bcbiotx/tmp23MM9G/P09_S06_BRCA1_c1105delG-sort.bam
' returned non-zero exit status 137
Could this be a memory issue potentially? The fastq files are tiny (just a test case)
Miika; Thanks for the testing. I agree with your assessment of the vcfanno setup; I'll need to refactor it to handle non-local files. I'll work on that and ping here when it's fixed.
For your run, it looks like the process is getting killed, likely due to using too much memory. You're only using 2 cores for bwa, so maybe you just have very minimal core/memory requirements in your bcbio_system.yaml. If so, you're probably getting a tiny machine that can't handle loading the hg38 reference into memory. Adding more cores to bcbio_system.yaml should hopefully fix this.
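For reference, the exit status 137 in the logs above is the shell's encoding of 128 + SIGKILL (signal 9), which is what the kernel's OOM killer sends when a process outgrows its machine's memory. A small demonstration:

```python
import signal
import subprocess

# A shell reports a SIGKILL-ed child as 128 + 9 = 137:
print(128 + int(signal.SIGKILL))  # 137

# Python's subprocess reports the same event as a negative return code
# instead; here the child shell kills itself with SIGKILL.
proc = subprocess.run(["sh", "-c", "kill -KILL $$"])
print(proc.returncode)  # -9
```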
Thanks Brad, I did a local test on the same data and it went fine.
My bcbio_system is verbatim from the docs:
gs:
  ref: gs://snp-calling-project/biodata  # gs://bcbiodata/collections
  inputs:
    - gs://snp-calling-project/inputs/
resources:
  default: {cores: 2, memory: 3G, jvm_opts: [-Xms750m, -Xmx3000m]}
So yes, I'll request something with more memory (my bad).
Edit: Yes, bumping up to 10G made the run finish. Whenever you get the vcfanno stuff in place, I'll happily test it.
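For reference, the working configuration after the memory bump might look like this (a sketch based on the thread: only the 10G memory figure comes from the discussion, while the cores and jvm_opts values here are illustrative assumptions):

```yaml
gs:
  ref: gs://snp-calling-project/biodata  # gs://bcbiodata/collections
  inputs:
    - gs://snp-calling-project/inputs/
resources:
  # 10G per core was enough for loading the hg38 reference in this thread's
  # test run; cores/jvm_opts below are assumptions, not from the thread.
  default: {cores: 2, memory: 10G, jvm_opts: [-Xms1g, -Xmx8g]}
```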
Miika; Thanks for all the testing, and glad things are working with the test runs. The latest version of bcbio-vm should now handle generating vcfanno configurations when all the data is in remote locations. If you update with:
bcbiovm_conda install -c conda-forge -c bioconda -y bcbio-nextgen bcbio-nextgen-vm
and then regenerate the CWL, you should see the vcfanno config files in the input JSON file (your-workflow/main-your-samples.json), and it should now run these as part of the workflow. Let me know if you hit any issues, and I'm happy to work more on this. Thanks again.
Perfect, it ran to completion fine.
Hi Brad, I hope you're well - long time no see.
I was following https://bcbio-nextgen.readthedocs.io/en/latest/contents/cloud.html#docs-cloud-gcp for testing bcbio_vm on GCP. I did the minimal bcbio_vm setup and uploaded my data to a bucket:
My bcbio_system-gcp.yaml looks like this:
When I run templating, I get a warning about the samples not being there:
Any ideas why it's not seeing the samples?
Further, if I wanted to use hg38: I can see that the bucket
gsutil ls gs://bcbiodata/collections/
only has GRCh37. If I specify hg38 in my YAML, am I correct in assuming hg38 gets pulled from somewhere? Any plans on adding it to the public bucket?
Lastly, what's the best mechanism for injecting a COSMIC VCF into my biodata? And I presume UMI deduping works within CWL?
Cheers, Miika