ChrisMaherLab / PACT


Run sv caller fails with 'Fail to open index for example.sample.bam' #10

Open LiuH2020 opened 1 year ago

LiuH2020 commented 1 year ago

Hello, thank you for the great pipeline! I am trying to run the pipeline to call SVs, but I get an error when running sv_pipeline.cwl, shown below:

cwltool /home/liuhui/ctDNA-pipeline1/PACT/pipelines/sv_pipeline.cwl /home/liuhui/ctDNApipeline1/PACT/example_ymls/sv_example.yml
INFO /home/liuhui/.conda/envs/PACT/bin/cwltool 3.1.20230906142556
INFO Resolved '/home/liuhui/ctDNA-pipeline1/PACT/pipelines/sv_pipeline.cwl' to 'file:///home/liuhui/ctDNA-pipeline1/PACT/pipelines/sv_pipeline.cwl'
WARNING Workflow checker warning:
../PACT/subworkflows/sv_merge_and_filter.cwl:22:3: Source 'max_distance_to_merge' of type ["null",
                                                   "int"] may be incompatible
../PACT/subworkflows/sv_merge_and_filter.cwl:73:4:   with sink 'max_distance_to_merge' of type
                                                     "int"
../PACT/subworkflows/sv_merge_and_filter.cwl:25:3: Source 'minimum_sv_calls' of type ["null",
                                                   "int"] may be incompatible
../PACT/subworkflows/sv_merge_and_filter.cwl:74:4:   with sink 'minimum_sv_calls' of type "int"
../PACT/subworkflows/sv_merge_and_filter.cwl:28:3: Source 'minimum_sv_size' of type ["null", "int"]
                                                   may be incompatible
../PACT/subworkflows/sv_merge_and_filter.cwl:79:4:   with sink 'minimum_sv_size' of type "int"
../PACT/subworkflows/sv_merge_and_filter.cwl:31:3: Source 'same_strand' of type ["null", "boolean"]
                                                   may be incompatible
../PACT/subworkflows/sv_merge_and_filter.cwl:76:4:   with sink 'same_strand' of type "boolean"
../PACT/subworkflows/sv_merge_and_filter.cwl:34:3: Source 'same_type' of type ["null", "boolean"]
                                                   may be incompatible
../PACT/subworkflows/sv_merge_and_filter.cwl:75:4:   with sink 'same_type' of type "boolean"
WARNING ../PACT/tools/three_way_merge.cwl:27:1: JSHINT:   inner = [inputs.array1[i], inputs.array2[i], inputs.array3[i]];
../PACT/tools/three_way_merge.cwl:27:1: JSHINT:   ^
../PACT/tools/three_way_merge.cwl:27:1: JSHINT: W117: 'inner' is not defined.
WARNING ../PACT/tools/three_way_merge.cwl:27:1: JSHINT:   out_array.push(inner);
../PACT/tools/three_way_merge.cwl:27:1: JSHINT:                  ^
../PACT/tools/three_way_merge.cwl:27:1: JSHINT: W117: 'inner' is not defined.
WARNING Workflow checker warning:
../PACT/pipelines/sv_pipeline.cwl:80:3: Source 'minwt' of type ["null", "int"] may be incompatible
../PACT/pipelines/sv_pipeline.cwl:97:4:   with sink 'minwt' of type "int"
INFO [workflow ] start
INFO [workflow ] starting step sv_calling
INFO [step sv_calling] start
INFO [workflow sv_calling] start
INFO [workflow sv_calling] starting step delly_calls
INFO [step delly_calls] start
WARNING [job delly_calls] Skipping Docker software container '--memory' limit despite presence of ResourceRequirement with ramMin and/or ramMax setting. Consider running with --strict-memory-limit for increased portability assurance.
WARNING [job delly_calls] Skipping Docker software container '--cpus' limit despite presence of ResourceRequirement with coresMin and/or coresMax setting. Consider running with --strict-cpu-limit for increased portability assurance.
INFO [job delly_calls] /tmp/pu5gcow9$ docker \
    run \
    -i \
    --mount=type=bind,source=/tmp/pu5gcow9,target=/LpZJVZ \
    --mount=type=bind,source=/tmp/h2rzsz6g,target=/tmp \
    --mount=type=bind,source=/home/liuhui/ctDNA-pipeline1/PACT/example_data/example.matchedControl.bam,target=/var/lib/cwl/stg1ed9e023-4d01-4df9-9919-a460111bd046/example.matchedControl.bam,readonly \
    --mount=type=bind,source=/home/liuhui/ctDNA-pipeline1/PACT/example_data/example.matchedControl.bam.bai,target=/var/lib/cwl/stg1ed9e023-4d01-4df9-9919-a460111bd046/example.matchedControl.bam.bai,readonly \
    --mount=type=bind,source=/home/liuhui/DataFile/genome/GCF_000001405.25_GRCh37.p13_genomic.fa,target=/var/lib/cwl/stgdeaacacf-533b-4281-af66-a024ba138322/GCF_000001405.25_GRCh37.p13_genomic.fa,readonly \
    --mount=type=bind,source=/home/liuhui/DataFile/genome/GCF_000001405.25_GRCh37.p13_genomic.fa.fai,target=/var/lib/cwl/stgdeaacacf-533b-4281-af66-a024ba138322/GCF_000001405.25_GRCh37.p13_genomic.fa.fai,readonly \
    --mount=type=bind,source=/home/liuhui/DataFile/genome/GCF_000001405.25_GRCh37.p13_genomic.dict,target=/var/lib/cwl/stgdeaacacf-533b-4281-af66-a024ba138322/GCF_000001405.25_GRCh37.p13_genomic.dict,readonly \
    --mount=type=bind,source=/home/liuhui/ctDNA-pipeline1/PACT/example_data/example.sample.bam,target=/var/lib/cwl/stg4d961a0b-0f57-48e3-a129-a3d2a253045c/example.sample.bam,readonly \
    --mount=type=bind,source=/home/liuhui/ctDNA-pipeline1/PACT/example_data/example.sample.bam.bai,target=/var/lib/cwl/stg4d961a0b-0f57-48e3-a129-a3d2a253045c/example.sample.bam.bai,readonly \
    --workdir=/LpZJVZ \
    --read-only=true \
    --user=1000:1000 \
    --rm \
    --cidfile=/tmp/0egdnj09/20231011013404-157380.cid \
    --env=TMPDIR=/tmp \
    --env=HOME=/LpZJVZ \
    jbwebster/pipeline_docker \
    delly \
    call \
    -g \
    /var/lib/cwl/stgdeaacacf-533b-4281-af66-a024ba138322/GCF_000001405.25_GRCh37.p13_genomic.fa \
    -o \
    /LpZJVZ/example.sample.bcf \
    /var/lib/cwl/stg4d961a0b-0f57-48e3-a129-a3d2a253045c/example.sample.bam \
    /var/lib/cwl/stg1ed9e023-4d01-4df9-9919-a460111bd046/example.matchedControl.bam
Fail to open index for /var/lib/cwl/stg4d961a0b-0f57-48e3-a129-a3d2a253045c/example.sample.bam
WARNING [job delly_calls] exited with status: 1
ERROR [job delly_calls] Job error:
("Error collecting output for parameter 'delly_output': ../PACT/tools/delly_caller.cwl:47:4: Did not find output file with glob pattern: ['example.sample.bcf'].", {})
WARNING [job delly_calls] completed permanentFail
WARNING [step delly_calls] completed permanentFail
INFO [workflow sv_calling] completed permanentFail
WARNING [step sv_calling] completed permanentFail
INFO [workflow ] completed permanentFail
{
    "somatic_svs_bedpe": null
}
WARNING Final process status is permanentFail

My yml file is as follows:

# For use with pipelines/sv_pipeline.cwl
# Reference should have .dict and .fai files in same directory
reference:
 class: File
 path: /home/liuhui/DataFile/genome/GCF_000001405.25_GRCh37.p13_genomic.fa
ref_genome: GRCh37
# snpEff database. These can be downloaded using java -jar snpEff.jar download <database>.
# Should correspond to reference genome
snpEff_data:
 class: Directory
 path: /home/liuhui/.conda/envs/PACT/share/snpeff-5.1-2/data/GRCh37.p13
# Paths to cfDNA samples
sample_bams:
 - {class: File, path: /home/liuhui/ctDNA-pipeline1/PACT/example_data/example.sample.bam}
# Paths to matched control samples (ex: plasma depleted whole blood)
# Should be in same order as sample_bams
matched_control_bams:
 - {class: File, path: /home/liuhui/ctDNA-pipeline1/PACT/example_data/example.matchedControl.bam}
# Paths to bams that make up the panel of normals.
panel_of_normal_bams:
 - {class: File, path: /home/liuhui/ctDNA-pipeline1/PACT/example_data/example.healthy.bam}
# Standard bed file of targeted regions during sequencing
target_regions:
 class: File
 path: /home/liuhui/ctDNA-pipeline1/PACT/example_data/targetRegions.bed
# Neither breakend of SVs should fall in the blacklisted regions in this bed file
# We recommend the blacklist regions provided by 10xgenomics. Their hg19 bed file is at
# http://cf.10xgenomics.com/supp/genome/hg19/sv_blacklist.bed
neither_region:
 class: File
 path: /home/liuhui/ctDNA-pipeline1/PACT/example_data/hg19.longranger-blacklist.bed
# A maximum of one breakend for SVs may fall in the regions in this bed file
# We recommend Heng Li's low complexity regions found here
# https://github.com/lh3/varcmp/raw/master/scripts
notboth_region:
 class: File
 path: /home/liuhui/ctDNA-pipeline1/PACT/example_data/hg19.LCR.bed
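
As an aside, the companion files mentioned in the comments above can be generated roughly like this (a sketch assuming samtools and Picard are available, e.g. from the PACT conda environment; the exact Picard invocation and the snpEff database name may differ on your install):

# .fai index, written next to the reference FASTA
samtools faidx /home/liuhui/DataFile/genome/GCF_000001405.25_GRCh37.p13_genomic.fa
# .dict sequence dictionary, written to the same directory
picard CreateSequenceDictionary R=/home/liuhui/DataFile/genome/GCF_000001405.25_GRCh37.p13_genomic.fa O=/home/liuhui/DataFile/genome/GCF_000001405.25_GRCh37.p13_genomic.dict
# snpEff database matching ref_genome; the name should be one listed by `snpEff databases`
java -jar snpEff.jar download GRCh37.p13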

It seems to be caused by the 'Fail to open index for example.sample.bam' error, and maybe the .bai file is damaged. I also can't re-index example.sample.bam; samtools index fails with this error:

 samtools index /home/liuhui/ctDNA-pipeline1/PACT/example_data/example.sample.bam
samtools index: "/home/liuhui/ctDNA-pipeline1/PACT/example_data/example.sample.bam" is in a format that cannot be usefully indexed

Could you give me some advice? Thank you very much.

LiuH2020 commented 1 year ago

Hello, the error above has been resolved by using other data (it was likely caused by a damaged BAM or BAI file). Now I get an error at the lumpy_calls step:

INFO [job gunzip_manta] completed success
INFO [step gunzip_manta] completed success
INFO [workflow sv_calling] starting step lumpy_calls
INFO [step lumpy_calls] start
ERROR Exception on step 'lumpy_calls'
ERROR Cannot make scatter job: Missing required secondary file 'sample.discordant.bam.bai' from file object: {
    "location": "file:///tmp/iyj1x0ux/sample.discordant.bam",
    "basename": "sample.discordant.bam",
    "nameroot": "sample.discordant",
    "nameext": ".bam",
    "class": "File",
    "checksum": "sha1$1c2d7d73d12173a785089940ce7b4b5148ac9231",
    "size": 11082194,
    "http://commonwl.org/cwltool#generation": 0,
    "secondaryFiles": []
}
WARNING [step lumpy_calls] completed permanentFail
INFO [workflow sv_calling] completed permanentFail
WARNING [step sv_calling] completed permanentFail
INFO [workflow ] completed permanentFail
{
    "somatic_svs_bedpe": null
}
WARNING Final process status is permanentFail
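
For reference, one way to keep and inspect the intermediate files named in the error (a sketch using standard cwltool flags; the /tmp working directory is otherwise removed on failure and its name changes between runs) would be:

# keep intermediate temp directories and print verbose debugging output
cwltool --debug --leave-tmpdir /home/liuhui/ctDNA-pipeline1/PACT/pipelines/sv_pipeline.cwl sv_example.yml
# if sample.discordant.bam survives but has no .bai next to it, indexing it by hand shows whether the BAM itself is usable
samtools index /tmp/iyj1x0ux/sample.discordant.bam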

My command and yml file are as follows:

#code
cwltool /home/liuhui/ctDNA-pipeline1/PACT/pipelines/sv_pipeline.cwl sv_example.yml
#yml file
cat sv_example.yml
# For use with pipelines/sv_pipeline.cwl
# Reference should have .dict and .fai files in same directory
reference:
 class: File
 path: /home/liuhui/cfDNA-pipeline/hg19_bowtie2/hg19.fa
ref_genome: hg19
# snpEff database. These can be downloaded using java -jar snpEff.jar download <database>.
# Should correspond to reference genome
snpEff_data:
 class: Directory
 path: /home/liuhui/.conda/envs/PACT/share/snpeff-5.1-2/data/GRCh37.p13
# Paths to cfDNA samples
sample_bams:
 - {class: File, path: /home/liuhui/ctDNA-pipeline1/PACT-test/bwa-map/case1.sort.duplicate.bam}
# Paths to matched control samples (ex: plasma depleted whole blood)
# Should be in same order as sample_bams
matched_control_bams:
 - {class: File, path: /home/liuhui/ctDNA-pipeline1/PACT-test/bwa-map/ctrl1.sort.duplicate.bam}
# Paths to bams that make up the panel of normals.
panel_of_normal_bams:
 - {class: File, path: /home/liuhui/ctDNA-pipeline1/PACT-test/bwa-map/ctrl1.sort.duplicate.bam}
# Standard bed file of targeted regions during sequencing
target_regions:
 class: File
 path: /home/liuhui/ctDNA-pipeline1/PACT-test/run_PACT/targetRegions1.bed
# Neither breakend of SVs should fall in the blacklisted regions in this bed file
# We recommend the blacklist regions provided by 10xgenomics. Their hg19 bed file is at
# http://cf.10xgenomics.com/supp/genome/hg19/sv_blacklist.bed
neither_region:
 class: File
 path: /home/liuhui/ctDNA-pipeline1/PACT/example_data/hg19.longranger-blacklist.bed
# A maximum of one breakend for SVs may fall in the regions in this bed file
# We recommend Heng Li's low complexity regions found here
# https://github.com/lh3/varcmp/raw/master/scripts
notboth_region:
 class: File
 path: /home/liuhui/ctDNA-pipeline1/PACT/example_data/hg19.LCR.bed

I'm not very familiar with CWL workflows. Could you give me some advice on this error? Thank you very much.