XWangLabTHU / cfDNApipe

cfDNApipe: A comprehensive quality control and analysis pipeline for cell-free DNA high-throughput sequencing data
https://xwanglabthu.github.io/cfDNApipe/

multi-core error at adapter removal #10

Open egenomics opened 2 years ago

egenomics commented 2 years ago

Hi, I am getting an error related to multi-core usage. I am running cfDNApipe on a Slurm cluster, in a job with 24 cores available. Below is the part of the log that contains the error (it happens for most files).

An Error Occured During The Following Command Line Executing.
^^^
AdapterRemoval --threads 24 --file1 /well/buck/users/xhs232/data_hcb/longwood_fastqs/DL99908_hu_S5_R1_001.fastq.gz --file2 /well/buck/users/xhs232/data_hcb/longwood_fastqs/DL99908_hu_S5_R2_001.fastq.gz --adapter1 AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCCACTGAAAAAAAAAATCTCGTATGCCGTCTTCTGCTTGAAAAATGGGGG --adapter2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTACGCACCTGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAGGGGGGGGGG --basename /well/buck/users/xhs232/analysis/cfdna_fragmentomics_cluster_24cores/intermediate_result/step_03_adapterremoval/DL99908huS5 --qualitybase 33 --gzip^^^
Trimming paired end reads ...
Opening FASTQ file '/well/buck/users/xhs232/data_hcb/longwood_fastqs/DL99908_hu_S5_R1_001.fastq.gz', line numbers start at 1
Opening FASTQ file '/well/buck/users/xhs232/data_hcb/longwood_fastqs/DL99908_hu_S5_R2_001.fastq.gz', line numbers start at 1
ERROR: Unhandled exception in thread:
    basic_ios::clear: iostream error
ERROR: AdapterRemoval did not run to completion;
       do NOT make use of resulting trimmed reads!
^^^

         Please Stop The Program To Check The Error.         

^^^
Traceback (most recent call last):
  File "/well/buck/users/xhs232/analysis/cfdna_fragmentomics_cluster_24cores/scripts/cfdna_pipe_run.py", line 23, in <module>
    report=True,
  File "/well/buck/users/xhs232/conda/envs/cfDNApipe/lib/python3.6/site-packages/cfDNApipe/Pipeline.py", line 148, in cfDNAWGS
    res_adapterremoval = adapterremoval(upstream=res_identifyAdapter, other_params=rmAdOP, verbose=verbose)
  File "/well/buck/users/xhs232/conda/envs/cfDNApipe/lib/python3.6/site-packages/cfDNApipe/Fun_adapterremoval.py", line 248, in __init__
    self.multiRun(args=all_cmd, func=None, nCore=1)
  File "/well/buck/users/xhs232/conda/envs/cfDNApipe/lib/python3.6/site-packages/cfDNApipe/StepBase.py", line 672, in multiRun
    raise commonError("Error occured in multi-core running!")
cfDNApipe.cfDNA_utils.commonError: Error occured in multi-core running!
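
The command in the log passes `--threads 24` to AdapterRemoval, so one thing that can be ruled out is a mismatch between the requested thread count and the CPUs the Slurm job actually exposes to the process. The following is a minimal sketch, not part of cfDNApipe: `available_cpus` and `safe_thread_count` are hypothetical helper names, and the sketch assumes a Linux node where Slurm sets `SLURM_CPUS_PER_TASK` (which it only does when `--cpus-per-task` was given). Whether thread count is related to the error above is not established.

```python
import os


def available_cpus():
    """Return the number of CPUs this process is allowed to run on."""
    try:
        # Linux only: size of the CPU affinity mask of the current process.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the machine total.
        return os.cpu_count() or 1


def safe_thread_count(requested, env=None):
    """Clamp a requested thread count to what the job appears to expose.

    `env` defaults to os.environ; it is a parameter only to ease testing.
    """
    env = os.environ if env is None else env
    limit = available_cpus()
    slurm_value = env.get("SLURM_CPUS_PER_TASK")
    if slurm_value and slurm_value.isdigit():
        limit = min(limit, int(slurm_value))
    # Never return less than one thread.
    return max(1, min(requested, limit))


if __name__ == "__main__":
    print("CPUs visible to this process:", available_cpus())
    print("Thread count to use for a request of 24:", safe_thread_count(24))
```

If the printed value is lower than 24, passing it as `threads=` to `pipeConfigure` instead of the hard-coded 24 would at least remove oversubscription as a variable.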

This is the code I executed:

from cfDNApipe import *

pipeConfigure(
    threads=24,
    genome="hg38",
    refdir=r"/well/buck/users/xhs232/references/cfdnapipe",
    outdir=r"/well/buck/users/xhs232/analysis/cfdna_fragmentomics_cluster_24cores",
    data="WGS",
    type="paired",
    build=True,
    JavaMem="10g",
)

res = cfDNAWGS(
    inputFolder=r"/well/buck/users/xhs232/data_hcb/longwood_fastqs",
    idAdapter=True,
    rmAdapter=True,
    dudup=True,
    CNV=True,
    armCNV=True,
    fragProfile=True,
    verbose=False,
    report=True,
)

Configure.snvRefCheck(folder="/well/buck/users/xhs232/references/cfdnapipe/hg38", build=True)

# Using bam files directly.
# Of course, the "upstream" of addRG can be from "rmduplicate".
res1 = addRG(upstream=res.rmduplicate)

res2 = BaseRecalibrator(upstream=res1, knownSitesDir=Configure.getConfig("snv.folder"))
res3 = BQSR(upstream=res2)
res4 = getPileup(upstream=res3, biallelicvcfInput=Configure.getConfig("snv.ref")["7"],)
res5 = contamination(upstream=res4)

res6 = mutect2t(
    caseupstream=res5, vcfInput=Configure.getConfig("snv.ref")["6"], ponbedInput=Configure.getConfig("snv.ref")["8"],
)

res7 = filterMutectCalls(upstream=res6)

# ???
res8 = gatherVCF(upstream=res7)

# split somatic mutations
res9 = bcftoolsVCF(upstream=res8, stepNum="somatic")

# split germline mutations
res10 = bcftoolsVCF(upstream=res8, other_params={"-f": "'germline'"}, suffix="germline", stepNum="germline")
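
`basic_ios::clear: iostream error` is a generic C++ stream failure, and the log does not say which file or operation triggered it. One possibility worth ruling out, purely as an assumption, is that an input `.fastq.gz` file is truncated or otherwise unreadable. The sketch below uses only the Python standard library; `gzip_is_readable` is a hypothetical helper name, not a cfDNApipe or AdapterRemoval function.

```python
import gzip
import zlib


def gzip_is_readable(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream at `path` decompresses cleanly.

    The file is read in chunks so that large FASTQ files are not loaded
    into memory at once.
    """
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError: missing file or not a gzip file; EOFError: truncated
        # stream; zlib.error: corrupted compressed data.
        return False
```

Calling it on the `--file1` and `--file2` paths from the failing command shows whether both inputs can be read to the end. A `True` result for both would point away from corrupted inputs and towards something else, such as the output directory or the thread setting.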