bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

bcbio stalls #2185

Closed (alongalor closed 6 years ago)

alongalor commented 6 years ago

Hello!

Let me start by saying thank you for all your hard work on this incredibly important project! This is a fantastic initiative.

I am using bcbio for the first time, with the tumor-paired template, and have run into an issue that I would greatly appreciate help with.

Upon executing the following command:

bcbio_nextgen.py /n/data1/hms/dbmi/park/alon/software/bcbio/project1/config/project1.yaml -t ipython -n 64 -s lsf -q park_long

things run smoothly for a few minutes, and then no progress is made for many hours.

The bcbio-nextgen-debug.log reads as follows:

[2017-12-15T20:22Z] ottavino000-249: System YAML configuration: /n/data1/hms/dbmi/park/alon/software/bcbio/share/bcbio-nextgen/galaxy/bcbio_system.yaml
[2017-12-15T20:22Z] ottavino000-249: Resource query function not implemented for scheduler "lsf"; submitting job to queue
[2017-12-15T20:22Z] ottavino000-249: Resource requests: ; memory: 1.00; cores: 1
[2017-12-15T20:22Z] ottavino000-249: Configuring 1 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2017-12-15T20:23Z] ottavino000-249: ipython: machine_info
[2017-12-15T20:23Z] ottavino000-249: Resource requests: bwa, sambamba, samtools; memory: 3.00, 3.00, 3.00; cores: 16, 16, 16
[2017-12-15T20:23Z] ottavino000-249: Configuring 4 jobs to run, using 16 cores each with 48.1g of memory reserved for each job
[2017-12-15T20:24Z] ottavino000-249: Timing: organize samples
[2017-12-15T20:24Z] ottavino000-249: ipython: organize_samples
[2017-12-15T20:24Z] ottavino000-242: Using input YAML configuration: /n/data1/hms/dbmi/park/alon/software/bcbio/project1/config/project1.yaml
[2017-12-15T20:24Z] ottavino000-242: Checking sample YAML configuration: /n/data1/hms/dbmi/park/alon/software/bcbio/project1/config/project1.yaml
[2017-12-15T20:24Z] ottavino000-242: Testing minimum versions of installed programs
[2017-12-15T20:24Z] ottavino000-249: Timing: alignment preparation
[2017-12-15T20:24Z] ottavino000-249: ipython: prep_align_inputs
[2017-12-15T20:24Z] ottavino000-233: bgzip input file
[2017-12-15T20:24Z] ottavino000-233: bgzip input file
[2017-12-15T20:24Z] oboe000-253: Skipping trimming of A10_S10_L008.
[2017-12-15T20:24Z] oboe000-253: Resource requests: ; memory: 1.00; cores: 1
[2017-12-15T20:24Z] oboe000-253: Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2017-12-15T20:24Z] oboe000-253: Index input with grabix: A10_S10_L008_A10_S10_L008_R2_001.fastq.gz
[2017-12-15T20:24Z] oboe000-253: Index input with grabix: A10_S10_L008_A10_S10_L008_R1_001.fastq.gz

The bcbio-ipengine.bsub.%1157074 file reads as follows:

2017-12-15 15:24:28.298 [IPEngineApp] Using existing profile dir: u'/n/data1/hms/dbmi/park/alon/software/bcbio/project1/work/log/ipython'
2017-12-15 15:24:28.324 [IPEngineApp] Loading url_file u'/n/data1/hms/dbmi/park/alon/software/bcbio/project1/work/log/ipython/security/ipcontroller-69791d9a-f76f-4e8d-bbb3-e60d2181b00a-engine.json'
2017-12-15 15:24:28.350 [IPEngineApp] Registering with controller at tcp://10.0.65.35:58868
2017-12-15 15:24:28.453 [IPEngineApp] Starting to monitor the heartbeat signal from the hub every 5010 ms.
2017-12-15 15:24:28.459 [IPEngineApp] Using existing profile dir: u'/n/data1/hms/dbmi/park/alon/software/bcbio/project1/work/log/ipython'
2017-12-15 15:24:28.465 [IPEngineApp] Completed registration with id 3
2017-12-15 15:24:46.969 [IPEngineApp] WARNING | apply_request is deprecated in kernel_base, moving to ipyparallel.
2017-12-15 15:24:46.895 [IPEngineApp] WARNING | No heartbeat in the last 5010 ms (1 time(s) in a row).

The commands I used are as follows:

# Installation
mkdir /n/data1/hms/dbmi/park/alon/software/bcbio
cd /n/data1/hms/dbmi/park/alon/software/bcbio
wget https://raw.github.com/chapmanb/bcbio-nextgen/master/scripts/bcbio_nextgen_install.py
python ./bcbio_nextgen_install.py /n/data1/hms/dbmi/park/alon/software/bcbio/share/bcbio-nextgen --tooldir=/n/data1/hms/dbmi/park/alon/software/bcbio --genomes GRCh37 --aligners bwa
gatk-register /n/data1/hms/dbmi/park/alon/software/gatk/GenomeAnalysisTK-3.8-0.tar.bz2 
bcbio_setup_genome.py -f /n/data1/hms/dbmi/park/alon/software/gatk/human_g1k_v37_decoy.fasta -i bwa -n hs37d5 -b hs37d5

# Tumor-paired pipeline
bcbio_nextgen.py -w template tumor-paired project1
# Manually edit the template file `/n/data1/hms/dbmi/park/alon/software/bcbio/project1/config/project1-template.yaml`: change mutect to mutect2 and GRCh37 to hs37d5
nano /n/data1/hms/dbmi/park/alon/software/bcbio/project1.csv
# enter the following:
samplename,batch,phenotype,sex
A10_S10_L008,batch1,tumor,male
fibroblasts_sorted,batch1,normal,male

bcbio_nextgen.py -w template /n/data1/hms/dbmi/park/alon/software/bcbio/project1/config/project1-template.yaml /n/data1/hms/dbmi/park/alon/software/bcbio/project1.csv /n/data1/hms/dbmi/park/DATA/BSMCommonExperiment/ReferenceTissueProject/sourceData/fastq/Vaccarino/A10_S10_L008_R1_001.fastq.gz /n/data1/hms/dbmi/park/DATA/BSMCommonExperiment/ReferenceTissueProject/sourceData/fastq/Vaccarino/A10_S10_L008_R2_001.fastq.gz /n/data1/hms/dbmi/park/DATA/BSMCommonExperiment/ReferenceTissueProject/BamToFastq/Vaccarino/fibroblasts_sorted_R1_001.fastq.gz /n/data1/hms/dbmi/park/DATA/BSMCommonExperiment/ReferenceTissueProject/BamToFastq/Vaccarino/fibroblasts_sorted_R2_001.fastq.gz

cd /n/data1/hms/dbmi/park/alon/software/bcbio/project1/work
bcbio_nextgen.py /n/data1/hms/dbmi/park/alon/software/bcbio/project1/config/project1.yaml -t ipython -n 64 -s lsf -q park_long

For reference, my project1-template.yaml reads as follows:

# Template for paired (tumor/normal) variant calling
---
details:
  - analysis: variant2
    genome_build: hs37d5
    # In order to do paired variant calling, samples should belong to the
    # same batch ("batch" under "metadata" below) and have a "phenotype"
    # field stating either "normal" or "tumor". For each batch there
    # should be a sample with "tumor" phenotype and a sample with "normal"
    # phenotype (no more than two samples per batch)
    metadata:
       batch: your-batch-name
       phenotype: tumor # or "normal"
    algorithm:
      aligner: bwa
      mark_duplicates: true
      recalibrate: false
      realign: false
      variantcaller: [vardict, mutect2, freebayes]
      indelcaller: false
      ensemble:
        numpass: 2
      # for targeted projects, set the region
      # variant_regions: /path/to/your.bed

Finally, I should mention that I would ideally like to run the following project1-template.yaml, and that I have experienced similar issues with this setup:

# Template for paired (tumor/normal) variant calling
---
details:
  - analysis: variant2
    genome_build: hs37d5
    # In order to do paired variant calling, samples should belong to the
    # same batch ("batch" under "metadata" below) and have a "phenotype"
    # field stating either "normal" or "tumor". For each batch there
    # should be a sample with "tumor" phenotype and a sample with "normal"
    # phenotype (no more than two samples per batch)
    metadata:
       batch: your-batch-name
       phenotype: tumor # or "normal"
    algorithm:
      aligner: bwa
      mark_duplicates: true
      recalibrate: true
      realign: true
      variantcaller:
         somatic: [vardict, varscan, mutect2]
         germline: gatk-haplotype
      indelcaller: false
      # for targeted projects, set the region
      # variant_regions: /path/to/your.bed

Thanks a lot for your help!

Alon

chapmanb commented 6 years ago

Alon; Sorry about the issues and thanks for the detailed report. It looks like grabix, the tool we use to index fastq files for alignment splitting, is hanging while processing your gzipped fastq inputs. If this is a recent bcbio install, I'm not sure why it would do this; a while back we had a problem feeding some Illumina outputs directly into grabix, but I believe we resolved all of those issues.

We could try to debug what is going on with the inputs: with more details about how they were prepared, I could try to replicate the issue.

Another approach, to work around it, would be to update to the latest development version (bcbio_nextgen.py upgrade -u development) and add align_split_size: false to the configuration next to aligner. That skips this indexing altogether and aligns the fastqs in one go, which should avoid the issue.
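
As a rough sketch of that change, assuming the first project1-template.yaml posted above, the algorithm section might look like the fragment below. Only the align_split_size line is new; every other key is copied from the posted template, and the comments are illustrative. Since the run reads project1.yaml, it is an assumption here that the same line would also need to go into that generated file (or that the file would be regenerated from the edited template).

```yaml
# Sketch only: based on the template posted earlier in this thread.
# Keys not shown (metadata, indelcaller, ensemble) stay as they were.
details:
  - analysis: variant2
    genome_build: hs37d5
    algorithm:
      aligner: bwa
      # New line: skip the grabix indexing step and align each fastq in one go.
      align_split_size: false
      mark_duplicates: true
      recalibrate: false
      realign: false
      variantcaller: [vardict, mutect2, freebayes]
```

The trade-off is that alignment is no longer split into chunks, so each sample is aligned as a single job rather than in parallel pieces.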

Hope this helps get things running for you.

alongalor commented 6 years ago

Thanks a lot for your very helpful response, Brad! Surprisingly, after I let this run without any changes, it made it to the next step after a day or so of runtime and is currently chugging along!

Much appreciated,

Alon