bcbio / bcbio-nextgen-vm

Run bcbio-nextgen genomic sequencing analyses using isolated containers and virtual machines
MIT License

bcbio-vm gets stuck at a step during a run without any errors? #145

Closed mortunco closed 8 years ago

mortunco commented 8 years ago

Hi again,

I would like to include indel calling in the process, therefore I added pindel to the tumor-normal paired cancer tutorial. However, the run freezes during execution without any errors or warnings. Also, when I check the processes with sacct, the jobs on both compute nodes appear to be running. Moreover, the output in slurm-out.txt stays the same without any addition/progress.

1) Because this is the second time I have gotten this error, I think it might be related to my configuration. Is there a problem in my configuration?

2) Also, since the problem is at the alignment step, do you think bwa might have stalled?

I would be more than glad if you could help me solve this problem. As before, I am ready to do anything you suggest.

Thank you for your help,

Best regards

Tunc.

My cluster consists of one c3.large frontend and two c3.8xlarge compute instances.

Available clusters: bcbio

Configuration for cluster 'bcbio':
 Frontend: c3.large with 500Gb NFS storage
 Cluster: 2 c3.8xlarge machines

AWS setup:
 OK: expected IAM user 'bcbio' exists.
 OK: expected security group 'bcbio_cluster_sg' exists.
 OK: VPC 'bcbio' exists.

Instances in VPC 'bcbio':
 bcbio-frontend001 (c3.large, running) at 52.87.249.248 in us-east-1a
 bcbio-compute001 (c3.8xlarge, running) at 52.91.95.81 in us-east-1a
 bcbio-compute002 (c3.8xlarge, running) at 54.208.56.186 in us-east-1a

This is the tail of the slurm output. As you might guess, it is a long output, but I wanted to show you the point where it stops making any further progress.

...

[2016-04-05T10:51Z] compute001: INFO  10:51:05,944 HelpFormatter - --------------------------------------------------------------------------------
[2016-04-05T10:51Z] compute001: INFO  10:51:05,946 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56
[2016-04-05T10:51Z] compute001: INFO  10:51:05,946 HelpFormatter - Copyright (c) 2010 The Broad Institute
[2016-04-05T10:51Z] compute001: INFO  10:51:05,946 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
[2016-04-05T10:51Z] compute001: INFO  10:51:05,949 HelpFormatter - Program Args: -T IndelRealigner -I /encrypted/project2/work/bamprep/syn3-tumor/19/syn3-tumor-sort-19_31038548_59128983-prep-prealign.bam -R /encrypted/project2/work/inputs/data/genomes/GRCh37/seq/GRCh37.fa -targetIntervals /encrypted/project2/work/bamprep/syn3-tumor/19/syn3-tumor-sort-19_31038548_59128983-prep-prealign-realign.intervals -L 19:31038549-59128983 --knownAlleles /encrypted/project2/work/inputs/data/genomes/GRCh37/variation/Mills_and_1000G_gold_standard.indels.vcf.gz -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment -o /encrypted/project2/work/bamprep/syn3-tumor/19/tx/tmpxFMzSL/syn3-tumor-sort-19_31038548_59128983-prep.bam
[2016-04-05T10:51Z] compute001: INFO  10:51:05,955 HelpFormatter - Executing as ubuntu@compute001 on Linux 3.13.0-83-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_95-b00.
[2016-04-05T10:51Z] compute001: INFO  10:51:05,955 HelpFormatter - Date/Time: 2016/04/05 10:51:05
[2016-04-05T10:51Z] compute001: INFO  10:51:05,955 HelpFormatter - --------------------------------------------------------------------------------
[2016-04-05T10:51Z] compute001: INFO  10:51:05,956 HelpFormatter - --------------------------------------------------------------------------------
[2016-04-05T10:51Z] compute001: INFO  10:51:06,082 GenomeAnalysisEngine - Strictness is SILENT
[2016-04-05T10:51Z] compute001: INFO  10:51:06,152 GenomeAnalysisEngine - Downsampling Settings: No downsampling
[2016-04-05T10:51Z] compute001: INFO  10:51:06,159 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
[2016-04-05T10:51Z] compute001: INFO  10:51:06,194 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.03
[2016-04-05T10:51Z] compute001: INFO  10:51:06,432 IntervalUtils - Processing 28090435 bp from intervals
[2016-04-05T10:51Z] compute001: WARN  10:51:06,436 IndexDictionaryUtils - Track knownAlleles doesn't have a sequence dictionary built in, skipping dictionary validation
[2016-04-05T10:51Z] compute001: INFO  10:51:06,495 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
[2016-04-05T10:51Z] compute001: INFO  10:51:06,631 GenomeAnalysisEngine - Done preparing for traversal
[2016-04-05T10:51Z] compute001: INFO  10:51:06,631 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
[2016-04-05T10:51Z] compute001: INFO  10:51:06,632 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
[2016-04-05T10:51Z] compute001: INFO  10:51:06,632 ProgressMeter -        Location |     reads | elapsed |     reads | completed | runtime |   runtime
[2016-04-05T10:51Z] compute001: INFO  10:51:06,757 ReadShardBalancer$1 - Loading BAM index data
[2016-04-05T10:51Z] compute001: INFO  10:51:06,934 ReadShardBalancer$1 - Done loading BAM index data
[2016-04-05T10:51Z] compute002: INFO  10:51:23,579 ProgressMeter -      19:4037933    500006.0    30.0 s      60.0 s       13.0%     3.8 m       3.3 m
[2016-04-05T10:51Z] compute001: INFO  10:51:37,761 ProgressMeter -     19:39960605    600009.0    31.0 s      51.0 s       31.8%    97.0 s      66.0 s
[2016-04-05T10:51Z] compute002: INFO  10:51:53,697 ProgressMeter -     19:10203309   1300019.0    60.0 s      46.0 s       32.9%     3.0 m       2.0 m
[2016-04-05T10:52Z] compute001: INFO  10:52:08,467 ProgressMeter -     19:46626949   1500026.0    61.0 s      41.0 s       55.5%   109.0 s      48.0 s
[2016-04-05T10:52Z] compute002: INFO  10:52:23,940 ProgressMeter -     19:16540061   2100029.0    90.0 s      43.0 s       53.3%     2.8 m      78.0 s
[2016-04-05T10:52Z] compute001: INFO  10:52:38,842 ProgressMeter -     19:53116830   2300034.0    92.0 s      40.0 s       78.6%   117.0 s      25.0 s
[2016-04-05T10:52Z] compute002: INFO  10:52:45,098 ProgressMeter -            done   2749851.0   111.0 s      40.0 s      100.0%   111.0 s       0.0 s
[2016-04-05T10:52Z] compute002: INFO  10:52:45,099 ProgressMeter - Total runtime 111.52 secs, 1.86 min, 0.03 hours
[2016-04-05T10:52Z] compute002: INFO  10:52:45,102 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 2749851 total reads (0.00%)
[2016-04-05T10:52Z] compute002: INFO  10:52:45,102 MicroScheduler -   -> 0 reads (0.00% of total) failing BadCigarFilter
[2016-04-05T10:52Z] compute002: INFO  10:52:45,102 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter
[2016-04-05T10:52Z] compute002: INFO  10:52:45,103 MicroScheduler -   -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
[2016-04-05T10:52Z] compute002: INFO  10:52:45,899 GATKRunReport - Uploaded run statistics report to AWS S3
[2016-04-05T10:52Z] compute002: Index BAM file: syn3-tumor-sort-19_0_31026185-prep.bam
[2016-04-05T10:53Z] compute001: INFO  10:53:03,602 ProgressMeter -            done   3026719.0   116.0 s      38.0 s       99.9%   116.0 s       0.0 s
[2016-04-05T10:53Z] compute001: INFO  10:53:03,602 ProgressMeter - Total runtime 116.97 secs, 1.95 min, 0.03 hours
[2016-04-05T10:53Z] compute001: INFO  10:53:03,605 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 3026719 total reads (0.00%)
[2016-04-05T10:53Z] compute001: INFO  10:53:03,605 MicroScheduler -   -> 0 reads (0.00% of total) failing BadCigarFilter
[2016-04-05T10:53Z] compute001: INFO  10:53:03,606 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter
[2016-04-05T10:53Z] compute001: INFO  10:53:03,606 MicroScheduler -   -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
[2016-04-05T10:53Z] compute001: INFO  10:53:04,398 GATKRunReport - Uploaded run statistics report to AWS S3
[2016-04-05T10:53Z] compute001: Index BAM file: syn3-tumor-sort-19_31038548_59128983-prep.bam

This is the final form of the configuration file that is created in the frontend node's work/config/ directory when a run is initiated.

details:
- algorithm:
    align_split_size: 5000000
    aligner: bwa
    ensemble:
      numpass: 2
    indelcaller: pindel
    mark_duplicates: true
    nomap_split_targets: 100
    platform: illumina
    quality_format: standard
    realign: true
    recalibrate: true
    remove_lcr: true
    variant_regions: s3://tuncproject/bcbiovmrun/input/NGv3.bed
    variantcaller:
    - mutect
    - freebayes
    - vardict
    - varscan
    - mutect2
  analysis: variant2
  description: syn3-normal
  files:
  - s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_normal_NGv3_1.fq.gz
  - s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_normal_NGv3_2.fq.gz
  genome_build: GRCh37
  metadata:
    batch: syn3
    phenotype: normal
- algorithm:
    align_split_size: 5000000
    aligner: bwa
    ensemble:
      numpass: 2
    indelcaller: pindel
    mark_duplicates: true
    nomap_split_targets: 100
    platform: illumina
    quality_format: standard
    realign: true
    recalibrate: true
    remove_lcr: true
    validate_regions: s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_tumor_20pctmasked_truth_regions.bed
    variant_regions: s3://tuncproject/bcbiovmrun/input/NGv3.bed
    variantcaller:
    - mutect
    - freebayes
    - vardict
    - varscan
    - mutect2
  analysis: variant2
  description: syn3-tumor
  files:
  - s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_tumor_NGv3_1.fq.gz
  - s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_tumor_NGv3_2.fq.gz
  genome_build: GRCh37
  metadata:
    batch: syn3
    phenotype: tumor
fc_date: '2014-08-13'
fc_name: dream-syn3
resources:
  gatk:
    jar: s3://tuncproject/gatktools/GenomeAnalysisTK.jar
  mutect:
    jar: s3://tuncproject/gatktools/mutect-1.1.7.jar
upload:
  bucket: tuncproject
  dir: ../final
  folder: bcbiovmrun/input/final
  method: s3
  region: us-east-1

Output of sacct_std:

ubuntu@frontend001:/encrypted/project2/work/config$ sacct_std
       JobID    JobName  Partition   NNodes  AllocCPUS      State        NodeList 
------------ ---------- ---------- -------- ---------- ---------- --------------- 
10           bcbio_sub+      cloud        1          1    RUNNING      compute001 
14              bcbio-c      cloud        1          1    RUNNING      compute001 
15           bcbio-e[1+      cloud        1         27    RUNNING      compute001 
16           bcbio-e[1+      cloud        1         27    RUNNING      compute002 
17           bcbio-e[1+      cloud        2          0    PENDING   None assigned 
chapmanb commented 8 years ago

Tunc; Sorry about the problems with the run. It looks like the analysis is currently in the indel realignment step, although I'm not able to identify a root cause of the problem from what you posted. If you look in log/bcbio-nextgen.log are there any error messages? Sometimes these are earlier in the file if you're having failures and you might not see them in the latest debug output. If you ssh into the worker nodes (compute001 and compute002) and check top, are there any processes running?

Practically, we don't find much value in doing indel realignment and BQSR:

http://bcb.io/2013/10/21/updated-comparison-of-variant-detection-methods-ensemble-freebayes-and-minimal-bam-preparation-pipelines/

so if you're having issues here it might be worth setting realign: false and recalibrate: false to avoid the problems.
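
For example, something like this in the algorithm section of each sample would turn both off (a minimal sketch; only the relevant keys are shown, everything else in your configuration stays as-is):

  algorithm:
    realign: false        # skip GATK indel realignment
    recalibrate: false    # skip base quality score recalibration (BQSR)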

Hope one of these ideas helps get the analysis moving.

mortunco commented 8 years ago

Brad;

Thank you for the fast response. Also, great article about the comparison of the callers. The only difference between the current configuration file and the tumor-normal paired one is indelcaller: pindel. Could you tell me how to connect to the compute nodes? bcbio_vm.py aws cluster ssh connects to the frontend node.

For the big picture, I will definitely set realignment and recalibration to false. However, the only difference between the current configuration file and the initial one (the untouched version of the tumor-normal paired variant calling) was the indel option. My question is: wouldn't realignment and recalibration have caused the same error in the initial run? The initial run performed well and did not give any errors. To include indels, should I set these options to false?

Best regards,

Tunc.

chapmanb commented 8 years ago

Tunc; My guess is that you had a one-off java error or some other problem in this latest run, causing the failure here. The change in your configuration shouldn't have triggered anything new. If you can identify any errors I'm happy to debug more.

You should be able to ssh around the cluster once you're on the frontend node to check if things are processing, with ssh compute002.

Hope this helps.

mortunco commented 8 years ago

Brad;

I literally checked all the lines. I apologise for sharing all the possible outputs with you, but this was my last option. This is the second time the system has failed at the post-alignment processing. If you look at the debug log especially, there are repeated docker container runs which I have never seen before. Should I try another indel caller? Should I try the initial tumor-normal run again? I am ready to do anything to solve this problem.

Thank you for your patience and help, Best regards,

Tunc.

bcbio-nextgen-debug.log.txt bcbio-nextgen.log.txt slurm-2.out.txt

This is the ls of the final folder. There are several ipengine.err files which look suspicious. Are they supposed to be there?

[ec2-user@ip-172-31-59-88 ~]$ aws s3 ls s3://tuncproject/bcbiovmrun/deneme3_output/
                           PRE align/
                           PRE bedprep/
                           PRE checkpoints_parallel/
                           PRE config/
                           PRE log/
                           PRE provenance/
                           PRE regions/
2016-04-06 21:49:36       1035 SLURM_controller081daafc-0fe2-4391-b585-7c6365ee6333
2016-04-06 21:49:36       1035 SLURM_controller7992f762-49dd-432b-9ee0-368f042da6ed
2016-04-06 21:49:36       1011 SLURM_engine6ee7d71f-fbce-4329-8802-887819a0d314
2016-04-06 21:49:36      21603 SLURM_engine82b1924c-472d-41b8-8f12-a95cbb1e4f80
2016-04-06 21:49:36        251 bcbio-ipcontroller.err.3
2016-04-06 21:49:36        184 bcbio-ipcontroller.err.6
2016-04-06 21:49:36          0 bcbio-ipcontroller.out.3
2016-04-06 21:49:36          0 bcbio-ipcontroller.out.6
2016-04-06 21:49:36        826 bcbio-ipengine.err.%4
2016-04-06 21:49:36       1033 bcbio-ipengine.err.%5
2016-04-06 21:49:36      19274 bcbio-ipengine.err.%7
2016-04-06 21:49:36      18075 bcbio-ipengine.err.%8
2016-04-06 21:49:36          0 bcbio-ipengine.out.%4
2016-04-06 21:49:36          0 bcbio-ipengine.out.%5
2016-04-06 21:49:36          0 bcbio-ipengine.out.%7
2016-04-06 21:49:36          0 bcbio-ipengine.out.%8
2016-04-06 21:49:36        209 bcbio_submit.sh
2016-04-06 21:49:36        970 bcbio_system-prep.yaml
2016-04-06 21:49:36       1905 deneme3-ready.yaml
2016-04-06 21:49:36      14559 runfn-piped_bamprep-f78d3296-8952-4971-954b-67514744c6f4-out.yaml
2016-04-06 21:49:36      14646 runfn-piped_bamprep-f78d3296-8952-4971-954b-67514744c6f4.yaml
2016-04-06 21:49:36    1594407 slurm-2.out
chapmanb commented 8 years ago

Tunc; Sorry about the continued problems. I'm not sure why it's getting locked up here and you're right there is not anything to go on in these log files. The docker runs are from checking files that have previously been processed. So it starts up docker, checks the files, and shuts it down. This is one of the current inefficiencies in the Docker-based approach we're looking to improve on right now.

Can you set realign: false and recalibrate: false, or do you feel that you need these steps? Doing that would skip this processing and hopefully get you to a better place in the analysis. Hope this helps.

mortunco commented 8 years ago

Dear Brad;

Referring to your previous messages, I did try that option. Like I said, I am convinced by your article and tried that approach as well. Do you think it is related to the locations of the GATK and MuTect jars? Also, I use mutect and mutect2 at the same time. Do you think this causes any problems?

Actually, for the last run I used these settings.

details:
- algorithm:
    aligner: bwa
    align_split_size: 5000000
    nomap_split_targets: 100
    mark_duplicates: true
    recalibrate: false
    realign: false
    remove_lcr: true
    platform: illumina
    quality_format: standard
    variantcaller: [mutect, freebayes, vardict, varscan, mutect2]
    indelcaller: pindel
    ensemble: 
      numpass: 2
    variant_regions: s3://tuncproject/bcbiovmrun/input/NGv3.bed
    # svcaller: [cnvkit, lumpy, delly]
    # coverage_interval: amplicon
  analysis: variant2
  description: syn3-normal
  #files: ../input/synthetic.challenge.set3.normal.bam
  files:
    - s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_normal_NGv3_1.fq.gz
    - s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_normal_NGv3_2.fq.gz
  genome_build: GRCh37
  metadata:
    batch: syn3
    phenotype: normal
- algorithm:
    aligner: bwa
    align_split_size: 5000000
    nomap_split_targets: 100
    mark_duplicates: true
    recalibrate: false
    realign: false
    remove_lcr: true
    platform: illumina
    quality_format: standard
    variantcaller: [mutect, freebayes, vardict, varscan, mutect2]
    indelcaller: pindel
    ensemble:
      numpass: 2
    variant_regions: s3://tuncproject/bcbiovmrun/input/NGv3.bed
    validate: s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_tumor_20pctmasked_truth.vcf.gz
    validate_regions: s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_tumor_20pctmasked_truth_regions.bed
    # svcaller: [cnvkit, lumpy, delly]
    # coverage_interval: amplicon
  #   svvalidate:
  #     DEL: ../input/synthetic_challenge_set3_tumor_20pctmasked_truth_sv_DEL.bed
  #     DUP: ../input/synthetic_challenge_set3_tumor_20pctmasked_truth_sv_DUP.bed
  #     INS: ../input/synthetic_challenge_set3_tumor_20pctmasked_truth_sv_INS.bed
  #     INV: ../input/synthetic_challenge_set3_tumor_20pctmasked_truth_sv_INV.bed
  analysis: variant2
  description: syn3-tumor
  #files: ../input/synthetic.challenge.set3.tumor.bam
  files:
    - s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_tumor_NGv3_1.fq.gz
    - s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_tumor_NGv3_2.fq.gz
  genome_build: GRCh37
  metadata:
    batch: syn3
    phenotype: tumor
fc_date: '2014-08-13'
fc_name: dream-syn3
resources:
  gatk:
    jar: s3://tuncproject/gatktools/GenomeAnalysisTK.jar 
  mutect: 
    jar: s3://tuncproject/gatktools/mutect-1.1.7.jar
upload:
  dir: s3://tuncproject/bcbiovmrun/final/
chapmanb commented 8 years ago

Tunc; Sorry about the problems even with realign: false set. I'm confused as to what is going on, as it should be skipping these steps entirely if you have that set to false. My suggestion at this point would be to run in single core mode (bcbio_vm.py run your_config.yaml) to see if that provides any additional information to help with debugging. Sorry to not have better ideas but hope this helps.

mortunco commented 8 years ago

Dear Brad;

I have come across something interesting during AWS configuration. I was trying to run on a single machine, therefore I configured the cluster as I show below. But even though I set the number of worker nodes to 0, it created two compute instances. I may be overreacting to every single inconsistency, but do you think this configuration problem is related to the error? Should I share my elasticluster configuration file with you?

EDIT: When I check my EC2 console, I see 3 instances created. EDIT 2: I initiated my process with bcbio_vm.py run myconf.yaml -n 32 (n = 32 because it created 2 c3.8xlarge instances). I checked top on both of the compute instances and their CPUs are about 98% occupied. I am looking forward to hearing what might cause this interesting behaviour.

I was expecting to run on basically a single machine, but this run command started a run like it did with ipythonprep. Could you enlighten me on this subject?

[ec2-user@ip-172-31-59-88 ~]$ bcbio_vm.py aws config edit
Changing configuration for cluster bcbio

Size of encrypted NFS mounted filesystem, in Gb [500]: 500
Number of cluster worker nodes (0 starts a single machine instead of a cluster) [2]: 0
Machine type for single frontend worker node [c3.large]: c3.8xlarge

Updated configuration for cluster bcbio
Run 'bcbio_vm.py aws info' to see full details for the cluster
[ec2-user@ip-172-31-59-88 ~]$ bcbio_vm.py aws info
Available clusters: bcbio

Configuration for cluster 'bcbio':
 Frontend: c3.8xlarge with 500Gb NFS storage

AWS setup:
 OK: expected IAM user 'bcbio' exists.
 OK: expected security group 'bcbio_cluster_sg' exists.
 OK: VPC 'bcbio' exists.

Instances in VPC 'bcbio':
[ec2-user@ip-172-31-59-88 ~]$ bcbio_vm.py aws cluster start
Starting cluster `bcbio` with 1 frontend nodes.
Starting cluster `bcbio` with 2 compute nodes.
(this may take a while...)
INFO:gc3.elasticluster:Starting node compute001.
INFO:gc3.elasticluster:Starting node compute002.
INFO:gc3.elasticluster:Starting node frontend001.
INFO:gc3.elasticluster:_start_node: node has been started
INFO:gc3.elasticluster:_start_node: node has been started
INFO:gc3.elasticluster:_start_node: node has been started
chapmanb commented 8 years ago

Tunc; Sorry about this, I'm not sure why you're getting inconsistent results from the configuration logic. I know you had other problems with this earlier and wonder if there is something strange in your configuration file in ~/.bcbio/elasticluster/config. It should have a single cluster/bcbio section with frontend_nodes and compute_nodes set from the edit command:

https://github.com/chapmanb/bcbio-nextgen-vm/blob/master/elasticluster/config#L45

Do you see other things in there that might explain it? Happy to look at the file if it helps but if you post please don't include your ec2_access_key and ec2_secret_key variables. Sorry for not having a clear idea but hope this helps.

mortunco commented 8 years ago

Dear Brad;

I guess the problem was caused by the multiple config files. I removed all the configuration files named config.bak#somedate, left only config, and it produced this. I will keep you updated with the result of the run. Thanks!

Best, Tunc.

[ec2-user@ip-172-31-59-88 ~]$ bcbio_vm.py aws cluster start
Starting cluster `bcbio` with 1 frontend nodes.
(this may take a while...)
INFO:gc3.elasticluster:Starting node frontend001.
INFO:gc3.elasticluster:_start_node: node has been started
mortunco commented 8 years ago

Dear Brad;

I am glad we at least solved the problem of the run going into an infinite loop. Luckily, I believe my current problems are common ones that anyone could run into. Thank you for the single-core suggestion.

Before I talk about my errors in the process I have three basic questions:

  1. In what order are the variant calling methods run? Does bcbio follow the order given in variantcaller:?
  2. Can I use MuTect and MuTect2 in the same process?
  3. Can I enter multiple indel callers in a single run, e.g. pindel and scalpel at the same time? Would this interfere with MuTect2 indels?

Now for my errors during the process: I have not had a successful run yet with the pindel option on. The process somehow got interrupted during MuTect variant calling; MuTect called mutations up to chromosome 9 and then stopped with this error:

The interesting thing is why MuTect got interrupted by pindel in the middle of its process. I asked about the order of the variant callers because mutect2 finished before mutect, even though mutect is at the top of the list.

I am waiting for new suggestions to solve this problem.

Thank you for helping me out,

Best,

Tunc

This is the variantcaller section of the configuration file at /encrypted/project5/work/config:

    variantcaller:
    - mutect
    - freebayes
    - vardict
    - varscan
    - mutect2

This is the error that ends the pipeline:

...
Insertsize in config: 250
The number of one end mapped read: 3154
Number of problematic reads in current window:              8421, + 6159 - 2262
Number of split-reads where the close end could be mapped:  3154, + 2328 - 826
/bin/bash: line 1: 55470 Killed                  /usr/local/share/bcbio-nextgen/anaconda/bin/pindel -f /mnt/work/inputs/data/genomes/GRCh37/seq/GRCh37.fa -i /mnt/work/tx/tmpw4nKO8/pindel.txt -o /mnt/work/tx/tmpw4nKO8/pindelroot -j /mnt/work/mutect/1/syn3-1_0_31187470-somaticIndels-regions-nolcr-nolcr.bed --max_range_index 2 --IndelCorrection --report_breakpoints false --report_interchromosomal_events false
' returned non-zero exit status 137
___________________________________________________________________________
' returned non-zero exit status 1

This is how I understood it. I also checked the mutect folder; it has calls for chromosomes 1 to 9. The output below is from the debug log file.

[2016-04-10T22:57Z] /usr/local/share/bcbio-nextgen/anaconda/bin/tabix -f -p vcf /mnt/work/mutect/5/tx/tmpEHLgVU/syn3-5_156785743_180915260-mutect.vcf.gz
[2016-04-10T22:57Z] java -Xms454m -Xmx1590m -XX:+UseSerialGC -Djava.io.tmpdir=/mnt/work/tx/tmphqHOAQ -jar /mnt/work/inputs/jars/mutect/mutect-1.1.7.jar -R /mnt/work/inputs/data/genomes/GRCh37/seq/GRCh37.fa -T MuTect -U ALLOW_N_CIGAR_READS --read_filter NotPrimaryAlignment -I:tumor /mnt/work/align/syn3-tumor/syn3-tumor-sort.bam --tumor_sample_name syn3-tumor -I:normal /mnt/work/align/syn3-normal/syn3-normal-sort.bam --normal_sample_name syn3-normal --dbsnp /mnt/work/inputs/data/genomes/GRCh37/variation/dbsnp_138.vcf.gz --cosmic /mnt/work/inputs/data/genomes/GRCh37/variation/cosmic-v68-GRCh37.vcf.gz -L /mnt/work/mutect/8/syn3-8_62288979_93648031-mutect-regions.bed --interval_set_rule INTERSECTION --enable_qscore_output --vcf /mnt/work/mutect/8/tx/tmpTD9SKw/syn3-8_62288979_93648031-mutect-orig.vcf.gz -o /dev/null
[2016-04-10T22:57Z] /usr/local/share/bcbio-nextgen/anaconda/bin/pindel -f /mnt/work/inputs/data/genomes/GRCh37/seq/GRCh37.fa -i /mnt/work/tx/tmpxh1YeV/pindel.txt -o /mnt/work/tx/tmpxh1YeV/pindelroot -j /mnt/work/mutect/5/syn3-5_156785743_180915260-somaticIndels-regions-nolcr-nolcr.bed --max_range_index 2 --IndelCorrection --report_breakpoints false --report_interchromosomal_events false
[2016-04-10T22:57Z] cat /mnt/work/mutect/7/syn3-7_157155595_159138663-mutect.vcf  | /usr/local/share/bcbio-nextgen/anaconda/bin/bgzip -c > /mnt/work/mutect/7/tx/tmpaISLCm/syn3-7_157155595_159138663-mutect.vcf.gz
[2016-04-10T22:57Z] /usr/local/share/bcbio-nextgen/anaconda/bin/tabix -f -p vcf /mnt/work/mutect/7/tx/tmpfVdUqL/syn3-7_157155595_159138663-mutect.vcf.gz
[2016-04-10T22:58Z] java -Xms454m -Xmx1590m -XX:+UseSerialGC -Djava.io.tmpdir=/mnt/work/tx/tmpduwtTd -jar /mnt/work/inputs/jars/mutect/mutect-1.1.7.jar -R /mnt/work/inputs/data/genomes/GRCh37/seq/GRCh37.fa -T MuTect -U ALLOW_N_CIGAR_READS --read_filter NotPrimaryAlignment -I:tumor /mnt/work/align/syn3-tumor/syn3-tumor-sort.bam --tumor_sample_name syn3-tumor -I:normal /mnt/work/align/syn3-normal/syn3-normal-sort.bam --normal_sample_name syn3-normal --dbsnp /mnt/work/inputs/data/genomes/GRCh37/variation/dbsnp_138.vcf.gz --cosmic /mnt/work/inputs/data/genomes/GRCh37/variation/cosmic-v68-GRCh37.vcf.gz -L /mnt/work/mutect/8/syn3-8_93896832_124968568-mutect-regions.bed --interval_set_rule INTERSECTION --enable_qscore_output --vcf /mnt/work/mutect/8/tx/tmpL7h8d2/syn3-8_93896832_124968568-mutect-orig.vcf.gz -o /dev/null
[2016-04-10T22:58Z] /usr/local/share/bcbio-nextgen/anaconda/bin/pindel -f /mnt/work/inputs/data/genomes/GRCh37/seq/GRCh37.fa -i /mnt/work/tx/tmpA0L1Gl/pindel.txt -o /mnt/work/tx/tmpA0L1Gl/pindelroot -j /mnt/work/mutect/7/syn3-7_157155595_159138663-somaticIndels-regions-nolcr-nolcr.bed --max_range_index 2 --IndelCorrection --report_breakpoints false --report_interchromosomal_events false
[2016-04-10T22:58Z] cat /mnt/work/mutect/6/syn3-6_0_31080572-mutect.vcf  | /usr/local/share/bcbio-nextgen/anaconda/bin/bgzip -c > /mnt/work/mutect/6/tx/tmpYnlNEu/syn3-6_0_31080572-mutect.vcf.gz
[2016-04-10T22:58Z] /usr/local/share/bcbio-nextgen/anaconda/bin/tabix -f -p vcf /mnt/work/mutect/6/tx/tmpCPQZF5/syn3-6_0_31080572-mutect.vcf.gz
[2016-04-10T22:58Z] /usr/local/share/bcbio-nextgen/anaconda/bin/pindel -f /mnt/work/inputs/data/genomes/GRCh37/seq/GRCh37.fa -i /mnt/work/tx/tmp2hDfqb/pindel.txt -o /mnt/work/tx/tmp2hDfqb/pindelroot -j /mnt/work/mutect/6/syn3-6_0_31080572-somaticIndels-regions-nolcr-nolcr.bed --max_range_index 2 --IndelCorrection --report_breakpoints false --report_interchromosomal_events false
chapmanb commented 8 years ago

Tunc; Sorry about the problems. To start with answering your questions:

  1. The variant callers are run roughly in the order specified. They are all run in the same set of processes so will overlap depending on the speed of different callers.
  2. Yes, MuTect and MuTect2 can run together. There is no overlap between these and they don't conflict in any way.
  3. No, you can only run a single indelcaller to supplement MuTect and could use either scalpel or pindel. We found both methods perform worse than other callers like VarDict so haven't put a lot of development effort into extending this.
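
For reference, a minimal sketch of what that would look like in the algorithm section, keeping the caller list from your config and swapping pindel for scalpel:

  algorithm:
    variantcaller: [mutect, freebayes, vardict, varscan, mutect2]
    indelcaller: scalpel   # a single supplemental indel caller; pindel is the other option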

The error you saw is the operating system killing the pindel process because it ran out of memory, as indicated by the Killed message. We don't have much practical experience running pindel, but since it runs as part of the MuTect step (its output goes under the mutect/ work directories), there are two workarounds I can see: adding a resource specification to give that step more memory, such as

resources:
   mutect:
      jvm_opts: ["-Xms500m", "-Xmx4000m"]

and increasing the -Xmx parameter further if that does not provide enough.

Hope this helps.

mortunco commented 8 years ago

Brad;

Thank you for the rapid response. I will definitely give scalpel a try. Forgive my ignorance, but I am concerned that if I manage the hardware allocation myself, I might mess up the bcbio process. Could you help me out with that?

My aim is to do variant calling with 5 algorithms + calling indels.

Also, is this mutect memory amount per core or total?

chapmanb commented 8 years ago

Tunc; Memory specifications are always per core:

http://bcbio-nextgen.readthedocs.org/en/latest/contents/parallel.html#tuning-core-and-memory-usage

You can add the resource specification I suggested above at the top level of your sample YAML:

http://bcbio-nextgen.readthedocs.org/en/latest/contents/configuration.html#sample-or-run-specific-resources
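
Roughly, merged with the jar entries already in your configuration, the top level of the sample YAML would look like this (structure only; other keys omitted):

details:
- algorithm:
    aligner: bwa
  analysis: variant2
  description: syn3-tumor
resources:
  mutect:
    jar: s3://tuncproject/gatktools/mutect-1.1.7.jar
    jvm_opts: ["-Xms500m", "-Xmx4000m"]   # memory is specified per core
upload:
  dir: s3://tuncproject/bcbiovmrun/final/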

Hope this helps.

mortunco commented 8 years ago

Thank you very much for your help and patience. I will try to work it out.

Best,

Tunc.