samblaster: Unable to write to output file. [fputs] Broken pipe

erevkov commented 7 years ago

Hi,

I've been running the analysis with bcbio installed on EBS volume connected to a single c4.2xlarge AWS instance, here's the config file:

details:
-  algorithm:
     platform: illumina
     quality_format: standard
     aligner: bwa
     save_diskspace: true
     tools_off: fastqc
     bam_clean: false
     bam_sort: false
     mark_duplicates: true
     recalibrate: false
     realign: gatk
     variant_regions: /mnt/work/test_1/data/SeqCap_target_1000genome.bed
     remove_lcr: false
     variantcaller: [mutect2,freebayes,vardict,varscan]
     svcaller: [lumpy,manta,cnvkit]
     indelcaller: false
     ensemble:
       numpass: 2
   analysis: variant2
   description: GIS-A144-T-S32-E-N
   files: /mnt/work/test_1/data/HS004-PE-R00121_BC6KTUANXX.human_g1k_v37.CHL506_reAligned_reCal.bam
   genome_build: GRCh37
   metadata:
     batch: GIS-A144-T-S32-E
     phenotype: normal
-  algorithm:
     platform: illumina
     quality_format: standard
     aligner: bwa
     save_diskspace: true
     tools_off: fastqc
     bam_clean: false
     bam_sort: false
     mark_duplicates: true
     recalibrate: false
     realign: gatk
     variant_regions: /mnt/work/test_1/data/SeqCap_target_1000genome.bed
     remove_lcr: false
     variantcaller: [mutect2,freebayes,vardict,varscan]
     svcaller: [lumpy,manta,cnvkit]
     indelcaller: false
     ensemble:
       numpass: 2
   analysis: variant2
   description: GIS-A144-T-S32-E-T
   files: /mnt/work/test_1/data/HS004-PE-R00121_BC6KTUANXX.human_g1k_v37.CHL481_reAligned_reCal.bam
   genome_build: GRCh37
   metadata:
     batch: GIS-A144-T-S32-E
     phenotype: tumor
fc_date: '2015-07-31'
fc_name: GIS-A144-T-S32-E
upload:
 dir: /mnt/work/test_2/bcbio_final

and I keep getting the error, which looks like this in bcbio-nextgen-debug.log:

[2017-08-04T08:12Z] [M::mem_pestat] analyzing insert size distribution for orientation FR...
[2017-08-04T08:12Z] [M::mem_pestat] (25, 50, 75) percentile: (159, 199, 251)
[2017-08-04T08:12Z] [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 435)
[2017-08-04T08:12Z] [M::mem_pestat] mean and std.dev: (208.86, 66.35)
[2017-08-04T08:12Z] [M::mem_pestat] low and high boundaries for proper pairs: (1, 527)
[2017-08-04T08:12Z] [M::mem_pestat] skip orientation RF as there are not enough pairs
[2017-08-04T08:12Z] [M::mem_pestat] skip orientation RR as there are not enough pairs
[2017-08-04T08:12Z] samblaster: Unable to write to output file.
[2017-08-04T08:12Z] [fputs] Broken pipe
[2017-08-04T08:12Z] /bin/bash: line 1: 12125 Exit 1                  /mnt/work/bcbio/galaxy/../anaconda/bin/bwa mem $
[2017-08-04T08:12Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/mnt/work/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 22, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/mnt/work/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 102, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; unset JAVA_HOME && /mnt/work/bcbio/galaxy/../anaconda/bin/bwa mem   -c$
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (98, 8237, 8281)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 24647)
[M::mem_pestat] mean and std.dev: (5304.03, 3837.17)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 32830)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation FF as there are not enough pairs

...(several more [M::mem_pestat] entries)...

[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (159, 199, 251)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 435)
[M::mem_pestat] mean and std.dev: (208.86, 66.35)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 527)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
samblaster: Unable to write to output file.
[fputs] Broken pipe
/bin/bash: line 1: 12125 Exit 1                  /mnt/work/bcbio/galaxy/../anaconda/bin/bwa mem -c 250 -M -t 7 -R '@$
     12126                       | /mnt/work/bcbio/galaxy/../anaconda/bin/samblaster --addMateTags -M --splitterFile$
     12127 Killed                  | /mnt/work/bcbio/galaxy/../anaconda/bin/samtools sort -n -@ 7 -m 1G -T /mnt/work$
' returned non-zero exit status 137

Here's my bcbio_system.yaml (since the instance has 15GiB and 8 vCPUs):

galaxy_config: universe_wsgi.ini
resources:
  default:
    cores: 7
    jvm_opts:
    - -Xms750m
    - -Xmx3500m
    memory: 2G
  dexseq:
    memory: 10g
  express:
    memory: 8g
  gatk:
    dir: /mnt/work/bcbio/toolplus/gatk/3.6-0-g89b7209
    jvm_opts:
    - -Xms500m
    - -Xmx3500m
  macs2:
    memory: 8g
  qualimap:
    memory: 4g
  seqcluster:
    memory: 8g
  snpeff:
    jvm_opts:
    - -Xms750m
    - -Xmx3g

and I run this command to start it:

bcbio_nextgen.py ../config/test_2.yaml -n 7

Also, if I change the config to mark_duplicates: false and remove lumpy from svcaller (hoping to avoid samblaster), the analysis runs and finishes just fine. I've looked at similar issues, but nothing really seems to help in my case. I suspect something is wrong with the memory setup, but I can't figure out what exactly.

Could you please provide some advice?

Thanks in advance, Egor

chapmanb commented 7 years ago

Egor; Thanks for the detailed report and sorry about the problem. The 'Killed' message in the output indicates that the operating system is stopping one or more of the processes in the pipe, likely due to memory issues. Memory on c4 instances is pretty tight at 2GB/core, and memory usage for both bwa and sambamba is not bounded, so we can't set how much they'll use.
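
For reference, the "non-zero exit status 137" at the end of the log fits this reading. Shells report 128 plus the signal number when a process is killed by a signal, and 137 - 128 = 9 is SIGKILL, which is the signal the Linux out-of-memory killer sends. A minimal sketch of that decoding follows; the helper name `describe_exit_status` is made up for illustration, and SIGKILL alone does not prove an out-of-memory kill (checking `dmesg` on the instance would confirm it):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell exit status (hypothetical helper, not part of bcbio).

    By shell convention, a status above 128 means the process was
    terminated by signal number (status - 128).
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"


# 137 is the status reported at the end of the traceback above.
print(describe_exit_status(137))
```

On Linux this prints "terminated by SIGKILL".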

If you need to use this instance type, you can either:

Unset lumpy, in which case bcbio will use bamsormadup, which does have memory bounds.
Increase the memory/core allocation in your bcbio_system.yaml file so bcbio uses fewer cores per process (and thus less memory for sorting).
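
To make the second option concrete, here is a rough back-of-the-envelope sketch. The only numbers taken from the log are the `samtools sort -n -@ 7 -m 1G` flags (`-m` is a per-thread limit, so 7 threads can hold about 7G of sort buffers) and the 15GiB instance size. The 6G reserved for bwa and samblaster is a placeholder assumption, and `cores_within_budget` is a hypothetical helper, not bcbio's actual scheduling logic:

```python
def sort_buffer_gb(threads: int, mem_per_thread_gb: float) -> float:
    """Upper bound on samtools sort buffers: -m applies per thread."""
    return threads * mem_per_thread_gb


def cores_within_budget(total_gb: float, reserved_gb: float,
                        mem_per_core_gb: float, max_cores: int) -> int:
    """Cores that fit once memory is reserved for the unbounded tools.

    Hypothetical illustration only; bcbio's own calculation may differ.
    """
    usable_gb = total_gb - reserved_gb
    return max(1, min(max_cores, int(usable_gb // mem_per_core_gb)))


# Values from the log: -@ 7 -m 1G on a 15GiB instance.
print(sort_buffer_gb(7, 1.0))

# ASSUMPTION: 6G set aside for bwa mem and samblaster (placeholder value).
for mem_per_core in (2.0, 3.0):
    print(mem_per_core, cores_within_budget(15.0, 6.0, mem_per_core, 7))
```

The point is only the direction of the effect: a larger memory/core value means fewer cores fit in the same budget, so the sort step is started with fewer threads and smaller total buffers.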

If you want lumpy output, then I'd suggest moving to an m4 instance type, which has more memory/core and should work better.

Hope this helps

erevkov commented 7 years ago

Brad;

Thanks a lot, the pipeline seems to be working fine now!

-Egor

erevkov commented 7 years ago

I'm sorry to reopen the issue, but I think my question is related to the previous ones and I couldn't find a solution anywhere else. Could you suggest an automated way to profile memory/CPU usage for the bcbio pipeline running on a single AWS instance (to compare performance across different instance types and cluster setups)? I've been thinking of using Amazon's CloudWatch, but maybe you can suggest something better.

Thanks in advance, Egor

ohofmann commented 7 years ago

Ping @brainstorm

brainstorm commented 7 years ago

Hi @egorrevkov, I've been meaning to revisit monitoring in the near future but have not fully gotten to it yet. The options we have lined up for testing on the commercial side right now are:

https://www.datadoghq.com/
https://signalfx.com/

These provide brief trial periods to assess their functionality, and they ease the heavy lifting and maintenance of deploying custom monitoring solutions such as:

https://aws.amazon.com/blogs/aws/new-cloudwatch-plugin-for-collectd/
https://prometheus.io/

We'll see which comes up first for us, since sooner or later we do need to put some "eagle's-eye view" monitoring in place.
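
In the meantime, for a quick comparison on a single instance, something as small as the sketch below may already be enough. It is not a substitute for any of the tools above: it just samples `/proc/meminfo` and the 1-minute load average on Linux and writes CSV rows. The function names, column names, and default interval are all made up for illustration:

```python
import csv
import os
import sys
import time

FIELDS = ["time", "mem_used_mib", "mem_total_mib", "load_1min"]


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def sample(meminfo_path="/proc/meminfo"):
    """Take one sample of system-wide memory use and load (Linux only)."""
    with open(meminfo_path) as handle:
        mem = parse_meminfo(handle.read())
    total_kib = mem["MemTotal"]
    # MemAvailable is missing on very old kernels; fall back to MemFree.
    available_kib = mem.get("MemAvailable", mem.get("MemFree", 0))
    load_1min = os.getloadavg()[0]
    return {
        "time": int(time.time()),
        "mem_used_mib": (total_kib - available_kib) // 1024,
        "mem_total_mib": total_kib // 1024,
        "load_1min": load_1min,
    }


def monitor(interval_s=10, samples=6, out=sys.stdout):
    """Write `samples` CSV rows to `out`, one every `interval_s` seconds."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for i in range(samples):
        writer.writerow(sample())
        out.flush()
        if i < samples - 1:
            time.sleep(interval_s)
```

Calling `monitor()` from a second terminal (or a background process) while the pipeline runs, with `out` pointed at a file, gives one CSV per run that can be compared across instance types. It measures the whole machine rather than individual bcbio processes.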

Hope that helps!

erevkov commented 7 years ago

Thanks @brainstorm, this is helpful, will take a look at these.

bcbio / bcbio-nextgen

samblaster: Unable to write to output file. [fputs] Broken pipe #2032