bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

bam_clean issue #1302

parlar closed this issue 8 years ago

parlar commented 8 years ago

Hi!

Using BAM files as input with the bam_clean: picard setting in the config file produces an error when bcbio_nextgen.py is run (see below). Am I doing something wrong here, or is this a bug?

[genetik@v01s979 work]$ bcbio_nextgen.py -n 30 ../config/config.yaml 
[2016-04-04T07:21Z] Resource requests: sambamba, samtools; memory: 2.00, 2.00; cores: 16, 16
[2016-04-04T07:21Z] Configuring 1 jobs to run, using 16 cores each with 32.1g of memory reserved for each job
[2016-04-04T07:21Z] Timing: organize samples
[2016-04-04T07:21Z] multiprocessing: organize_samples
[2016-04-04T07:21Z] Using input YAML configuration: /home/genetik/irina/bams/calling_2016.04.01-14.32.17/config/config.yaml
[2016-04-04T07:21Z] Checking sample YAML configuration: /home/genetik/irina/bams/calling_2016.04.01-14.32.17/config/config.yaml
[2016-04-04T07:21Z] Testing minimum versions of installed programs
[2016-04-04T07:21Z] Timing: alignment preparation
[2016-04-04T07:21Z] multiprocessing: prep_align_inputs
[2016-04-04T07:21Z] multiprocessing: disambiguate_split
[2016-04-04T07:21Z] Timing: alignment
[2016-04-04T07:21Z] multiprocessing: process_alignment
[2016-04-04T07:21Z] Timing: callable regions
[2016-04-04T07:21Z] multiprocessing: prep_samples
[2016-04-04T07:21Z] Prepare cleaned BED file : bams/11-7401_S6_bam
[2016-04-04T07:21Z] Traceback (most recent call last):
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/bedutils.py", line 16, in <module>
[2016-04-04T07:21Z]     from bcbio.variation import vcfutils
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/vcfutils.py", line 19, in <module>
[2016-04-04T07:21Z]     from bcbio.pipeline import config_utils, shared, tools
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/shared.py", line 9, in <module>
[2016-04-04T07:21Z]     import pybedtools
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/pybedtools/__init__.py", line 12, in <module>
[2016-04-04T07:21Z]     from . import contrib
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/pybedtools/contrib/__init__.py", line 4, in <module>
[2016-04-04T07:21Z]     from . import long_range_interaction
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/pybedtools/contrib/long_range_interaction.py", line 7, in <module>
[2016-04-04T07:21Z]     import pandas
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/pandas/__init__.py", line 7, in <module>
[2016-04-04T07:21Z]     from pandas import hashtable, tslib, lib
[2016-04-04T07:21Z]   File "pandas/src/numpy.pxd", line 157, in init pandas.hashtable (pandas/hashtable.c:38262)
[2016-04-04T07:21Z] ValueError: numpy.dtype has the wrong size, try recompiling
[2016-04-04T07:21Z] Prepare merged BED file : bams/11-7401_S6_bam
[2016-04-04T07:21Z] Prepare cleaned BED file : bams/12-304_S8_bam
[2016-04-04T07:21Z] Traceback (most recent call last):
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/bedutils.py", line 16, in <module>
[2016-04-04T07:21Z]     from bcbio.variation import vcfutils
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/vcfutils.py", line 19, in <module>
[2016-04-04T07:21Z]     from bcbio.pipeline import config_utils, shared, tools
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/shared.py", line 9, in <module>
[2016-04-04T07:21Z]     import pybedtools
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/pybedtools/__init__.py", line 12, in <module>
[2016-04-04T07:21Z]     from . import contrib
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/pybedtools/contrib/__init__.py", line 4, in <module>
[2016-04-04T07:21Z]     from . import long_range_interaction
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/pybedtools/contrib/long_range_interaction.py", line 7, in <module>
[2016-04-04T07:21Z]     import pandas
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/pandas/__init__.py", line 7, in <module>
[2016-04-04T07:21Z]     from pandas import hashtable, tslib, lib
[2016-04-04T07:21Z]   File "pandas/src/numpy.pxd", line 157, in init pandas.hashtable (pandas/hashtable.c:38262)
[2016-04-04T07:21Z] ValueError: numpy.dtype has the wrong size, try recompiling
[2016-04-04T07:21Z] Prepare merged BED file : bams/12-304_S8_bam
[2016-04-04T07:21Z] Prepare cleaned BED file : bams/13-6849_S1_bam
[2016-04-04T07:21Z] Traceback (most recent call last):
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/bedutils.py", line 16, in <module>
[2016-04-04T07:21Z]     from bcbio.variation import vcfutils
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/vcfutils.py", line 19, in <module>
[2016-04-04T07:21Z]     from bcbio.pipeline import config_utils, shared, tools
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/shared.py", line 9, in <module>
[2016-04-04T07:21Z]     import pybedtools
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/pybedtools/__init__.py", line 12, in <module>
[2016-04-04T07:21Z]     from . import contrib
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/pybedtools/contrib/__init__.py", line 4, in <module>
[2016-04-04T07:21Z]     from . import long_range_interaction
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/pybedtools/contrib/long_range_interaction.py", line 7, in <module>
[2016-04-04T07:21Z]     import pandas
[2016-04-04T07:21Z]   File "/home/genetik/bcbio/share/bcbio/anaconda/lib/python2.7/site-packages/pandas/__init__.py", line 7, in <module>
[2016-04-04T07:21Z]     from pandas import hashtable, tslib, lib
[2016-04-04T07:21Z]   File "pandas/src/numpy.pxd", line 157, in init pandas.hashtable (pandas/hashtable.c:38262)
[2016-04-04T07:21Z] ValueError: numpy.dtype has the wrong size, try recompiling
[2016-04-04T07:21Z] Prepare merged BED file : bams/13-6849_S1_bam
[2016-04-04T07:21Z] Prepare cleaned BED file : bams/14-5639_S4_bam
[...]

Here is an excerpt from the config file:


---
upload:
  dir: ../final
details:
  - files: /home/genetik/irina/bams/11-7401_S6.bam
    description: bams/11-7401_S6.bam
    metadata:
      batch: batch1
      sex: male
    analysis: variant2
    genome_build: hg19
    algorithm:
      aligner: false
      bam_clean: picard
      mark_duplicates: false
      variantcaller: [mutect2, varscan, scalpel, freebayes]
      min_allele_fraction: 1
      quality_format: Standard
      coverage_interval: amplicon
      recalibrate: false
      realign: gatk
      variant_regions: /home/genetik/irina/myeloid_targets.s.m.bed
      clinical_reporting: true
      effects: false
      ensemble:
        numpass: 1

chapmanb commented 8 years ago

Pär, sorry about the install issue. Is this a slightly older version of bcbio? There was a problem late last year caused by incompatible versions of the numpy packages (there was a switch from numpy 1.9 to 1.10). The workaround here should hopefully get your packages up to date and in sync:

https://github.com/chapmanb/bcbio-nextgen/issues/1105#issuecomment-155372386

Hope this fixes it for you. If not, feel free to re-open and we can investigate more.
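
As a rough illustration (not part of the linked workaround), the "numpy.dtype has the wrong size" error typically means a compiled package such as pandas was built against a different numpy than the one currently installed. The sketch below reports which version of each package is picked up and from where. The helper name describe_module is hypothetical, and it is assumed the script is run with the same Python interpreter that bcbio uses.

```python
import importlib


def describe_module(name):
    """Return 'version from path' for a module, or a message if the import fails."""
    try:
        module = importlib.import_module(name)
    except Exception as exc:
        # A binary mismatch surfaces as ValueError rather than ImportError,
        # so catch broadly and report the exception type.
        return "import failed: %s: %s" % (type(exc).__name__, exc)
    version = getattr(module, "__version__", "unknown")
    path = getattr(module, "__file__", "built-in")
    return "%s from %s" % (version, path)


if __name__ == "__main__":
    # The packages that appear in the traceback above.
    for name in ("numpy", "pandas", "pybedtools"):
        print("%-12s %s" % (name, describe_module(name)))
```

If pandas reports an import failure while numpy imports cleanly, the two are probably out of sync and need to be updated together.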

parlar commented 8 years ago

Hi!

I updated to the latest devel release of bcbio-nextgen. The error looks somewhat different now: no complaints about numpy this time. Instead, it complains about bcbio.variation.bedutils.remove_bad(x) and AttributeError: 'module' object has no attribute 'bedutils'.

I guess this does not have anything to do with numpy?

[2016-04-04T14:00Z] Checking sample YAML configuration: /home/genetik/irina/bams/calling_2016.04.01-14.32.17/config/config.yaml
[2016-04-04T14:00Z] Testing minimum versions of installed programs
[2016-04-04T14:00Z] Timing: alignment preparation
[2016-04-04T14:00Z] multiprocessing: prep_align_inputs
[2016-04-04T14:00Z] multiprocessing: disambiguate_split
[2016-04-04T14:00Z] Timing: alignment
[2016-04-04T14:00Z] multiprocessing: process_alignment
[2016-04-04T14:00Z] Timing: callable regions
[2016-04-04T14:00Z] multiprocessing: prep_samples
[2016-04-04T14:00Z] Prepare cleaned BED file : bams/11-7401_S6_bam
[2016-04-04T14:00Z] Traceback (most recent call last):
[2016-04-04T14:00Z]   File "<string>", line 1, in <module>
[2016-04-04T14:00Z]     bcbio.variation.bedutils.remove_bad(x)
[2016-04-04T14:00Z] AttributeError: 'module' object has no attribute 'bedutils'
[2016-04-04T14:00Z] Prepare merged BED file : bams/11-7401_S6_bam
[2016-04-04T14:00Z] Prepare cleaned BED file : bams/12-304_S8_bam
[2016-04-04T14:00Z] Traceback (most recent call last):
[2016-04-04T14:00Z]   File "<string>", line 1, in <module>
[2016-04-04T14:00Z]     bcbio.variation.bedutils.remove_bad(x)
[2016-04-04T14:00Z] AttributeError: 'module' object has no attribute 'bedutils'
[2016-04-04T14:00Z] Prepare merged BED file : bams/12-304_S8_bam
[2016-04-04T14:00Z] Prepare cleaned BED file : bams/13-6849_S1_bam
[2016-04-04T14:00Z] Traceback (most recent call last):
[2016-04-04T14:00Z]   File "<string>", line 1, in <module>
[2016-04-04T14:00Z]     bcbio.variation.bedutils.remove_bad(x)
[2016-04-04T14:00Z] AttributeError: 'module' object has no attribute 'bedutils'
[2016-04-04T14:00Z] Prepare merged BED file : bams/13-6849_S1_bam
[2016-04-04T14:00Z] Prepare cleaned BED file : bams/14-5639_S4_bam
[2016-04-04T14:00Z] Traceback (most recent call last):
[2016-04-04T14:00Z]   File "<string>", line 1, in <module>
[2016-04-04T14:00Z]     bcbio.variation.bedutils.remove_bad(x)
[...]

chapmanb commented 8 years ago

Sorry about the continued problems. It looks like you now have a working set of numpy libraries and have moved past the initial error. The new error suggests Python is picking up an old version of bcbio that lacks the bedutils module. This could happen if PYTHONPATH or PYTHONHOME is set and points to an older installation. Is it possible you're exporting these, or have another version of bcbio on the system that might cause the conflict? Hope this helps.
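
One way to check for this kind of conflict is sketched below, assuming it is run with the same Python interpreter that bcbio uses. It prints any PYTHONPATH or PYTHONHOME overrides and shows which file bcbio and bcbio.variation.bedutils are imported from. The helper names python_env_overrides and module_location are hypothetical, introduced only for this example.

```python
import importlib
import os


def python_env_overrides(environ=None):
    """Return the PYTHONPATH/PYTHONHOME settings that are present and non-empty."""
    environ = os.environ if environ is None else environ
    return {key: environ[key]
            for key in ("PYTHONPATH", "PYTHONHOME")
            if environ.get(key)}


def module_location(name):
    """Return the file a module is imported from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except Exception:
        return None
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    # Any variable printed here can redirect imports to another installation.
    for key, value in sorted(python_env_overrides().items()):
        print("%s=%s" % (key, value))
    # None for bedutils while bcbio itself resolves suggests an older bcbio copy.
    for name in ("bcbio", "bcbio.variation.bedutils"):
        print("%s -> %s" % (name, module_location(name)))
```

If bcbio resolves to a directory outside the bcbio installation, unsetting the reported variable in the shell before running bcbio_nextgen.py is a reasonable first thing to try.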

parlar commented 8 years ago

PYTHONPATH was indeed set. Now everything seems to work, thanks for your help!