bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Pipeline ignores cores setting #135

Closed lbeltrame closed 11 years ago

lbeltrame commented 11 years ago

Although each machine in my cluster has 24 cores, the latest git master of bcbio-nextgen forces the aligners and all other steps to run on 2 cores.

I've set cores for bwa and samtools in the YAML configuration file, posted here for reference:


resources:
  log:
    dir: log
  ucsc_bigwig:
    memory: 36G
  bwa:
    cmd: bwa
    cores: 24
  samtools:
    memory: 36G
    cores: 24
  novoalign:
    cores: 8
    memory: 36G
  gemini:
    cores: 24
  gatk:
    jvm_opts: ["-Xms750m", "-Xmx32G"]
    dir: /mnt/data/programs/gatk/
  picard:
    dir: /mnt/data/programs/picard/
  snpEff:
    jvm_opts: ["-Xms750m", "-Xmx32G"]
    dir: /mnt/data/software/snpeff/
  bcbio_variation:
    dir: /mnt/data/programs/bcbio.variation
  mutect:
    jvm_opts: ["-Xms750m", "-Xmx32G"]
    dir: /mnt/data/programs/muTect/
  varscan:
    dir: /mnt/data/programs/varscan/
    jvm_opts: ["-Xms750m", "-Xmx32G"]

When the aligner starts, I get this log excerpt (formatted for clarity):

2013-10-08 10:31:29.594 [IPClusterStart] Config changed:
2013-10-08 10:31:29.595 [IPClusterStart] {'SGELauncher': {'queue': 'main.q'},
'BcbioSGEEngineSetLauncher': {'mem': '46', 'cores': 2, 'pename': u'orte', 'resources': ''},
'IPClusterEngines': {'early_shutdown': 240}, 'Application': {'log_level': 10},
'ProfileDir': {'location': u'/mnt/data/projects/OV159/work/log/ipython'},
'BaseParallelApplication': {'log_to_file': True,
'cluster_id': u'f5c4299b-f5b0-4a57-8620-0cc8598ff5a8'}, 
'IPClusterStart': {'delay': 10, 'controller_launcher_class': u'cluster_helper.cluster.BcbioSGEControllerLauncher', 'daemonize': True, 
'engine_launcher_class': u'cluster_helper.cluster.BcbioSGEEngineSetLauncher',
 'n': 6}}

Notice the "cores" value in BcbioSGEEngineSetLauncher: it is 2 despite the settings above.
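The dictionary printed in the log is a Python literal, so the effective values can also be read back programmatically rather than by eye. A minimal sketch, assuming the dict text has been copied out of the log into a single string (the `log_dict` string below is abridged from the excerpt above):

```python
import ast

# Abridged copy of the dict printed by IPClusterStart in the log above.
log_dict = (
    "{'SGELauncher': {'queue': 'main.q'}, "
    "'BcbioSGEEngineSetLauncher': {'mem': '46', 'cores': 2, "
    "'pename': u'orte', 'resources': ''}, "
    "'IPClusterStart': {'n': 6}}"
)

# literal_eval safely parses Python literals without executing code.
config = ast.literal_eval(log_dict)
engine = config["BcbioSGEEngineSetLauncher"]
print(engine["cores"], engine["mem"], config["IPClusterStart"]["n"])
```

This only reads back what the launcher was told; it does not explain why the value differs from the YAML.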

lbeltrame commented 11 years ago

The reason lies in the memory scaling functions, so it's a user error and not an issue in the pipeline. However, a warning message when the total memory implied by the user-set memory per core exceeds the system's memory would immediately pinpoint the issue.
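To illustrate the kind of scaling and warning meant here, a minimal sketch follows. It is not bcbio-nextgen's actual code: it assumes the `memory` value is interpreted as memory per core and that the core count is reduced until the total fits on the machine. The function name `scale_cores` and the 96G machine size are hypothetical and not taken from this thread.

```python
import warnings


def scale_cores(requested_cores, mem_per_core_gb, machine_mem_gb):
    """Return how many cores fit if each core needs mem_per_core_gb.

    Hypothetical helper for illustration only, not part of bcbio-nextgen.
    """
    # Number of cores whose combined memory fits on the machine (at least 1).
    fit = max(1, int(machine_mem_gb // mem_per_core_gb))
    usable = min(requested_cores, fit)
    if usable < requested_cores:
        warnings.warn(
            "Requested %d cores at %sG each needs %sG, but only %sG is "
            "available; scaling down to %d core(s)."
            % (requested_cores, mem_per_core_gb,
               requested_cores * mem_per_core_gb, machine_mem_gb, usable))
    return usable


# 24 cores at 36G per core would need 864G; an assumed 96G machine fits 2.
print(scale_cores(24, 36, 96))
```

Under these assumptions, lowering the per-core memory in the configuration (for example to 2G) would let all 24 requested cores be used.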

chapmanb commented 11 years ago

Luca, sorry about the issues. I rewrote two parts of the documentation to emphasize this more, so it will hopefully be less of a problem for other users. I'm not sure a warning would be especially useful here, since it's fairly common to scale these values to match specifications. Generally, I'd like the documentation to make clearer that you don't need to tell bcbio-nextgen exactly what to do, but rather specify the expected usage and let it do the rest. Happy to adjust things further to help make this clearer.

lbeltrame commented 11 years ago

Sorry about the issues. I re-wrote two parts of the documentation to emphasize this more so it will hopefully be less of a problem for other

Thanks, Brad. I suggested a warning only because at first it wasn't clear why the core count was dropping despite my having ample resources (I ended up inserting pdb calls to see what was going on). The new wording is much clearer now.