cnr-ibf-pa / hbp-bsp-issues

Ticketing system for developers/testers and power users of the Brain Simulation Platform of the Human Brain Project

BUG: optimization jobs end with error on DAINT #491

Closed lbologna closed 4 years ago

lbologna commented 5 years ago

Expected behavior

The optimization jobs submitted to CSCS-DAINT end successfully.

Actual Behavior

The optimization jobs submitted to CSCS-DAINT fail when loading the bpopt module after the module path is set: the module system cannot find the modulefile for 'cray-python/2.7.13.1'. Here are the commands executed on DAINT and the resulting error:

export MODULEPATH=/users/bp000178/ich002/software/daint/local-20181022101238/share/modules:/opt/cray/pe/perftools/7.1.1/modulefiles:/opt/cray/ari/modulefiles:/opt/cray/pe/craype/2.6.1/modulefiles:/apps/daint/modulefiles:/apps/daint/system/modulefiles:/apps/daint/UES/easybuild/modulefiles:/apps/daint/UES/reframe:/apps/common/system/modulefiles:/opt/cray/pe/modulefiles:/opt/cray/modulefiles:/opt/modulefiles:/opt/cray/craype/default/modulefiles

MODULEPATH=/users/bp000178/ich002/software/daint/local-20181022101238/share/modules:/opt/cray/pe/perftools/7.1.1/modulefiles:/opt/cray/ari/modulefiles:/opt/cray/pe/craype/2.6.1/modulefiles:/apps/daint/modulefiles:/apps/daint/system/modulefiles:/apps/daint/UES/easybuild/modulefiles:/apps/daint/UES/reframe:/apps/common/system/modulefiles:/opt/cray/pe/modulefiles:/opt/cray/modulefiles:/opt/modulefiles:/opt/cray/craype/default/modulefiles

module load bpopt

/opt/cray/pe/modules/3.2.11.3/bin/modulecmd bash load bpopt

cray-python(3):ERROR:105: Unable to locate a modulefile for 'cray-python/2.7.13.1'
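To narrow down an error like the one above, it can help to check which directories on MODULEPATH, if any, actually provide the requested modulefile. The following is only a minimal sketch: `find_modulefile` is a hypothetical helper, not part of the module system, and it assumes that a modulefile `name/version` is a regular file at `<dir>/name/version`. Real module lookup has further rules (default versions, aliases) that this does not cover.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Environment Modules).
# Prints every path <dir>/<name>/<version> that exists as a regular file,
# for each <dir> in a colon-separated module path.
# Assumption: a modulefile "name/version" is stored as a plain file at that path.
# Usage: find_modulefile name/version [module_path]
# Returns 0 if at least one match was found, 1 otherwise.
find_modulefile() {
    local wanted="$1"
    # Fall back to the current MODULEPATH when no path is passed explicitly.
    local modulepath="${2:-${MODULEPATH:-}}"
    local status=1 dir
    local -a dirs
    # Split the colon-separated path into an array.
    IFS=':' read -r -a dirs <<< "$modulepath"
    for dir in "${dirs[@]}"; do
        [ -n "$dir" ] || continue          # skip empty path entries
        if [ -f "$dir/$wanted" ]; then
            printf '%s\n' "$dir/$wanted"
            status=0
        fi
    done
    return "$status"
}
```

For example, `find_modulefile cray-python/2.7.13.1 || echo "not found on MODULEPATH"` would report whether any listed directory contains that version. On the system itself, `module avail cray-python` lists the versions that are actually available.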

Steps to reproduce the problem

Launch an optimization job via either the:

antonelepfl commented 5 years ago

Please check that the path is the same as the one mentioned in https://github.com/BlueBrain/spack/wiki/Piz-Daint-Deployment#using-py-bluepy-or-bluepyopt

Otherwise, inform HPC, because it's better to have only ONE source of truth (in this case I propose the wiki mentioned above).

clupascu commented 4 years ago

There is actually a discussion in this Jira ticket as well: https://bbpteam.epfl.ch/project/issues/browse/BBPP42-534

clupascu commented 4 years ago

@alex4200 I solved the out-of-memory error by increasing the memory limit per compute node on Unicore. I updated my Build and Rebuild notebooks on DEV. Can you please test and let me know? Thanks.

clupascu commented 4 years ago

Unicore increased the default memory from 2000 to 58174. Now everything works fine. @alex4200 can you please test so we can close this ticket?

lbologna commented 4 years ago

@alex4200 The issue seems to be solved for the HHNB as well, for both "daint" and "daint-service account" submissions. Could you please test in DEV before we move it to PROD?

Here's the link: https://collab.humanbrainproject.eu/#/collab/1256/nav/29703

Thank you

lbologna commented 4 years ago

Thanks @alex4200 for testing. Moved HHNB to prod.