Open roybens opened 5 years ago
@roybens I think this is an issue with your installation of MPI. Can you get mpi4py working at all? Have you tried their tutorials?
When I run:
mpiexec -n 5 python -m mpi4py.bench helloworld
it works fine:
Hello, World! I am process 0 of 5 on nid00188.
Hello, World! I am process 1 of 5 on nid00188.
Hello, World! I am process 2 of 5 on nid00188.
Hello, World! I am process 3 of 5 on nid00188.
Hello, World! I am process 4 of 5 on nid00188.
Does this mean mpi4py works?
When I use mpiexec instead of srun, I get the following:
(.env) roybens@nid00188:~/tests> mpiexec -n 3 python -m writenwb.py
1
/global/homes/r/roybens/.conda/envs/.env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/global/homes/r/roybens/.conda/envs/.env/bin/python: Error while finding module specification for 'writenwb.py' (AttributeError: module 'writenwb' has no attribute '__path__')
2
/global/homes/r/roybens/.conda/envs/.env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/global/homes/r/roybens/.conda/envs/.env/bin/python: Error while finding module specification for 'writenwb.py' (AttributeError: module 'writenwb' has no attribute '__path__')
/global/homes/r/roybens/.conda/envs/.env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/global/homes/r/roybens/.conda/envs/.env/bin/python: Error while finding module specification for 'writenwb.py' (AttributeError: module 'writenwb' has no attribute '__path__')
The mpiexec appears to be working correctly
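The "Error while finding module specification" lines in that output come from Python itself rather than from MPI: the -m option expects a module name, so writenwb.py is read as a submodule py of a package writenwb. The sketch below reproduces the difference without MPI. The stand-in script and its printed marker are assumptions for illustration only; the real writenwb.py is not shown in this thread.

```python
import os
import subprocess
import sys
import tempfile


def run_as_module(module_arg, workdir):
    """Run `python -m <module_arg>` with workdir on the import path."""
    env = dict(os.environ, PYTHONPATH=workdir)
    return subprocess.run(
        [sys.executable, "-m", module_arg],
        cwd=workdir,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for writenwb.py (not the original script).
    with open(os.path.join(tmp, "writenwb.py"), "w") as fh:
        fh.write('if __name__ == "__main__":\n    print("script ran")\n')

    bad = run_as_module("writenwb.py", tmp)  # file name given to -m
    good = run_as_module("writenwb", tmp)    # module name given to -m

print("python -m writenwb.py exit code:", bad.returncode)
print("python -m writenwb    exit code:", good.returncode)
```

Under mpiexec the corresponding invocation would be mpiexec -n 3 python writenwb.py, or python -m writenwb without the .py suffix.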
In trying to replicate this I came across a different bug: #993
I cannot reproduce that. How did you install h5py? Did you build it yourself? As far as I know, you need to build the HDF5 library with --enable-parallel and --enable-shared, and h5py with --mpi.
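One quick way to see whether an installed h5py was built with MPI support is to read its build configuration. This is a minimal sketch; the helper name h5py_mpi_enabled is made up for illustration, and it returns None when h5py cannot be imported at all.

```python
def h5py_mpi_enabled():
    """Return True/False for h5py's MPI build flag, or None if h5py is not importable."""
    try:
        import h5py
    except ImportError:
        return None
    # get_config().mpi reports whether this h5py build has parallel (MPI) support.
    return bool(h5py.get_config().mpi)


status = h5py_mpi_enabled()
if status is None:
    print("h5py is not importable in this environment")
else:
    print("h5py built with MPI support:", status)
```

A result of False would suggest a serial build, in which case opening a file with the mpio driver is not expected to work.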
Can you also share the shared object dependencies for h5py? An example would be:
>>> import h5py
>>> print(h5py.__file__)
'/user/miniconda3/envs/nwb/lib/python3.7/site-packages/h5py-2.9.0.post0-py3.7-linux-x86_64.egg/h5py/__init__.py'
~$ ldd defs.cpython-37m-x86_64-linux-gnu.so
linux-vdso.so.1 => (0x00007ffe5d7ed000)
libhdf5.so.103 => /home/stamatiad/hdf5-1.10.5/hdf5/lib/libhdf5.so.103 (0x00007f883c33e000)
libhdf5_hl.so.100 => /home/stamatiad/hdf5-1.10.5/hdf5/lib/libhdf5_hl.so.100 (0x00007f883c11b000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f883beff000)
libc.so.6 => /lib64/libc.so.6 (0x00007f883bb32000)
libz.so.1 => /lib64/libz.so.1 (0x00007f883b91c000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f883b718000)
libm.so.6 => /lib64/libm.so.6 (0x00007f883b416000)
libmpi.so.12 => /usr/local/lib/libmpi.so.12 (0x00007f883af9d000)
/lib64/ld-linux-x86-64.so.2 (0x00007f883cb5a000)
librt.so.1 => /lib64/librt.so.1 (0x00007f883ad95000)
In your output, does the libhdf5.so.103 entry (in my case libhdf5.so.103 => /home/stamatiad/hdf5-1.10.5/hdf5/lib/libhdf5.so.103) point to an Anaconda (non-MPI) build or to your custom MPI build?
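The same check can be scripted. The sketch below assumes a Linux system with ldd on the PATH and assumes h5py ships a compiled extension whose file name starts with defs, as in the example above. The helper name h5py_linked_libs is hypothetical, and it returns None when any of those assumptions does not hold.

```python
import glob
import importlib.util
import os
import shutil
import subprocess


def h5py_linked_libs(keywords=("hdf5", "mpi")):
    """Return the ldd lines of h5py's defs extension that mention any keyword, or None."""
    spec = importlib.util.find_spec("h5py")  # locates the package without importing it
    if spec is None or not spec.submodule_search_locations:
        return None
    pkg_dir = list(spec.submodule_search_locations)[0]
    candidates = sorted(glob.glob(os.path.join(pkg_dir, "defs*.so")))
    if not candidates or shutil.which("ldd") is None:
        return None
    result = subprocess.run(
        ["ldd", candidates[0]],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return [
        line.strip()
        for line in result.stdout.splitlines()
        if any(key in line for key in keywords)
    ]


libs = h5py_linked_libs()
if libs is None:
    print("could not inspect h5py (not installed, no defs*.so, or no ldd)")
else:
    print("\n".join(libs) if libs else "no hdf5/mpi libraries listed by ldd")
```

If no libmpi line appears, or libhdf5 resolves to a path inside the conda environment rather than to the parallel build, that would point to a serial HDF5 being picked up.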
@roybens is this an issue you see running at NERSC? If so, did you use the h5py-parallel module provided by NERSC (module load h5py-parallel), or did you build your own h5py? Building h5py with parallel support enabled requires a few extra steps at NERSC, because you need to make sure that both mpi4py and h5py use the Cray MPI provided by the system. https://docs.nersc.gov/programming/high-level-environments/python/
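To see which MPI library mpi4py actually picked up, each process can report its rank, the communicator size, and the MPI library version string. This is a minimal sketch; the function name describe_mpi is made up for illustration, and it returns None if mpi4py cannot be imported.

```python
def describe_mpi():
    """Return rank, size and MPI library version as seen by mpi4py, or None."""
    try:
        from mpi4py import MPI
    except ImportError:
        return None
    comm = MPI.COMM_WORLD
    return {
        "rank": comm.Get_rank(),
        "size": comm.Get_size(),
        # Free-form vendor string; on a Cray system it is expected to mention Cray MPI.
        "library": MPI.Get_library_version().strip(),
    }


info = describe_mpi()
if info is None:
    print("mpi4py is not importable in this environment")
else:
    print("rank {rank} of {size}, MPI library: {library}".format(**info))
```

If every process started by srun or mpiexec prints rank 0 of 1, mpi4py is probably linked against a different MPI than the one used by the launcher.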
Yes, this was on Cori. I did something similar to this: conda install -n .env -c conda-forge -c clawpack h5py-parallel python=3 (following Andrew's advice). I can try the Cray MPI next.
1) Bug
If you are reporting a bug please provide the following:
Steps to Reproduce
Trying to run the code from the docs for parallel I/O: https://pynwb.readthedocs.io/en/latest/tutorials/general/advanced_hdf5_io.html?highlight=mpi#parallel-i-o-using-mpi
Here is writenwb.py:
The ranks are always 0. When reading, I get similar errors.