NeurodataWithoutBorders / pynwb

A Python API for working with Neurodata stored in the NWB Format
https://pynwb.readthedocs.io

Parallel io from the docs #984

Open roybens opened 5 years ago

roybens commented 5 years ago

Bug

Steps to Reproduce

I am trying to run the code from the docs for parallel I/O: https://pynwb.readthedocs.io/en/latest/tutorials/general/advanced_hdf5_io.html?highlight=mpi#parallel-i-o-using-mpi

Here is writenwb.py:

```python
from mpi4py import MPI
import numpy as np
from dateutil import tz
from pynwb import NWBHDF5IO, NWBFile, TimeSeries
from datetime import datetime
from hdmf.data_utils import DataChunkIterator

start_time = datetime(2018, 4, 25, 2, 30, 3, tzinfo=tz.gettz('US/Pacific'))
fname = 'test_parallel_pynwb.nwb'
rank = MPI.COMM_WORLD.rank  # The process ID (integer 0-3 for a 4-process run)

# Create the file on one rank. Here we only instantiate the dataset we want to
# write in parallel, but we do not write any data.
if rank == 0:
    nwbfile = NWBFile('aa', 'aa', start_time)
    data = DataChunkIterator(data=None, maxshape=(4,), dtype=np.dtype('int'))

    nwbfile.add_acquisition(TimeSeries('ts_name', description='desc', data=data,
                                       rate=100., unit='m'))
    with NWBHDF5IO(fname, 'w') as io:
        io.write(nwbfile)

# Write to the dataset in parallel
with NWBHDF5IO(fname, 'a', comm=MPI.COMM_WORLD) as io:
    nwbfile = io.read()
    print(rank)
    nwbfile.acquisition['ts_name'].data[rank] = rank
```
and for reading, I replace the last four lines with:

```python
# Read from the dataset in parallel
with NWBHDF5IO(fname, 'r', comm=MPI.COMM_WORLD) as io:
    print(io.read().acquisition['ts_name'].data[rank])
```
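For reference, with a working parallel stack each process should print its own rank during the write and read back the value it wrote, so a 3-process run should print 0, 1, and 2 in some interleaved order (hypothetical output; the order varies between runs):

```
$ srun -n 3 python ./writenwb.py
0
2
1
```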

After running it, I get:

```
(.env) roybens@nid00227:~/tests> srun -n 3  python ./writenwb.py
0
/global/homes/r/roybens/.conda/envs/.env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
0
/global/homes/r/roybens/.conda/envs/.env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
0
/global/homes/r/roybens/.conda/envs/.env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
(.env) roybens@nid00227:~/tests> vi writenwb.py
(.env) roybens@nid00227:~/tests> srun -n 3  python ./writenwb.py
0
/global/homes/r/roybens/.conda/envs/.env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/global/homes/r/roybens/.conda/envs/.env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
0
/global/homes/r/roybens/.conda/envs/.env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
0
```

The printed rank is always 0, and I get similar warnings when reading.


bendichter commented 5 years ago

@roybens I think this is an issue with your installation of MPI. Can you get mpi4py working at all? Have you tried their tutorials?
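If it helps, a minimal rank check (my sketch, not from the pynwb docs) can separate a launcher/MPI mismatch from a pynwb problem; if every process prints rank 0, the MPI library mpi4py was built against is not the one your launcher is driving:

```python
# rank_check.py -- print this process's rank and the communicator size.
# Run with e.g. `srun -n 3 python rank_check.py`.
from mpi4py import MPI

comm = MPI.COMM_WORLD
print("rank", comm.rank, "of", comm.size)
```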

roybens commented 5 years ago

When I run:

```
mpiexec -n 5 python -m mpi4py.bench helloworld
```

it works fine:

```
Hello, World! I am process 0 of 5 on nid00188.
Hello, World! I am process 1 of 5 on nid00188.
Hello, World! I am process 2 of 5 on nid00188.
Hello, World! I am process 3 of 5 on nid00188.
Hello, World! I am process 4 of 5 on nid00188.
```

Does this mean mpi4py works?

roybens commented 5 years ago

When I use mpiexec instead of srun, I get the following:

```
(.env) roybens@nid00188:~/tests> mpiexec -n 3 python -m writenwb.py
1
/global/homes/r/roybens/.conda/envs/.env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/global/homes/r/roybens/.conda/envs/.env/bin/python: Error while finding module specification for 'writenwb.py' (AttributeError: module 'writenwb' has no attribute 'path')
2
/global/homes/r/roybens/.conda/envs/.env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/global/homes/r/roybens/.conda/envs/.env/bin/python: Error while finding module specification for 'writenwb.py' (AttributeError: module 'writenwb' has no attribute 'path')
/global/homes/r/roybens/.conda/envs/.env/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/global/homes/r/roybens/.conda/envs/.env/bin/python: Error while finding module specification for 'writenwb.py' (AttributeError: module 'writenwb' has no attribute 'path')
```
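As an aside (my reading of the traceback, not something discussed further in the thread): `python -m` expects a module name without the `.py` extension, which is why Python fails to find a module named 'writenwb.py'. Either invocation below should avoid that particular error:

```
# Run the file as a script:
mpiexec -n 3 python writenwb.py

# Or run it as a module, dropping the .py extension:
mpiexec -n 3 python -m writenwb
```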

bendichter commented 5 years ago

mpiexec appears to be working correctly.

bendichter commented 5 years ago

In trying to replicate this I came across a different bug: #993

stamatiad commented 5 years ago

I cannot reproduce that. How did you install h5py? Did you build it yourself? As far as I know, you need to build the HDF5 library with `--enable-parallel` and `--enable-shared`, and h5py with `--mpi`.
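For what it's worth, a sketch of such a build (the paths, version, and the pip/environment-variable route are my assumptions; `HDF5_MPI=ON` is the pip-era equivalent of configuring h5py with `--mpi`):

```
# Build a parallel, shared HDF5 from source (version and prefix are examples):
cd hdf5-1.10.5
CC=mpicc ./configure --enable-parallel --enable-shared --prefix=$HOME/hdf5-parallel
make -j4 && make install

# Build h5py from source against that HDF5:
CC=mpicc HDF5_MPI=ON HDF5_DIR=$HOME/hdf5-parallel pip install --no-binary=h5py h5py
```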

Can you also share the shared object dependencies for h5py? An example would be:

1. Start Python and import h5py, then check its location:

   ```
   >>> import h5py
   >>> print(h5py.__file__)
   '/user/miniconda3/envs/nwb/lib/python3.7/site-packages/h5py-2.9.0.post0-py3.7-linux-x86_64.egg/h5py/__init__.py'
   ```

2. Go inside the h5py folder and run ldd on the defs.*.so extension module:

   ```
   ~$ ldd defs.cpython-37m-x86_64-linux-gnu.so
       linux-vdso.so.1 =>  (0x00007ffe5d7ed000)
       libhdf5.so.103 => /home/stamatiad/hdf5-1.10.5/hdf5/lib/libhdf5.so.103 (0x00007f883c33e000)
       libhdf5_hl.so.100 => /home/stamatiad/hdf5-1.10.5/hdf5/lib/libhdf5_hl.so.100 (0x00007f883c11b000)
       libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f883beff000)
       libc.so.6 => /lib64/libc.so.6 (0x00007f883bb32000)
       libz.so.1 => /lib64/libz.so.1 (0x00007f883b91c000)
       libdl.so.2 => /lib64/libdl.so.2 (0x00007f883b718000)
       libm.so.6 => /lib64/libm.so.6 (0x00007f883b416000)
       libmpi.so.12 => /usr/local/lib/libmpi.so.12 (0x00007f883af9d000)
       /lib64/ld-linux-x86-64.so.2 (0x00007f883cb5a000)
       librt.so.1 => /lib64/librt.so.1 (0x00007f883ad95000)
   ```

Does the linked libhdf5 (here, libhdf5.so.103 => /home/stamatiad/hdf5-1.10.5/hdf5/lib/libhdf5.so.103) point to an Anaconda/non-MPI build or to your custom MPI build?
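A quicker check than ldd, if it helps (my suggestion, separate from the steps above): h5py can report whether it was compiled against a parallel HDF5, and a bare h5py collective write exercises the same machinery without pynwb in the loop:

```python
# mpi_h5py_check.py -- confirm h5py has MPI support, then do a minimal
# parallel write. Run with e.g. `mpiexec -n 3 python mpi_h5py_check.py`.
from mpi4py import MPI
import h5py

print(h5py.get_config().mpi)  # must print True for parallel I/O to work

comm = MPI.COMM_WORLD
with h5py.File('parallel_test.h5', 'w', driver='mpio', comm=comm) as f:
    dset = f.create_dataset('test', (comm.size,), dtype='i')
    dset[comm.rank] = comm.rank  # each rank writes its own element
```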

oruebel commented 5 years ago

@roybens is this an issue you see running at NERSC? If so, did you use the h5py-parallel module provided by NERSC (`module load h5py-parallel`) or did you build your own h5py? Building h5py with parallel enabled requires a few extra steps at NERSC, because you need to make sure that both mpi4py and h5py use the Cray MPI provided by the system. https://docs.nersc.gov/programming/high-level-environments/python/
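Something along these lines, assuming the module name mentioned above (exact module names may differ between NERSC systems and software stacks):

```
# Use the NERSC-provided parallel h5py stack instead of a conda build:
module load python
module load h5py-parallel
srun -n 3 python writenwb.py
```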

roybens commented 5 years ago

Yes, this was on Cori. I did something similar to this (following Andrew's advice): `conda install -n .env -c conda-forge -c clawpack h5py-parallel python=3`. I can try the Cray MPI next.