Open bchuh opened 2 years ago
Unfortunately this will be hard to diagnose, as the error is coming from the OS and Python does not know what the error code corresponds to.
I can only guess: since this happens while the shared memory is being allocated, and since in some POSIX error mappings 132 corresponds to EOVERFLOW, the problem may be that multitables is trying to allocate more shared memory than your system allows.
Multitables defaults to allocating one chunk of the array at a time, so perhaps your chunk size is too large. Try setting block_size=1 when calling get_generator to reduce the amount allocated, or increase the amount of shared memory your OS allows.
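Independent of multitables, you can probe whether your system will grant a POSIX shared-memory allocation of a given size using only the standard library. This is a diagnostic sketch, not multitables' own allocation path; the helper name `probe_shm` is made up for illustration:

```python
from multiprocessing import shared_memory

def probe_shm(n_bytes):
    """Try to allocate n_bytes of POSIX shared memory; return True on success."""
    try:
        # On Linux this is backed by /dev/shm, which is commonly capped
        # at half of physical RAM; oversized requests fail with an OSError.
        shm = shared_memory.SharedMemory(create=True, size=n_bytes)
    except OSError:
        return False
    # Release the segment so the probe leaves nothing behind.
    shm.close()
    shm.unlink()
    return True

# A small allocation should succeed on any healthy system.
print(probe_shm(1024))
```

If a probe around the size of one chunk of your EArray (chunk shape times itemsize) returns False, that would support the over-allocation theory.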
Adding the block_size=1 argument does not solve it. I also tried switching to non-parallel training, but the error remains the same. I guess I'm out of luck here, but thanks for replying anyway!
I'm doing PyTorch distributed data parallel training, and I use a generator to traverse the EArray data in my HDF5 dataset. As soon as the program starts, I get a "RuntimeError: Unknown error type: 132 when handling execution of <_FuncPtr object at 0x7fb033f58280> with args (b'/nXmxo38xgBw=', 194, 384)". The code where multitables is involved is shown below.
The error message: