amnsbr / cubnm

A toolbox for biophysical network modeling on GPUs
https://cubnm.readthedocs.io
BSD 3-Clause "New" or "Revised" License

[BUG] Import error with installing from wheels: `undefined symbol: shm_open` #3

Open amnsbr opened 11 months ago

amnsbr commented 11 months ago

On some Linux machines the wheels install successfully with pip (and are therefore compatible with manylinux_2_28), but importing cuBNM raises the following error:

<path-to-virtual-env>/lib/python3.10/site-packages/cuBNM/core.cpython-310-x86_64-linux-gnu.so: undefined symbol: shm_open

In the current implementation, any ImportError whose message includes "undefined symbol" is followed by a recommendation to build the package from source, but the root cause of the problem still needs to be fixed.

Also the original ImportError should be raised instead of being printed.
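A minimal sketch of the suggested behaviour, assuming the compiled extension is imported through a small helper: the original ImportError is re-raised and chained with `from` instead of being printed, and the build-from-source hint is attached only when the message contains "undefined symbol". `load_extension` and `BUILD_HINT` are hypothetical names, not part of cuBNM's actual code.

```python
import importlib
from types import ModuleType

# Hypothetical hint text; the wording in cuBNM itself may differ.
BUILD_HINT = (
    "The pre-built wheel may be incompatible with this system. "
    "Consider building cuBNM from source."
)


def load_extension(name: str) -> ModuleType:
    """Import a compiled extension module, re-raising ImportError with a hint.

    The original exception is chained via `from err`, so its message
    (e.g. "undefined symbol: shm_open") stays visible in the traceback.
    """
    try:
        return importlib.import_module(name)
    except ImportError as err:
        if "undefined symbol" in str(err):
            # Keep the original message and append the hint.
            raise ImportError(f"{err}\n{BUILD_HINT}") from err
        raise  # any other ImportError propagates unchanged
```

Chaining keeps the root cause (the missing symbol) in the traceback, which is the information needed to diagnose issues like this one, while still pointing users to the source build.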

amnsbr commented 4 months ago

The "rt" library was previously removed (in version 0.0.1) to prevent errors caused by auditwheel in cibuildwheel. Without -lrt, the pre-built wheels worked correctly on Juseless, Colab and Kaggle, but not on the FZJ and MPI supercomputers or in an experimental Docker container on my Mac; a few days ago I also received an additional shm_open report by email. In all of these cases except the last one (I'm still waiting to hear back), adding "rt" to the list of included libraries fixed the issue. I have therefore decided to add it back to the list of libraries in setup.py. I also tested this with cibuildwheel, and the wheels were built successfully (but I will not share new wheels until v0.0.3 is released).

I am still not entirely sure why shm_open is needed, or why the wheels work on some systems but not on others. There could, for example, be differences between the operating systems.
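One way to probe why the wheels work on some systems but not on others is to check, on each affected machine, whether shm_open can be resolved from libc alone or only from librt. A common explanation (an assumption here, not something established in this thread) is that glibc versions before 2.34 provide shm_open only in librt, while newer versions also export it from libc. The sketch below uses only the standard library's ctypes; the function name `shm_open_providers` is hypothetical.

```python
import ctypes
import ctypes.util
import platform


def shm_open_providers(symbol: str = "shm_open") -> dict:
    """Report whether `symbol` can be resolved via libc and via librt.

    Returns a mapping like {"c": True, "rt": True}. A library that cannot
    be located or loaded is reported as False.
    """
    providers = {}
    for lib in ("c", "rt"):
        path = ctypes.util.find_library(lib)  # e.g. "libc.so.6" on Linux
        if path is None:
            providers[lib] = False
            continue
        try:
            handle = ctypes.CDLL(path)
        except OSError:
            providers[lib] = False
            continue
        # Attribute lookup performs a dlsym() on the handle; a missing
        # symbol raises AttributeError, so hasattr() reports False.
        providers[lib] = hasattr(handle, symbol)
    return providers


if __name__ == "__main__":
    # platform.libc_ver() returns e.g. ("glibc", "2.35") on glibc systems.
    print("libc:", platform.libc_ver())
    print("shm_open found in:", shm_open_providers())
```

If a machine reports "c" as False and "rt" as True, an extension that is not linked against librt would likely fail there with exactly this undefined-symbol error.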

amnsbr commented 4 months ago

Update: adding "rt" by itself doesn't fix this issue. I realized this while testing the Docker container I'm currently working on. On Juseless (for anyone reading this: our institute's cluster), I am building local wheels for Python 3.9 within the container sameli/manylinux2014_x86_64_cuda_11.8, which is also the base container of the toolbox's development Docker container. These wheels are not audited by auditwheel, and they can be installed and work on Juseless. But when I installed them inside the Docker container, I still got the shm_open error, even though the wheel was built with -lrt.

However, after running auditwheel -v repair dist/cubnm-0.0.2-cp39-cp39-linux_x86_64.whl --plat manylinux2014_x86_64, the resulting repaired wheel works in the Docker container. Interestingly, after removing -lrt from the compilation of the original wheel and repairing it, the repaired wheel (without -lrt) still works. Still, since adding -lrt has fixed this issue in some instances, I will keep it.

For the record, I also attached the output of auditwheel repair, which shows what it did to the wheel to make it work. I still don't fully understand this issue, but I think this is an important lead. The issue might still occur on platforms that I haven't tested yet, so I think it's better to keep it open.
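To narrow down what auditwheel repair changed, one option is to diff the shared libraries bundled in the unrepaired and repaired wheels. A wheel is a zip archive, so the standard library's zipfile module is enough. This is a sketch: the helper names are hypothetical, and it assumes only that libraries grafted by auditwheel show up as extra .so files in the repaired wheel. It does not reveal changes to the extension's own RPATH or NEEDED entries; running readelf -d on the extracted .so files shows those.

```python
from __future__ import annotations

import re
import sys
import zipfile

# Matches "foo.so" as well as versioned names like "libfoo.so.11.0".
SO_PATTERN = re.compile(r"\.so(\.\d+)*$")


def shared_objects(wheel) -> set[str]:
    """Return archive paths of shared libraries inside a wheel.

    `wheel` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel) as whl:
        return {name for name in whl.namelist() if SO_PATTERN.search(name)}


def library_diff(original, repaired) -> tuple[list[str], list[str]]:
    """Return (added, removed) shared-library paths between two wheels."""
    before = shared_objects(original)
    after = shared_objects(repaired)
    return sorted(after - before), sorted(before - after)


if __name__ == "__main__":
    # Usage: python wheel_diff.py ORIGINAL.whl REPAIRED.whl
    added, removed = library_diff(sys.argv[1], sys.argv[2])
    for name in added:
        print("+", name)
    for name in removed:
        print("-", name)
```

Comparing the libraries listed as added against the attached auditwheel output may help pin down which grafted dependency makes the repaired wheel importable.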

auditwheel_output.txt