NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

pyxis failed to load via dlopen() due to undefined symbol #25

Closed: fspiga closed 3 years ago

fspiga commented 4 years ago

Moving from SLURM 19.05.x to 20.02.4 seems to break pyxis with the following error:

sbatch: error: spank: /usr/local/src/pyxis/spank_pyxis.so: Dlopen of plugin file failed
sbatch: error: spank: /etc/slurm/plugstack.conf.d/pyxis.conf:1: Failed to load plugin /usr/local/src/pyxis/spank_pyxis.so. Aborting.
sbatch: error: Failed to initialize plugin stack

Rebuilding pyxis does not seem to fix the issue. It is quite possible that the SPANK API changed between SLURM 19.x and SLURM 20.x.

Please keep track of which SLURM versions are supported and tested ;)
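
One way to see the loader message hidden behind "Dlopen of plugin file failed" is to try loading the shared object directly. The following is a minimal sketch, assuming a Linux host with Python 3; `try_dlopen` is a hypothetical helper name, not part of pyxis or Slurm. Note that ctypes asks the loader to resolve symbols at load time, and a SPANK plugin may reference symbols that are normally supplied by the Slurm process that loads it, so an undefined-symbol message here is a hint rather than proof of an API mismatch.

```python
import ctypes
from typing import Optional


def try_dlopen(path: str) -> Optional[str]:
    """Try to load a shared object.

    Returns the loader's error text on failure, or None if the load succeeded.
    (Hypothetical helper for illustration only.)
    """
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        # The OSError message carries the dynamic loader's explanation,
        # e.g. a missing file or an unresolved symbol name.
        return str(exc)
    return None


if __name__ == "__main__":
    # Path copied from the error message in this report.
    err = try_dlopen("/usr/local/src/pyxis/spank_pyxis.so")
    print(err if err else "loaded without error")
```

If the printed message names a specific symbol, that name is a useful detail to include in a report like this one.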

flx42 commented 4 years ago

Hello,

Which release or commit of pyxis are you using? Upgrading from 19.x to 20.x should work for pyxis <= 0.7.0.

The Slurm API changed in 20.02, and we rely on this new API in the pyxis 0.8.0 release. That means pyxis 0.8.0 requires Slurm >= 20.02; see https://github.com/NVIDIA/pyxis/releases/tag/v0.8.0 (I will put that in the README too).
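
The version rule described above can be sketched as a small check. This is only an illustration: `parse_version` and `meets_slurm_minimum` are hypothetical names, and two points are assumptions not stated in this thread, namely that releases after 0.8.0 keep the same Slurm >= 20.02 minimum and that releases between 0.7.0 and 0.8.0 behave like 0.7.0.

```python
from typing import Tuple

# From the comment above: pyxis 0.8.0 needs Slurm >= 20.02.
MIN_SLURM_FOR_PYXIS_0_8 = (20, 2)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '20.02.4' into (20, 2, 4) and 'v0.8.0' into (0, 8, 0)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def meets_slurm_minimum(pyxis_version: str, slurm_version: str) -> bool:
    """Return True if the Slurm version satisfies the minimum stated for this pyxis."""
    pyxis = parse_version(pyxis_version)
    slurm = parse_version(slurm_version)
    if pyxis < (0, 8, 0):
        # Older pyxis: no 20.02 minimum is stated in the thread.
        # (Treating 0.7.x > 0.7.0 the same way is an assumption.)
        return True
    # Assumption: releases after 0.8.0 keep the same minimum.
    return slurm[:2] >= MIN_SLURM_FOR_PYXIS_0_8


if __name__ == "__main__":
    print(meets_slurm_minimum("0.8.0", "20.02.4"))
    print(meets_slurm_minimum("0.8.0", "19.05.7"))
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "20.02" and "20.2" would not be treated as equal.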

lukeyeager commented 4 years ago

Looks to me like you might not have a file at /usr/local/src/pyxis/spank_pyxis.so?
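
A quick way to rule this out is to check the path from plugstack.conf before looking at anything else. A minimal sketch, assuming Python 3 is available on the node; `describe_plugin_path` is a hypothetical helper name and the returned strings are only illustrative labels.

```python
import os


def describe_plugin_path(path: str) -> str:
    """Classify a plugin path as missing, not a regular file, not readable, or ok."""
    if not os.path.exists(path):
        return "missing"
    if not os.path.isfile(path):
        return "not a regular file"
    if not os.access(path, os.R_OK):
        # Readability is checked for the user running this script,
        # which may differ from the user running sbatch.
        return "not readable"
    return "ok"


if __name__ == "__main__":
    # Path copied from the error message in this report.
    print(describe_plugin_path("/usr/local/src/pyxis/spank_pyxis.so"))
```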