bgeltz closed this issue 1 month ago
Check if this is still reproducible.
I think this is expected if PlatformIO::read_signal() or PlatformIO::write_control() is called through the ServiceIOGroup. The batch interface does not show this issue because the forked process (the batch server) is the one that initiates the "client" on the GPU.
This is still happening. On a fresh compute node allocation:
[bgeltz@node0 ~]$ cat /sys/class/drm/card*/clients/*/name
geopmd
geopmd
geopmd
geopmd
geopmd
geopmd
I can remove these clients by restarting the service. They will return upon any invocation of geopmread, geopmwrite, geopmsession, etc.
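For anyone repeating this check before and after a service restart, here is a minimal Python sketch that tallies client names per card. It assumes the same sysfs layout as the `cat` command above (`/sys/class/drm/card*/clients/*/name`); the function name `count_drm_clients` and its `sysfs_root` parameter are hypothetical helpers for illustration, not part of GEOPM.

```python
import glob
import os
from collections import Counter


def count_drm_clients(sysfs_root="/sys/class/drm"):
    """Tally DRM client names per card.

    Assumes the layout <sysfs_root>/card*/clients/<id>/name, as in the
    `cat` command above. Returns {card: Counter({client_name: count})}.
    """
    result = {}
    pattern = os.path.join(sysfs_root, "card*", "clients", "*", "name")
    for name_path in sorted(glob.glob(pattern)):
        # Path components from the end: name, <id>, clients, <card>
        card = name_path.split(os.sep)[-4]
        try:
            with open(name_path) as fid:
                client = fid.read().strip()
        except OSError:
            # A client can disappear between the glob and the read
            continue
        result.setdefault(card, Counter())[client] += 1
    return result


if __name__ == "__main__":
    clients = count_drm_clients()
    if not clients:
        print("no DRM clients found")
    for card, names in clients.items():
        for name, count in names.items():
            print(f"{card}: {name} x{count}")
```

Running it once after a workload and again after restarting the service should show whether the geopmd entries are gone; an empty result means no clients are registered on any card.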
Open documentation tasks for:
Opened new issues to deal with the problem.
Describe the bug
I tried to run an application with GEOPM and expected the GPUs to show no clients at the conclusion of the run. Instead, a client remains, pointing to "geopmd".
GEOPM version
Installed service version: 2.0.0+dev411g47f3f6e3
Userspace runtime version: 4cf0e9596
build.sh invocation:
GEOPM_BASE_CONFIG_OPTIONS="--with-sqlite3=${HOME}/build/sqlite3 --enable-beta" GEOPM_SKIP_SERVICE_INSTALL=yes ./integration/config/build.sh
Expected behavior
Run the GPU workload with GEOPM:
Observe that there are no clients on the GPUs at the conclusion of the run:
Actual behavior
Every card* directory has one of these entries, indicating that a geopmd process is still on the GPUs:
Additional context
Restarting the service cleans up the processes on the GPUs.