Open Trzs opened 10 months ago
Importing the modules somehow changes the affinity:
In [1]: def get_affinity():
   ...:     for line in open('/proc/self/status'):
   ...:         if 'Cpu' in line:
   ...:             print(line)
   ...:     return
   ...:
In [2]: get_affinity()
Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff
Cpus_allowed_list: 0-127
In [3]: import boost_python_meta_ext
In [4]: get_affinity()
Cpus_allowed: 00000000,00000000,00000000,00000001
Cpus_allowed_list: 0
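To compare the values before and after the import programmatically, the Cpus_allowed_list field can be parsed into a set of CPU ids. This is only a minimal sketch, assuming the usual kernel list format of comma-separated ids and ranges (as in the output above); parse_cpu_list is a hypothetical helper name and not part of cctbx.

```python
def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as '0-3,8' into a set of ints.

    Hypothetical helper for illustration; assumes comma-separated
    single ids and inclusive 'first-last' ranges.
    """
    cpus = set()
    for part in text.strip().split(','):
        if not part:
            continue  # tolerate an empty list or a trailing comma
        if '-' in part:
            first, last = part.split('-', 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_allowed_cpus(status_path='/proc/self/status'):
    """Return the allowed CPUs of the current process (Linux only)."""
    with open(status_path) as status:
        for line in status:
            if line.startswith('Cpus_allowed_list:'):
                return parse_cpu_list(line.split(':', 1)[1])
    return set()
```

Calling read_allowed_cpus() before and after the import and comparing the two sets would then show the change without reading the hex masks by eye.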
Tracing the system calls with strace (strace libtbx.python dummy.py > trace.log 2>&1) shows that something is changing the affinity:
[...]
sched_getaffinity(334914, 16, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]) = 16
[...]
sched_setaffinity(334914, 16, [0]) = 0
[...]
Can you list your packages? I copied your get_affinity test into a file and I do not get the change in affinity with a newly created environment with cctbx-base on one of our servers.
test.py
def get_affinity():
    for line in open('/proc/self/status'):
        if 'Cpu' in line:
            print(line)
    return

if __name__ == '__main__':
    get_affinity()
    import boost_python_meta_ext
    get_affinity()
[bkpoon@anaconda:tmp] conda create -n py39 cctbx-base python=3.9
[bkpoon@anaconda:tmp] conda activate py39
(py39) [bkpoon@anaconda:tmp] python test.py
Cpus_allowed: ffff,ffffffff,ffffffff,ffffffff,ffffffff
Cpus_allowed_list: 0-143
Cpus_allowed: ffff,ffffffff,ffffffff,ffffffff,ffffffff
Cpus_allowed_list: 0-143
It seems this behaviour is caused by OMP_PLACES and OMP_PROC_BIND. Unsetting these leads to the expected behavior. These were set for Kokkos.
more info: https://github.com/pytorch/pytorch/issues/49971 https://github.com/OpenMathLib/OpenBLAS/issues/2238
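Unsetting OMP_PLACES and OMP_PROC_BIND can also be tried from inside Python, before the affected module is imported. The following is a rough sketch, not a confirmed fix: it assumes the OpenMP runtime reads these variables only when it is first loaded and that it has not been initialized earlier in the process. clear_omp_binding is a hypothetical helper name.

```python
import os

# Variables named in this thread as triggering the pinning.
OMP_BINDING_VARS = ("OMP_PLACES", "OMP_PROC_BIND")


def clear_omp_binding(environ=os.environ):
    """Remove the OpenMP binding variables and return what was removed.

    Hypothetical helper. It only has an effect on libraries loaded
    afterwards, under the assumption that they read the environment
    at load time.
    """
    removed = {}
    for name in OMP_BINDING_VARS:
        if name in environ:
            removed[name] = environ.pop(name)
    return removed


# Intended usage (call before the import that changes the affinity):
#   clear_omp_binding()
#   import boost_python_meta_ext
```

The returned dictionary makes it possible to restore the values later if Kokkos still needs them.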
The core issue seems to be a bug when OMP_PLACES is set to threads. As far as I know, I am not using OpenBLAS, but the same bug might occur in some other library.
Current workaround to suppress Kokkos warnings: export OMP_PLACES=threads and export OMP_PROC_BIND=false
Interaction of these settings with MPI is still an open question.
On Perlmutter and friends, run_tests_parallel runs tests in parallel, but all tests are run on just one core.
I created a small reproducer that narrows it down to certain module imports.
main script:
dummy.py (the imports are not important; they are just there to show that Python imports run fine)
Without the comments, the dummy scripts run on 10 cores. With either one, it's down to one core. Wrapping the import in os.sched_getaffinity and os.sched_setaffinity helps, but is not a real solution.
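For reference, wrapping the import in os.sched_getaffinity and os.sched_setaffinity could look roughly like the sketch below. preserve_affinity is a hypothetical name, and the code is Linux-only. It restores the mask of the calling thread only, so threads that a library starts during the import may stay pinned, which is one reason this is a stopgap rather than a fix.

```python
import contextlib
import os


@contextlib.contextmanager
def preserve_affinity():
    """Save the CPU affinity on entry and restore it on exit.

    Hypothetical helper; pid 0 refers to the calling thread.
    """
    saved = os.sched_getaffinity(0)
    try:
        yield saved
    finally:
        # Only touch the mask if something inside the block changed it.
        if os.sched_getaffinity(0) != saved:
            os.sched_setaffinity(0, saved)


# Intended usage:
#   with preserve_affinity():
#       import boost_python_meta_ext
```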