When running oversubscribed (multiple Chapel instances on a single node to emulate a multi-locale environment), you really need to set `CHPL_RT_OVERSUBSCRIBED=yes`. Otherwise each Chapel instance assumes it owns the whole node, which causes massive slowdowns. See https://chapel-lang.org/docs/master/usingchapel/executing.html#oversubscription
It seems like we should be able to auto-detect this for gasnet-smp (oversubscribed by definition) and gasnet-udp if GASNET_SPAWNFN=L.
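To make the proposed heuristic concrete, here is a rough sketch of the check in Python. It is not how the Chapel runtime works today. The function name `looks_oversubscribed` is hypothetical, and it assumes the comm settings (`CHPL_COMM`, `CHPL_COMM_SUBSTRATE`) and `GASNET_SPAWNFN` are visible in one environment-like mapping, which may not hold for values fixed at build time. It also treats only `yes` as an affirmative setting, which is a simplification.

```python
import os


def looks_oversubscribed(env=None):
    """Guess whether a Chapel run is oversubscribed (hypothetical helper)."""
    env = os.environ if env is None else env

    # An explicit user setting always wins over the heuristic.
    # Assumption: only "yes" counts as affirmative here; the real runtime
    # may accept other spellings.
    explicit = env.get("CHPL_RT_OVERSUBSCRIBED")
    if explicit is not None:
        return explicit.strip().lower() == "yes"

    # The heuristic only applies to gasnet configurations.
    if env.get("CHPL_COMM") != "gasnet":
        return False

    substrate = env.get("CHPL_COMM_SUBSTRATE")
    # gasnet-smp runs every locale on one node, so it is oversubscribed
    # by definition.
    if substrate == "smp":
        return True
    # gasnet-udp with local spawning also puts every locale on this node.
    if substrate == "udp" and env.get("GASNET_SPAWNFN") == "L":
        return True
    return False
```

Letting an explicit `CHPL_RT_OVERSUBSCRIBED` value take precedence means users could still opt out if the guess is wrong for their setup.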
The performance hit from not setting it is pretty large (table below for hello4), so if we can't do a better job of auto-detecting, maybe we should add documentation in more places?
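chapcs (24 cores, 48 threads):

| locales | default | oversub |
| ------- | ------- | ------- |
| 1       | 0.5s    | 0.3s    |
| 2       | 1.0s    | 0.4s    |
| 4       | 2.0s    | 0.5s    |
| 8       | 6.0s    | 0.8s    |
| 16      | 20.0s   | 1.6s    |
| 32      | 40.0s   | 3.0s    |
| 64      | 120.0s  | 4.5s    |
| 128     | 500.0s  | 16.0s   |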
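For a sense of scale, the ratio between the two columns can be worked out directly from the chapcs timings above. The snippet below is only arithmetic on those numbers; the variable names are illustrative.

```python
# (default, oversub) hello4 times in seconds, keyed by locale count,
# copied from the chapcs timings above.
timings = {
    1: (0.5, 0.3),
    2: (1.0, 0.4),
    4: (2.0, 0.5),
    8: (6.0, 0.8),
    16: (20.0, 1.6),
    32: (40.0, 3.0),
    64: (120.0, 4.5),
    128: (500.0, 16.0),
}

# How many times slower the default run is than the oversub run.
slowdown = {n: default / oversub for n, (default, oversub) in timings.items()}

for n, ratio in slowdown.items():
    print(f"{n:>3} locales: {ratio:.1f}x slower without CHPL_RT_OVERSUBSCRIBED=yes")
```

The gap widens with locale count, from under 2x at 1 locale to roughly 31x at 128 locales.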
Somewhat related: even with `CHPL_RT_OVERSUBSCRIBED=yes`, oversubscription can still really bog things down on low core-count machines (like my 2-core, 4-thread Mac). Using fifo can help in this case, so we may want to see if there's more we can do to improve the situation for qthreads: