Closed: carstenbauer closed this 1 year ago.
Update
0.5 ms SnoopPrecompile
875.1 ms ThreadPinning 86.52% compilation time
threadinfo(): 0.392547 seconds (288.25 k allocations: 14.257 MiB, 2.54% gc time, 86.13% compilation time)
pinthreads(:compact): 0.294755 seconds (15.00 k allocations: 824.445 KiB, 99.43% compilation time)
getcpuids(): 0.000213 seconds (401 allocations: 34.688 KiB)
pinthread(0): 0.002098 seconds (144 allocations: 9.680 KiB, 97.99% compilation time)
getcpuid(): 0.001744 seconds (31 allocations: 1.531 KiB, 99.62% compilation time)
ncores(): 0.017941 seconds (897 allocations: 44.359 KiB, 99.93% compilation time)
1.670321 seconds (2.60 M allocations: 135.224 MiB, 1.99% gc time, 88.84% compilation time)
Used SnoopPrecompile to improve precompilation and runtime performance a little bit (see below). Moved maybe_gather_sysinfo() (now replaced by update_sysinfo!) to __init__() so that, once the package is loaded (using ThreadPinning), we can be certain to have the system information of the currently used system/node.

Test:
Before the PR:
With the PR:
The overall gain is nice but not huge, which is expected because dynamically parsing the lscpu output takes a good chunk of the time. Most importantly, though, we improved the time it takes to run threadinfo, pinthreads, and co. This comes at the cost of an increased using ThreadPinning time, since the system information is now gathered in __init__() when the package is loaded.
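The trade-off described above comes from doing the system query in __init__(). Below is a minimal Julia sketch of that pattern, not the package's actual code. The module name SysInfoSketch, the SYSINFO container, and its fields are hypothetical. The stand-in update_sysinfo! only reads Sys.CPU_THREADS and Threads.nthreads(), whereas ThreadPinning parses lscpu output.

```julia
# Minimal sketch of the "gather system info in __init__()" pattern.
# Hypothetical names: SysInfoSketch, SYSINFO and its fields are illustrative only.
module SysInfoSketch

# Global slot for the system information; stays `nothing` until __init__ runs.
const SYSINFO = Ref{Union{Nothing, NamedTuple}}(nothing)

# Hypothetical stand-in for update_sysinfo!: queries the machine the current
# Julia session is running on (the real package parses lscpu output instead).
function update_sysinfo!()
    SYSINFO[] = (ncputhreads = Sys.CPU_THREADS, njuliathreads = Threads.nthreads())
    return nothing
end

# __init__ runs once per Julia session when the module is loaded at runtime,
# not baked into a precompile cache, so SYSINFO describes the current node.
# Any work done here adds to the load time of the module.
function __init__()
    update_sysinfo!()
    return nothing
end

end # module

# Usage: the information is available right after the module has been loaded.
println(SysInfoSketch.SYSINFO[])
```

The design choice mirrors the PR's reasoning. Querying at load time instead of storing the result at precompile time avoids stale information when the same precompiled package is used on a different system/node. The cost is extra work on every load.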