carstenbauer / ThreadPinning.jl

Readily pin Julia threads to CPU-threads
https://carstenbauer.github.io/ThreadPinning.jl/
MIT License

SnoopPrecompile + Refactoring of update sysinfo #18

Closed: carstenbauer closed this 1 year ago

carstenbauer commented 1 year ago

Used SnoopPrecompile to improve precompilation and thereby reduce first-call runtime a little (see below). Moved maybe_gather_sysinfo() (now replaced by update_sysinfo!) into __init__() so that, once the package is loaded (using ThreadPinning), we can be certain to have the system information of the system/node we are currently running on.
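
For readers unfamiliar with the pattern, here is a minimal sketch of gathering system information in __init__(). It assumes a hypothetical module SysInfoSketch with a stand-in update_sysinfo! that only records the hostname and the CPU-thread count; it is not ThreadPinning's actual implementation, which parses lscpu output and is not reproduced here.

```julia
# Hypothetical module for illustration only; not ThreadPinning's source.
module SysInfoSketch

# Cache for the system information. It starts empty and is filled at load time.
const SYSINFO = Ref{Union{Nothing,NamedTuple}}(nothing)

# Stand-in for update_sysinfo!: record facts about the machine we are on right now.
function update_sysinfo!()
    SYSINFO[] = (hostname = gethostname(), ncputhreads = Sys.CPU_THREADS)
    return nothing
end

# Julia runs __init__ each time the module is loaded at runtime, so the cache
# reflects the current system/node rather than the one used for precompilation.
function __init__()
    update_sysinfo!()
end

# Accessor used by the rest of the (hypothetical) package.
sysinfo() = SYSINFO[]

end # module

println(SysInfoSketch.sysinfo())
```

The point of doing the work in __init__() rather than at the top level of the module is that top-level results can be baked into the precompile cache, which may have been produced on a different node.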

Test:

@time begin
    @time_imports using ThreadPinning
    @showtime threadinfo()
    @showtime pinthreads(:compact)
    @showtime getcpuids()
    @showtime pinthread(0)
    @showtime getcpuid()
    @showtime ncores()
end

Before the PR:

2.4 ms  ThreadPinning
threadinfo(): 1.639246 seconds (5.44 M allocations: 281.975 MiB, 3.00% gc time, 97.34% compilation time)
pinthreads(:compact): 0.410707 seconds (974.82 k allocations: 49.977 MiB, 3.24% gc time, 99.54% compilation time)
getcpuids(): 0.000181 seconds (399 allocations: 34.625 KiB)
pinthread(0): 0.002169 seconds (653 allocations: 41.452 KiB, 98.21% compilation time)
getcpuid(): 0.001729 seconds (31 allocations: 1.531 KiB, 99.61% compilation time)
ncores(): 0.029779 seconds (95.94 k allocations: 5.013 MiB, 99.95% compilation time)
  2.163364 seconds (6.57 M allocations: 340.419 MiB, 2.89% gc time, 97.14% compilation time)

With the PR:

0.5 ms  SnoopPrecompile
919.5 ms  ThreadPinning 87.73% compilation time
threadinfo(): 0.434676 seconds (686.07 k allocations: 34.318 MiB, 1.35% gc time, 95.83% compilation time)
pinthreads(:compact): 0.266147 seconds (15.00 k allocations: 824.570 KiB, 99.71% compilation time)
getcpuids(): 0.000259 seconds (397 allocations: 34.562 KiB)
pinthread(0): 0.002079 seconds (144 allocations: 9.680 KiB, 98.53% compilation time)
getcpuid(): 0.001734 seconds (31 allocations: 1.531 KiB, 99.64% compilation time)
ncores(): 0.017690 seconds (897 allocations: 44.359 KiB, 99.94% compilation time)
  1.771727 seconds (3.64 M allocations: 188.919 MiB, 1.62% gc time, 89.38% compilation time)

The overall gain is nice but not huge: the whole test block drops from about 2.16 s to about 1.77 s. This is expected, because dynamically parsing the lscpu output takes a good chunk of the time. Most importantly, though, the first calls to threadinfo, pinthreads, and co. are now faster (e.g. threadinfo: 1.64 s to 0.43 s), at the cost of a longer using ThreadPinning (2.4 ms to 919.5 ms).
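
To put rough numbers on the gains, the ratios can be computed directly from the two listings above. This is only a small sketch; the helper name speedup is introduced here for illustration, and the figures are single-run timings, so they should be read as indicative rather than as a careful benchmark.

```julia
using Printf

# First-call timings in seconds, copied from the "Before the PR" and
# "With the PR" listings: (label, before, after).
timings = [
    ("threadinfo()",         1.639246, 0.434676),
    ("pinthreads(:compact)", 0.410707, 0.266147),
    ("ncores()",             0.029779, 0.017690),
    ("whole test block",     2.163364, 1.771727),
]

# Hypothetical helper: how many times faster the "after" run is.
speedup(before, after) = before / after

for (label, before, after) in timings
    @printf("%-22s %.2fx faster\n", label, speedup(before, after))
end
```

Note that the "whole test block" figure includes the longer using ThreadPinning time, which is why it improves much less than the individual calls.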

carstenbauer commented 1 year ago

Update

0.5 ms  SnoopPrecompile
875.1 ms  ThreadPinning 86.52% compilation time
threadinfo(): 0.392547 seconds (288.25 k allocations: 14.257 MiB, 2.54% gc time, 86.13% compilation time)
pinthreads(:compact): 0.294755 seconds (15.00 k allocations: 824.445 KiB, 99.43% compilation time)
getcpuids(): 0.000213 seconds (401 allocations: 34.688 KiB)
pinthread(0): 0.002098 seconds (144 allocations: 9.680 KiB, 97.99% compilation time)
getcpuid(): 0.001744 seconds (31 allocations: 1.531 KiB, 99.62% compilation time)
ncores(): 0.017941 seconds (897 allocations: 44.359 KiB, 99.93% compilation time)
  1.670321 seconds (2.60 M allocations: 135.224 MiB, 1.99% gc time, 88.84% compilation time)