carstenbauer / ThreadPinning.jl

Readily pin Julia threads to CPU-threads
https://carstenbauer.github.io/ThreadPinning.jl/
MIT License

`threadinfo` doesn't work on Fugaku #13

Closed (giordano closed this issue 2 years ago)

giordano commented 2 years ago

julia> threadinfo(; )
┌ Warning: Could read `lscpu --all --extended` but number of cpuids doesn't match Sys.CPU_THREADS. Falling back to defaults.
└ @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/h5Fae/src/utility.jl:95
ERROR: MethodError: no method matching +(::SubString{String}, ::Int64)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at /vol0003/ra000019/data/a04463/julia-1.7.2-aarch64/share/julia/base/operators.jl:655
  +(::T, ::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} at /vol0003/ra000019/data/a04463/julia-1.7.2-aarch64/share/julia/base/int.jl:87
  +(::LinearAlgebra.UniformScaling, ::Number) at /vol0003/ra000019/data/a04463/julia-1.7.2-aarch64/share/julia/stdlib/v1.7/LinearAlgebra/src/uniformscaling.jl:145
  ...
Stacktrace:
 [1] gather_sysinfo_lscpu()
   @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/h5Fae/src/utility.jl:112
 [2] maybe_gather_sysinfo()
   @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/h5Fae/src/utility.jl:73
 [3] threadinfo(; blas::Bool, hints::Bool, color::Bool, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/h5Fae/src/threadinfo.jl:13
 [4] threadinfo()
   @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/h5Fae/src/threadinfo.jl:13
 [5] top-level scope
   @ REPL[4]:1

shell> lscpu --all --extended
CPU NODE CLUSTER CORE ONLINE MAXMHZ    MINMHZ
0   0    0       0    yes    2200.0000 1600.0000
1   1    1       1    yes    2200.0000 1600.0000
2   -    -       -    no     -         -
3   -    -       -    no     -         -
4   -    -       -    no     -         -
5   -    -       -    no     -         -
6   -    -       -    no     -         -
7   -    -       -    no     -         -
8   -    -       -    no     -         -
9   -    -       -    no     -         -
10  -    -       -    no     -         -
11  -    -       -    no     -         -
12  4    2       2    yes    2200.0000 1600.0000
13  4    2       3    yes    2200.0000 1600.0000
14  4    2       4    yes    2200.0000 1600.0000
15  4    2       5    yes    2200.0000 1600.0000
16  4    2       6    yes    2200.0000 1600.0000
17  4    2       7    yes    2200.0000 1600.0000
18  4    2       8    yes    2200.0000 1600.0000
19  4    2       9    yes    2200.0000 1600.0000
20  4    2       10   yes    2200.0000 1600.0000
21  4    2       11   yes    2200.0000 1600.0000
22  4    2       12   yes    2200.0000 1600.0000
23  4    2       13   yes    2200.0000 1600.0000
24  5    3       14   yes    2200.0000 1600.0000
25  5    3       15   yes    2200.0000 1600.0000
26  5    3       16   yes    2200.0000 1600.0000
27  5    3       17   yes    2200.0000 1600.0000
28  5    3       18   yes    2200.0000 1600.0000
29  5    3       19   yes    2200.0000 1600.0000
30  5    3       20   yes    2200.0000 1600.0000
31  5    3       21   yes    2200.0000 1600.0000
32  5    3       22   yes    2200.0000 1600.0000
33  5    3       23   yes    2200.0000 1600.0000
34  5    3       24   yes    2200.0000 1600.0000
35  5    3       25   yes    2200.0000 1600.0000
36  6    4       26   yes    2200.0000 1600.0000
37  6    4       27   yes    2200.0000 1600.0000
38  6    4       28   yes    2200.0000 1600.0000
39  6    4       29   yes    2200.0000 1600.0000
40  6    4       30   yes    2200.0000 1600.0000
41  6    4       31   yes    2200.0000 1600.0000
42  6    4       32   yes    2200.0000 1600.0000
43  6    4       33   yes    2200.0000 1600.0000
44  6    4       34   yes    2200.0000 1600.0000
45  6    4       35   yes    2200.0000 1600.0000
46  6    4       36   yes    2200.0000 1600.0000
47  6    4       37   yes    2200.0000 1600.0000
48  7    5       38   yes    2200.0000 1600.0000
49  7    5       39   yes    2200.0000 1600.0000
50  7    5       40   yes    2200.0000 1600.0000
51  7    5       41   yes    2200.0000 1600.0000
52  7    5       42   yes    2200.0000 1600.0000
53  7    5       43   yes    2200.0000 1600.0000
54  7    5       44   yes    2200.0000 1600.0000
55  7    5       45   yes    2200.0000 1600.0000
56  7    5       46   yes    2200.0000 1600.0000
57  7    5       47   yes    2200.0000 1600.0000
58  7    5       48   yes    2200.0000 1600.0000
59  7    5       49   yes    2200.0000 1600.0000

For the record, according to the datasheet, each node has 48 compute cores plus 2 or 4 assistant cores for the operating system. My guess is that here cores 0-1 are the assistant cores and cores 2-49 are the compute ones.
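The `MethodError` above (`+(::SubString{String}, ::Int64)`) suggests that a text field from the `lscpu` output ended up in arithmetic without being parsed first, which is easy to hit when offline CPUs report `-` instead of a number. A minimal sketch of a more defensive parse follows; `online_cpuids` is a hypothetical helper written for illustration, not ThreadPinning's actual implementation, and the sample table is a shortened copy of the output above.

```julia
# Hypothetical helper, not ThreadPinning's actual implementation.
# Parses the text of `lscpu --all --extended` and returns the IDs of online CPUs.
function online_cpuids(lscpu_output::AbstractString)
    lines = filter(!isempty, strip.(split(lscpu_output, '\n')))
    header = split(lines[1])
    cpu_col = findfirst(==("CPU"), header)
    online_col = findfirst(==("ONLINE"), header)
    ids = Int[]
    for line in lines[2:end]
        fields = split(line)
        # Offline CPUs report "-" in most columns; skip them before parsing.
        fields[online_col] == "yes" || continue
        # `split` yields SubString{String}; convert explicitly before any arithmetic.
        push!(ids, parse(Int, fields[cpu_col]))
    end
    return ids
end

# Shortened copy of the table shown above.
sample = """
CPU NODE CLUSTER CORE ONLINE MAXMHZ    MINMHZ
0   0    0       0    yes    2200.0000 1600.0000
1   1    1       1    yes    2200.0000 1600.0000
2   -    -       -    no     -         -
12  4    2       2    yes    2200.0000 1600.0000
"""

ids = online_cpuids(sample)
```

Filtering on the `ONLINE` column before parsing means the `-` placeholders never reach `parse`, and the result contains only the CPU IDs that are actually usable.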

carstenbauer commented 2 years ago

Thanks for filing the issue. That's an interesting structure. I'll try to submit a fix later today. Would be great if you could then test it (I don't have access to Fugaku).

carstenbauer commented 2 years ago

@giordano What do you get for Sys.CPU_THREADS on the system?

giordano commented 2 years ago

I seem to remember it's 48, but can tell you for sure later today.

carstenbauer commented 2 years ago

Yeah, I wonder whether it's 50 or 48. (I would expect 50.)

carstenbauer commented 2 years ago

I've updated gather_sysinfo_lscpu on ThreadPinning.jl#main to support Fugaku. Feel free to check it out and confirm that threadinfo works. Of course, it would be great if you could test the other functionality (like pinning) as well, and perhaps even run ] test on Fugaku.

giordano commented 2 years ago

Ok, I was wrong:

julia> Sys.CPU_THREADS
50

julia> Threads.nthreads()
48

julia> threadinfo(; color=false)

| _,_,12,13,14,15,16,17,18,19,20,21,22,23,24,25,
  26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,
  42,43,_,45,46,47,48,49,50,51,52,53,54,55,56,57,
  58,59 |

# = Julia thread, | = Socket seperator

Julia threads: 48
├ Occupied CPU-threads: 47(!)
└ Mapping (Thread => CPUID): 1 => 14, 2 => 57, 3 => 18, 4 => 20, 5 => 16, ...

However:

julia> pinthreads(:compact)
ERROR: BoundsError: attempt to access 50-element Vector{Bool} at index [51]
Stacktrace:
  [1] getindex
    @ ./array.jl:861 [inlined]
  [2] ishyperthread
    @ /data/ra000019/a04463/julia-depot/packages/ThreadPinning/o0INJ/src/querying.jl:45 [inlined]
  [3] #84
    @ ./operators.jl:1117 [inlined]
  [4] filter(f::Base.var"#84#85"{typeof(ishyperthread)}, a::Vector{Int64})
    @ Base ./array.jl:2484
  [5] _broadcast_getindex_evalf
    @ ./broadcast.jl:670 [inlined]
  [6] _broadcast_getindex
    @ ./broadcast.jl:643 [inlined]
  [7] getindex
    @ ./broadcast.jl:597 [inlined]
  [8] macro expansion
    @ ./broadcast.jl:961 [inlined]
  [9] macro expansion
    @ ./simdloop.jl:77 [inlined]
 [10] copyto!
    @ ./broadcast.jl:960 [inlined]
 [11] copyto!
    @ ./broadcast.jl:913 [inlined]
 [12] copy
    @ ./broadcast.jl:885 [inlined]
 [13] materialize
    @ ./broadcast.jl:860 [inlined]
 [14] _pin_compact(nthreads::Int64; hyperthreads::Bool)
    @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/o0INJ/src/pinning.jl:93
 [15] _pin_compact
    @ /data/ra000019/a04463/julia-depot/packages/ThreadPinning/o0INJ/src/pinning.jl:91 [inlined]
 [16] pinthreads(strategy::Symbol; nthreads::Int64, warn::Bool, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/o0INJ/src/pinning.jl:64
 [17] pinthreads(strategy::Symbol)
    @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/o0INJ/src/pinning.jl:61
 [18] top-level scope
    @ REPL[9]:1

julia> pinthreads(:spread)
ERROR: ArgumentError: All cpuids must be ≤ Sys.CPU_THREADS-1 and ≥ 0.
Stacktrace:
 [1] pinthreads(cpuids::SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}; warn::Bool)
   @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/o0INJ/src/pinning.jl:38
 [2] _pin_scatter
   @ /data/ra000019/a04463/julia-depot/packages/ThreadPinning/o0INJ/src/pinning.jl:103 [inlined]
 [3] pinthreads(strategy::Symbol; nthreads::Int64, warn::Bool, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/o0INJ/src/pinning.jl:66
 [4] pinthreads(strategy::Symbol)
   @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/o0INJ/src/pinning.jl:61
 [5] top-level scope
   @ REPL[10]:1

Side note, I think here I want to pin to the last threads, not the first ones, which should be reserved to the OS.
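The `BoundsError` at index 51 of a 50-element `Vector{Bool}` is consistent with a CPU ID being used directly as a 1-based array index, which breaks once the IDs are non-contiguous (0, 1, 12, ..., 59). That reading is an assumption; the thread does not show the code in `querying.jl`. The sketch below only illustrates the general pitfall, with hypothetical names (`flag_for`, `ishyper`).

```julia
# Illustrative only: the names below are hypothetical, not ThreadPinning internals.
cpuids = vcat(0:1, 12:59)               # the 50 online CPU IDs from the lscpu output
ishyper = fill(false, length(cpuids))   # one flag per online CPU

# Fragile: treating a CPU ID as a 1-based array index.
# `ishyper[59 + 1]` would throw a BoundsError, since the vector has only 50 entries.

# More robust: look up the position of the CPU ID first.
function flag_for(cpuid::Integer, cpuids::AbstractVector{<:Integer},
                  flags::AbstractVector{Bool})
    idx = findfirst(==(cpuid), cpuids)
    idx === nothing && throw(ArgumentError("unknown CPU ID $cpuid"))
    return flags[idx]
end

flag_for(59, cpuids, ishyper)
```

A `Dict` from CPU ID to flag would serve the same purpose; the point is only that the lookup goes through the list of IDs rather than assuming they run from 0 to `Sys.CPU_THREADS - 1`.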

carstenbauer commented 2 years ago

Ok, thanks for the update. I'll try my best to fix this from afar 😄

Side note, I think here I want to pin to the last threads, not the first ones, which should be reserved to the OS.

You can always provide an AbstractVector with CPU IDs to pinthreads, e.g. pinthreads(2:49) (CPU IDs start with 0). If we want to make the automatic pinthreads(:compact) (and similarly pinthreads(:spread)) work we would need to special case Fugaku (by hostname) or, probably better, the CPU (by name).

carstenbauer commented 2 years ago

Alright, I've pushed fixes to the main branch. Please try them out. (I haven't special cased Fugaku / the CPU yet though so :compact will still use the first two cores.)

giordano commented 2 years ago

Ugh, now I get:

julia> threadinfo(; color=false)
ERROR: BoundsError: attempt to access 6-element Vector{Vector{Int64}} at index [7]
Stacktrace:
 [1] getindex(A::Vector{Vector{Int64}}, i1::Int64)
   @ Base ./array.jl:861
 [2] gather_sysinfo_lscpu(lscpustr::Nothing; verbose::Bool)
   @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/Mt6Gv/src/utility.jl:156
 [3] maybe_gather_sysinfo(lscpustr::Nothing; force::Bool, verbose::Bool)
   @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/Mt6Gv/src/utility.jl:73
 [4] maybe_gather_sysinfo (repeats 2 times)
   @ /data/ra000019/a04463/julia-depot/packages/ThreadPinning/Mt6Gv/src/utility.jl:71 [inlined]
 [5] threadinfo(; blas::Bool, hints::Bool, color::Bool, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/Mt6Gv/src/threadinfo.jl:13
 [6] top-level scope
   @ REPL[6]:1

BTW, I think pinthreads(2:49) would be fine here. This is a somewhat special CPU, so it may not be worth special-casing it.

carstenbauer commented 2 years ago

Hm, that's weird. I didn't even change that part (at least not intentionally). Can you post the output of ThreadPinning.gather_sysinfo_lscpu(;verbose=true)?

carstenbauer commented 2 years ago

BTW, I think pinthreads(2:49) would be fine here. This is a somewhat special CPU, so it may not be worth special-casing it.

It would be pinthreads(12:59) since, for whatever reason, the CPU threads are numbered 0,1,12,13,...,59. See your lscpu output or cpuids_all().
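As a small illustration of that point, the list of CPU IDs to pin to can be derived from the online IDs instead of being typed by hand. This is a sketch under the assumption, guessed earlier in the thread, that the two lowest IDs (0 and 1) are the assistant cores; the `pinthreads` call is left as a comment so the snippet runs without the package.

```julia
# Assumption: the assistant cores are the CPUs with IDs 0 and 1.
cpuids = vcat(0:1, 12:59)   # online CPU IDs as listed by lscpu
nthreads = 48               # matches Threads.nthreads() reported above

# Take the last `nthreads` online CPU IDs so the OS cores stay free.
compute_ids = cpuids[end-nthreads+1:end]

# With ThreadPinning loaded, these could then be passed on:
# pinthreads(compute_ids)
```

Here `compute_ids` covers the same IDs as `12:59`, so the result matches the explicit range suggested above.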

giordano commented 2 years ago

julia> ThreadPinning.gather_sysinfo_lscpu(;verbose=true)
online_cpu_tblidcs = [2, 3, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61]
cpuids = Any[0, 1, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
hyperthreading = false
nsockets = 1
nnuma = 6
ThreadPinning.SysInfo
├ nsockets: 1
├ nnuma: 6
├ hyperthreading: false
├ cpuids: [0, 1, 12, 13, 14, 15, 16, 17, 18, 19  …  50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
├ cpuids_sockets: [[0, 1, 12, 13, 14, 15, 16, 17, 18, 19  …  50, 51, 52, 53, 54, 55, 56, 57, 58, 59]]
├ cpuids_numa: [[0], [1], [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35], [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]]
└ ishyperthread: Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

carstenbauer commented 2 years ago

Ok, so ThreadPinning.gather_sysinfo_lscpu(;verbose=true) works just fine but threadinfo(; color=false) fails? I'm asking once more because the stacktrace suggests that the error is caused by

 [2] gather_sysinfo_lscpu(lscpustr::Nothing; verbose::Bool)
   @ ThreadPinning /data/ra000019/a04463/julia-depot/packages/ThreadPinning/Mt6Gv/src/utility.jl:156

carstenbauer commented 2 years ago

@giordano I found a typo which could have caused this (though I still don't understand why a direct call succeeds). Please try the new main.
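The thread does not say what the typo was. Independent of it, the `lscpu` table shows NUMA node IDs 0, 1, 4, 5, 6, 7, so any code that uses the node ID itself as an index into a 6-element vector can overrun it. A hedged sketch of grouping CPU IDs by node without that assumption follows; `group_by_numa` and `rows` are hypothetical names, and this is not the code in ThreadPinning's `utility.jl`.

```julia
# Hypothetical sketch; not the code in ThreadPinning's utility.jl.
# (cpuid, numa_node) pairs for a few online CPUs, taken from the lscpu table above.
rows = [(0, 0), (1, 1), (12, 4), (13, 4), (24, 5), (36, 6), (48, 7)]

function group_by_numa(rows)
    nodes = sort(unique(last.(rows)))                    # here: [0, 1, 4, 5, 6, 7]
    slot = Dict(node => i for (i, node) in enumerate(nodes))
    groups = [Int[] for _ in nodes]
    for (cpuid, node) in rows
        # Index by the node's rank, not by `node + 1`, which could overrun `groups`.
        push!(groups[slot[node]], cpuid)
    end
    return groups
end

groups = group_by_numa(rows)
```

The result has one vector per distinct node, in ascending node order, which has the same shape as the `cpuids_numa` field printed in the `SysInfo` output.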

giordano commented 2 years ago

It seems to be working now with latest changes:

julia> ThreadPinning.gather_sysinfo_lscpu(;verbose=true)
online_cpu_tblidcs = [2, 3, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61]
cpuids = Any[0, 1, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
hyperthreading = false
nsockets = 1
nnuma = 6
ThreadPinning.SysInfo
├ nsockets: 1
├ nnuma: 6
├ hyperthreading: false
├ cpuids: [0, 1, 12, 13, 14, 15, 16, 17, 18, 19  …  50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
├ cpuids_sockets: [[0, 1, 12, 13, 14, 15, 16, 17, 18, 19  …  50, 51, 52, 53, 54, 55, 56, 57, 58, 59]]
├ cpuids_numa: [[0], [1], [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35], [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]]
└ ishyperthread: Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

julia> threadinfo(; color=false)

| _,_,12,13,14,15,16,17,18,19,20,21,22,23,24,25,
  26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,
  42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,
  58,59 |

# = Julia thread, | = Socket seperator

Julia threads: 48
├ Occupied CPU-threads: 48
└ Mapping (Thread => CPUID): 1 => 12, 2 => 33, 3 => 13, 4 => 14, 5 => 16, ...

julia> pinthreads(:spread)

julia> threadinfo(; color=false)

| _,_,12,13,14,15,16,17,18,19,20,21,22,23,24,25,
  26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,
  42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,
  _,_ |

# = Julia thread, | = Socket seperator

Julia threads: 48
├ Occupied CPU-threads: 46(!)
└ Mapping (Thread => CPUID): 1 => 14, 2 => 33, 3 => 12, 4 => 13, 5 => 14, ...

julia> pinthreads(:compact)

julia> threadinfo(; color=false)

| _,_,12,13,14,15,16,17,18,19,20,21,22,23,24,25,
  26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,
  42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,
  _,_ |

# = Julia thread, | = Socket seperator

Julia threads: 48
├ Occupied CPU-threads: 46(!)
└ Mapping (Thread => CPUID): 1 => 14, 2 => 33, 3 => 12, 4 => 13, 5 => 14, ...

julia> threadinfo(; color=false)

| _,_,12,13,14,15,16,17,18,19,20,21,22,23,24,25,
  26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,
  42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,
  58,59 |

# = Julia thread, | = Socket seperator

Julia threads: 48
├ Occupied CPU-threads: 48
└ Mapping (Thread => CPUID): 1 => 12, 2 => 13, 3 => 14, 4 => 15, 5 => 16, ...

Thanks!

carstenbauer commented 2 years ago

FYI: I tagged a new release (0.4.3) with these changes.