JuliaSIMD / VectorizationBase.jl

Base library providing vectorization-tools (ie, SIMD) that other libraries are built off of.
MIT License

`L1CACHE.linesize` is `nothing` on WSL2 Ubuntu making LoopVectorization.jl fail to precompile #27

Open hsgg opened 3 years ago

hsgg commented 3 years ago

Not sure if this is a VectorizationBase.jl, LoopVectorization.jl, or Hwloc.jl bug.

L1CACHE.linesize=nothing on my system:

julia> VectorizationBase.L₁CACHE
(size = nothing, depth = nothing, linesize = nothing, associativity = nothing, type = nothing)

This causes LoopVectorization.jl to fail to precompile.

My system is WSL2 (Windows Subsystem for Linux 2) running Ubuntu 20.04. The /proc/cpuinfo output appears normal (happy to post it on request); the CPU is an Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz.

Some more info:

julia> VectorizationBase.CACHE_COUNT
(0, 0, 0, 0)

julia> VectorizationBase.COUNTS
Dict{Symbol,Int64} with 19 entries:
  :Package    => 1
  :Error      => 0
  :PU         => 16
  :OS_Device  => 0
  :L5Cache    => 0
  :L4Cache    => 0
  :I1Cache    => 0
  :L3Cache    => 0
  :Core       => 8
  :Machine    => 1
  :I3Cache    => 0
  :PCI_Device => 0
  :L2Cache    => 0
  :NUMANode   => 0
  :Bridge     => 0
  :Group      => 0
  :Misc       => 0
  :L1Cache    => 0
  :I2Cache    => 0

julia> VectorizationBase.TOPOLOGY
D0: L0 P0 Machine  
    D1: L0 P0 Package  
        D2: L0 P0 Core  
            D3: L0 P0 PU  
            D3: L1 P1 PU  
        D2: L1 P1 Core  
            D3: L2 P2 PU  
            D3: L3 P3 PU  
        D2: L2 P2 Core  
            D3: L4 P4 PU  
            D3: L5 P5 PU  
        D2: L3 P3 Core  
            D3: L6 P6 PU  
            D3: L7 P7 PU  
        D2: L4 P4 Core  
            D3: L8 P8 PU  
            D3: L9 P9 PU  
        D2: L5 P5 Core  
            D3: L10 P10 PU  
            D3: L11 P11 PU  
        D2: L6 P6 Core  
            D3: L12 P12 PU  
            D3: L13 P13 PU  
        D2: L7 P7 Core  
            D3: L14 P14 PU  
            D3: L15 P15 PU  

hsgg commented 3 years ago

Just to follow up, this problem does not occur if I pin VectorizationBase to version 0.12.
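
For readers who want to reproduce the workaround, pinning can be sketched with Julia's standard Pkg API. This is an illustration rather than the exact commands used in this thread. It needs network access and modifies the active environment.

```julia
using Pkg

# Install a 0.12.x release of VectorizationBase, then pin it so that
# later `Pkg.update()` calls leave it at that version.
Pkg.add(Pkg.PackageSpec(name = "VectorizationBase", version = "0.12"))
Pkg.pin("VectorizationBase")
```

`Pkg.free("VectorizationBase")` removes the pin again once a fixed release is available.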

chriselrod commented 3 years ago

I think it's a Hwloc bug, but I don't know if Hwloc is supposed to work on WSL2.

VectorizationBase 0.12 used CpuId.jl instead. Maybe I should use both, with CpuId serving as a backup when Hwloc doesn't work. =/
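
A minimal sketch of the "use both, with a backup" idea, assuming each source can be wrapped in a zero-argument function that returns `nothing` when it has no answer. The names `first_available`, `query_hwloc`, and `query_cpuid` are hypothetical stand-ins for illustration. They are not functions from Hwloc.jl, CpuId.jl, or VectorizationBase.jl.

```julia
# Sketch only: `query_hwloc` and `query_cpuid` are hypothetical stand-ins,
# not functions from Hwloc.jl or CpuId.jl.

# Return the first result that is not `nothing`, trying each source in order.
function first_available(queries...)
    for query in queries
        value = try
            query()
        catch
            nothing  # treat a failing source like a missing value
        end
        value === nothing || return value
    end
    return nothing
end

query_hwloc() = nothing  # simulates the WSL2 case, where no cache objects are reported
query_cpuid() = 64       # simulates a backup source that does report a line size

linesize = first_available(query_hwloc, query_cpuid)
```

Ordering the sources this way keeps the primary source authoritative whenever it works, and only consults the backup when the primary returns nothing or throws.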

chriselrod commented 3 years ago

FWIW, it should look more like this:

julia> VectorizationBase.CACHE_COUNT
(18, 18, 1, 0)

julia> VectorizationBase.COUNTS
Dict{Symbol, Int64} with 19 entries:
  :L3Cache    => 1
  :I2Cache    => 0
  :Package    => 1
  :Machine    => 1
  :I3Cache    => 0
  :PU         => 36
  :PCI_Device => 0
  :OS_Device  => 0
  :Error      => 0
  :L2Cache    => 18
  :NUMANode   => 0
  :Bridge     => 0
  :L5Cache    => 0
  :Group      => 0
  :Misc       => 0
  :L1Cache    => 18
  :L4Cache    => 0
  :I1Cache    => 0
  :Core       => 18

julia> VectorizationBase.TOPOLOGY
D0: L0 P0 Machine
    D1: L0 P0 Package
        D2: L0 P-1 L3Cache  Cache{size=25952256,depth=3,linesize=64,associativity=11,type=Unified}
            D3: L0 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L0 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L0 P0 Core
                        D6: L0 P0 PU
                        D6: L1 P18 PU
            D3: L1 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L1 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L1 P1 Core
                        D6: L2 P1 PU
                        D6: L3 P19 PU
            D3: L2 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L2 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L2 P2 Core
                        D6: L4 P2 PU
                        D6: L5 P20 PU
            D3: L3 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L3 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L3 P3 Core
                        D6: L6 P3 PU
                        D6: L7 P21 PU
            D3: L4 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L4 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L4 P4 Core
                        D6: L8 P4 PU
                        D6: L9 P22 PU
            D3: L5 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L5 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L5 P8 Core
                        D6: L10 P5 PU
                        D6: L11 P23 PU
            D3: L6 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L6 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L6 P9 Core
                        D6: L12 P6 PU
                        D6: L13 P24 PU
            D3: L7 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L7 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L7 P10 Core
                        D6: L14 P7 PU
                        D6: L15 P25 PU
            D3: L8 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L8 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L8 P11 Core
                        D6: L16 P8 PU
                        D6: L17 P26 PU
            D3: L9 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L9 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L9 P16 Core
                        D6: L18 P9 PU
                        D6: L19 P27 PU
            D3: L10 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L10 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L10 P17 Core
                        D6: L20 P10 PU
                        D6: L21 P28 PU
            D3: L11 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L11 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L11 P18 Core
                        D6: L22 P11 PU
                        D6: L23 P29 PU
            D3: L12 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L12 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L12 P19 Core
                        D6: L24 P12 PU
                        D6: L25 P30 PU
            D3: L13 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L13 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L13 P20 Core
                        D6: L26 P13 PU
                        D6: L27 P31 PU
            D3: L14 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L14 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L14 P24 Core
                        D6: L28 P14 PU
                        D6: L29 P32 PU
            D3: L15 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L15 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L15 P25 Core
                        D6: L30 P15 PU
                        D6: L31 P33 PU
            D3: L16 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L16 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L16 P26 Core
                        D6: L32 P16 PU
                        D6: L33 P34 PU
            D3: L17 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L17 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L17 P27 Core
                        D6: L34 P17 PU
                        D6: L35 P35 PU

Seems that everything pertaining to the cache is missing. While the topology has the 8 cores, they're all nested directly inside the package rather than the caches. =/
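
The difference between the two outputs can be checked programmatically. The sketch below assumes a counts dictionary shaped like the `COUNTS` output shown in this thread. `CACHE_KEYS` and `cache_info_missing` are hypothetical names, not part of VectorizationBase.jl, and the dictionaries are abbreviated copies of the values posted above.

```julia
# Cache-related keys as they appear in the COUNTS output in this thread.
const CACHE_KEYS = (:L1Cache, :L2Cache, :L3Cache, :L4Cache, :L5Cache,
                    :I1Cache, :I2Cache, :I3Cache)

# Hypothetical helper: true when no cache object of any level was detected.
cache_info_missing(counts) = all(k -> get(counts, k, 0) == 0, CACHE_KEYS)

# Abbreviated copies of the two COUNTS dictionaries posted above.
wsl2_counts   = Dict(:Package => 1, :Core => 8,  :PU => 16,
                     :L1Cache => 0,  :L2Cache => 0,  :L3Cache => 0)
native_counts = Dict(:Package => 1, :Core => 18, :PU => 36,
                     :L1Cache => 18, :L2Cache => 18, :L3Cache => 1)

cache_info_missing(wsl2_counts)    # true: only Package, Core, and PU were found
cache_info_missing(native_counts)  # false: L1, L2, and L3 caches were found
```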

hsgg commented 3 years ago

Thanks for the response!

I made a Hwloc.jl bug report here, and have pinned VectorizationBase.jl to version 0.12 as a workaround.

chriselrod commented 3 years ago

Thanks for filing a report there, but the contributors at Hwloc.jl may forward you here: https://github.com/open-mpi/hwloc

Note that VectorizationBase 0.12 doesn't support Julia 1.6. Julia 1.6 probably won't be out until early February, so there's time to find a solution. Failing that, I could implement a workaround.

hsgg commented 3 years ago

Thanks for the heads-up. I appreciate your help!

chriselrod commented 3 years ago

Forgot to update this issue: LoopVectorization has been using 64 as the default fallback for a while now.
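
A fallback of this kind can be sketched with Base's `something`, which returns its first argument that is not `nothing`. The names `DEFAULT_LINESIZE` and `cache_linesize` are hypothetical and this is not LoopVectorization's actual code. The named tuple copies the `L₁CACHE` value reported at the top of this issue.

```julia
# Hypothetical constant: the fallback value mentioned in the comment above.
const DEFAULT_LINESIZE = 64

# Use the detected line size when present, otherwise the default.
cache_linesize(cache) = something(cache.linesize, DEFAULT_LINESIZE)

# The L₁CACHE value reported on WSL2 in this issue.
wsl2_l1 = (size = nothing, depth = nothing, linesize = nothing,
           associativity = nothing, type = nothing)

# A cache description with a detected line size (value chosen for illustration).
detected_l1 = (size = 32768, depth = 1, linesize = 128,
               associativity = 8, type = :Data)

cache_linesize(wsl2_l1)      # falls back to DEFAULT_LINESIZE
cache_linesize(detected_l1)  # keeps the detected value
```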