dankamongmen / libtorque

A threaded, continuations-based I/O event library for manycore NUMA machines
http://dank.qemfd.net/dankwiki/index.php/Libtorque
Apache License 2.0

AMD Threadripper 3970x not properly detected #1

Open dankamongmen opened 3 years ago

dankamongmen commented 3 years ago

We get to leaf 0x80000006 and our EDX is quite clearly 04009140, as verified with cpuid -r: edx=0x04009140. According to the AMD CPUID guide Revision 2.34,

CPUID Fn8000_0006_EDX L3 Cache Identifiers
Bits Description
31:18 L3Size: L3 cache size. Specifies the L3 cache size is within the following range:
 (L3Size[31:18] * 512KB) ≤ L3 cache size < ((L3Size[31:18]+1) * 512KB).
17:16 Reserved.
15:12 L3Assoc: L3 cache associativity. See Table 4.

Table 4:

Table 4: L2/L3 Cache and TLB Associativity Field Definition
Associativity[3:0]  Definition
0h L2/L3 cache or TLB is disabled.
1h Direct mapped.
2h 2-way associative.
4h 4-way associative.
6h 8-way associative.
8h 16-way associative.
Ah 32-way associative.
Bh 48-way associative.
Ch 64-way associative.
Dh 96-way associative.
Eh 128-way associative.
Fh Fully associative.
All other encodings are reserved.

so i don't see how 9 is possibly valid there; the L3 on 3970x is 16-way, so i'd expect 8. what's up?
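For reference, a minimal sketch of the decode being described: pull L3Assoc out of bits 15:12 of EDX and map it through the Table 4 encodings quoted above. The function names and sentinel values are made up for illustration and are not libtorque's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinels for encodings that do not name a way count. */
enum { ASSOC_DISABLED = 0, ASSOC_FULL = -1, ASSOC_RESERVED = -2 };

/* Extract L3Assoc, bits 15:12 of CPUID Fn8000_0006 EDX. */
static unsigned l3_assoc_field(uint32_t edx)
{
    return (edx >> 12) & 0xfu;
}

/* Map a 4-bit Table 4 encoding to a number of ways (or a sentinel). */
static int table4_ways(unsigned enc)
{
    assert(enc <= 0xfu);
    switch (enc) {
    case 0x0: return ASSOC_DISABLED;
    case 0x1: return 1;
    case 0x2: return 2;
    case 0x4: return 4;
    case 0x6: return 8;
    case 0x8: return 16;
    case 0xa: return 32;
    case 0xb: return 48;
    case 0xc: return 64;
    case 0xd: return 96;
    case 0xe: return 128;
    case 0xf: return ASSOC_FULL;
    default:  return ASSOC_RESERVED; /* 3h, 5h, 7h, 9h in the table as quoted */
    }
}
```

With edx=0x04009140, `l3_assoc_field()` yields 9, which `table4_ways()` reports as reserved under this revision of the table; the expected 16-way value would have been encoded as 8.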

dankamongmen commented 3 years ago

mask: 9 size: 134217728 lsize: 64 assoc: 0 lines: 2097152

so everything looks right except the mask (and thus assoc).
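The size and line count can be cross-checked by hand. A rough sketch, assuming L3Size is bits 31:18 (as quoted from the guide) and the line size is bits 7:0 (as in the cpuid decode of this leaf); the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Lower bound of the L3 size in bytes: L3Size (EDX bits 31:18) * 512KB. */
static uint64_t l3_size_bytes(uint32_t edx)
{
    return (uint64_t)(edx >> 18) * 512u * 1024u;
}

/* Line size in bytes: EDX bits 7:0. */
static unsigned l3_line_size(uint32_t edx)
{
    return edx & 0xffu;
}

/* Number of cache lines: total size divided by line size. */
static uint64_t l3_lines(uint32_t edx)
{
    unsigned lsize = l3_line_size(edx);
    assert(lsize != 0); /* a zero line size would mean the field is unusable */
    return l3_size_bytes(edx) / lsize;
}
```

For edx=0x04009140 this gives L3Size = 0x100, so 256 * 512KB = 134217728 bytes, a 64-byte line, and 134217728 / 64 = 2097152 lines, matching the output above.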

dankamongmen commented 3 years ago

cpuid gets the 16 right, despite decoding 0x80000006 the same way, and having a filter that would print "reserved" for 9. what's up?

dankamongmen commented 3 years ago

actually, no, cpuid just gets this wrong where we're looking:

   L3 cache information (0x80000006/edx):
      line size (bytes)     = 0x40 (64)
      lines per tag         = 0x1 (1)
      associativity         = 0x9 (9)
      size (in 512KB units) = 0x100 (256)

it gets it right later, using "Cache Properties" (0x8000001d):

      (synth size)                    = 524288 (512 KB)
      --- cache 3 ---
      type                            = unified (3)
      level                           = 0x3 (3)
      self-initializing               = true
      fully associative               = false
      extra cores sharing this cache  = 0x7 (7)
      line size in bytes              = 0x40 (64)
      physical line partitions        = 0x1 (1)
      number of ways                  = 0x10 (16)
      number of sets                  = 16384
      write-back invalidate           = true
      cache inclusive of lower levels = false
      (synth size)                    = 16777216 (16 MB)
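The "synth size" in that block is just the product of the decoded fields. A sketch of the decode, assuming the Cache Properties field layout (EBX[11:0] line size, EBX[21:12] partitions, EBX[31:22] ways, ECX sets, each stored minus one). The struct and function names are hypothetical. The register values in the note below are back-computed from the printed fields and were not captured from hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded cache geometry; every raw field is stored as (value - 1). */
struct cache_props {
    unsigned line_size;   /* EBX[11:0]  + 1 */
    unsigned partitions;  /* EBX[21:12] + 1 */
    unsigned ways;        /* EBX[31:22] + 1 */
    uint64_t sets;        /* ECX        + 1 */
};

static struct cache_props decode_cache_props(uint32_t ebx, uint32_t ecx)
{
    struct cache_props p;
    p.line_size  = (ebx & 0xfffu) + 1;
    p.partitions = ((ebx >> 12) & 0x3ffu) + 1;
    p.ways       = ((ebx >> 22) & 0x3ffu) + 1;
    p.sets       = (uint64_t)ecx + 1;
    return p;
}

/* Cache size in bytes: ways * partitions * line size * sets. */
static uint64_t cache_size_bytes(const struct cache_props *p)
{
    assert(p != NULL);
    return (uint64_t)p->ways * p->partitions * p->line_size * p->sets;
}
```

Feeding in ebx=0x03C0003F and ecx=0x3FFF (reconstructed to match 16 ways, 1 partition, 64-byte lines, 16384 sets) gives 16 * 1 * 64 * 16384 = 16777216 bytes, the 16 MB cpuid prints. That is smaller than the 134217728 bytes derived from 0x80000006, which would be consistent with this leaf describing one L3 instance shared by 8 logical processors rather than the package total, though that reading is an inference from the output here.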