RRZE-HPC / kerncraft

Loop Kernel Analysis and Performance Modeling Toolkit
GNU Affero General Public License v3.0

Error with associativity: "ways needs to be a power of 2" #66

sguera closed this issue 5 years ago

sguera commented 6 years ago

When I use the following machine file, generated for the Intel Xeon E5-2640 v4:

FLOPs per cycle:
  DP:
    ADD: 4
    FMA: 8
    MUL: 4
    total: 16
  SP:
    ADD: 8
    FMA: 16
    MUL: 8
    total: 32
NUMA domains per socket: 1.0

...

cacheline size:  64 B
clock: 2.47 GHz
compiler:
  clang: -O3 -mavx2 -D_POSIX_C_SOURCE=200112L
  gcc: -O3 -march=core-avx2 -D_POSIX_C_SOURCE=200112L
  icc: -O3 -xCORE-AVX2 -fno-alias
cores per NUMA domain: 10.0
cores per socket: 10
memory hierarchy:
- cache per group:
    cl_size: 64
    load_from: L2
    replacement_policy: LRU
    sets: 64
    store_to: L2
    ways: 8
    write_allocate: True
    write_back: True
  cores per group: 1.0
  cycles per cacheline transfer: 1
  groups: 20
  level: L1
  performance counter metrics:
    accesses: MEM_UOPS_RETIRED_LOADS_ALL:PMC[0-3]
    evicts: L2_TRANS_L1D_WB:PMC[0-3]
    misses: L1D_REPLACEMENT:PMC[0-3]
  size per group: !!python/object:prefixedunit.PrefixedUnit
    prefix: k
    unit: B
    value: 32.0
  threads per group: 1.0
- cache per group:
    cl_size: 64
    load_from: L3
    replacement_policy: LRU
    sets: 512
    store_to: L3
    ways: 8
    write_allocate: True
    write_back: True
  cores per group: 1.0
  cycles per cacheline transfer: 2
  groups: 20
  level: L2
  performance counter metrics:
    accesses: L1D_REPLACEMENT:PMC[0-3]
    evicts: L2_TRANS_L2_WB:PMC[0-3]
    misses: L2_LINES_IN_ALL:PMC[0-3]
  size per group: !!python/object:prefixedunit.PrefixedUnit
    prefix: k
    unit: B
    value: 256.0
  threads per group: 1.0
- cache per group:
    cl_size: 64
    replacement_policy: LRU
    sets: 20480
    ways: 20
    write_allocate: True
    write_back: True
  cores per group: 10.0
  cycles per cacheline transfer: INFORMATION_REQUIRED
  groups: 2
  level: L3
  performance counter metrics:
    accesses: L2_LINES_IN_ALL:PMC[0-3]
    evicts: (LLC_VICTIMS_M:CBOX0C[01] + LLC_VICTIMS_M:CBOX1C[01] + LLC_VICTIMS_M:CBOX2C[01] +
               LLC_VICTIMS_M:CBOX3C[01] + LLC_VICTIMS_M:CBOX4C[01] + LLC_VICTIMS_M:CBOX5C[01] +
               LLC_VICTIMS_M:CBOX6C[01] + LLC_VICTIMS_M:CBOX7C[01] + LLC_VICTIMS_M:CBOX8C[01] +
               LLC_VICTIMS_M:CBOX9C[01] + LLC_VICTIMS_M:CBOX10C[01] + LLC_VICTIMS_M:CBOX11C[01] +
               LLC_VICTIMS_M:CBOX12C[01] + LLC_VICTIMS_M:CBOX13C[01] + LLC_VICTIMS_M:CBOX14C[01] +
               LLC_VICTIMS_M:CBOX15C[01] + LLC_VICTIMS_M:CBOX16C[01] + LLC_VICTIMS_M:CBOX17C[01] +
               LLC_VICTIMS_M:CBOX18C[01] + LLC_VICTIMS_M:CBOX19C[01] + LLC_VICTIMS_M:CBOX20C[01] +
               LLC_VICTIMS_M:CBOX21C[01])
    misses: (LLC_LOOKUP_DATA_READ:CBOX0C[01] + LLC_LOOKUP_DATA_READ:CBOX1C[01] +
               LLC_LOOKUP_DATA_READ:CBOX2C[01] + LLC_LOOKUP_DATA_READ:CBOX3C[01] +
               LLC_LOOKUP_DATA_READ:CBOX4C[01] + LLC_LOOKUP_DATA_READ:CBOX5C[01] +
               LLC_LOOKUP_DATA_READ:CBOX6C[01] + LLC_LOOKUP_DATA_READ:CBOX7C[01] +
               LLC_LOOKUP_DATA_READ:CBOX8C[01] + LLC_LOOKUP_DATA_READ:CBOX9C[01] +
               LLC_LOOKUP_DATA_READ:CBOX10C[01] + LLC_LOOKUP_DATA_READ:CBOX11C[01] +
               LLC_LOOKUP_DATA_READ:CBOX12C[01] + LLC_LOOKUP_DATA_READ:CBOX13C[01] +
               LLC_LOOKUP_DATA_READ:CBOX14C[01] + LLC_LOOKUP_DATA_READ:CBOX15C[01] +
               LLC_LOOKUP_DATA_READ:CBOX16C[01] + LLC_LOOKUP_DATA_READ:CBOX17C[01] +
               LLC_LOOKUP_DATA_READ:CBOX18C[01] + LLC_LOOKUP_DATA_READ:CBOX19C[01] +
               LLC_LOOKUP_DATA_READ:CBOX20C[01] + LLC_LOOKUP_DATA_READ:CBOX21C[01])
  size per group: !!python/object:prefixedunit.PrefixedUnit
    prefix: M
    unit: B
    value: 25.0
  threads per group: 10.0
- cores per group: 10
  cycles per cacheline transfer: null
  level: MEM
  penalty cycles per read stream: 0
  size per group: null
  threads per group: 10
micro-architecture: BDW
model name: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
model type: Intel Xeon Broadwell EN/EP/EX processor
non-overlapping model:
  performance counter metric: T_OL + T_L1L2 + T_L2L3 + T_L3MEM
  ports: ["2D", "3D"]
overlapping model:
  performance counter metric: 
    Max(UMASK_UOPS_EXECUTED_PORT_PORT_0:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_1:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_4:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_5:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_6:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_7:PMC[0-3])
  ports: ["0", "0DV", "1", "2", "2D", "3", "3D", "4", "5", "6", "7"]
sockets: 2
threads per core: 1

I get the following error:

Traceback (most recent call last):
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/bin/kerncraft", line 11, in <module>
    load_entry_point('kerncraft==0.5.10', 'console_scripts', 'kerncraft')()
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/kerncraft.py", line 295, in main
    run(parser, args)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/kerncraft.py", line 259, in run
    model = getattr(models, model_name)(kernel, machine, args, parser)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/models/ecm.py", line 88, in __init__
    self.predictor = CacheSimulationPredictor(self.kernel, self.machine, self.cores)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/cacheprediction.py", line 218, in __init__
    csim = self.machine.get_cachesim(self.cores)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/machinemodel.py", line 71, in get_cachesim
    cs, caches, mem = cachesim.CacheSimulator.from_dict(cache_dict)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/cachesim/cache.py", line 63, in from_dict
    name=name, **{k:v for k,v in conf.items() if k not in ['store_to', 'load_from']})
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/cachesim/cache.py", line 253, in __init__
    assert is_power2(ways), "ways needs to be a power of 2"
AssertionError: ways needs to be a power of 2

In this case, the L3 cache is 20-way set-associative.

Should I round it to the nearest power of 2, or is there another way to handle this?
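
To find out which cache level trips the assertion before running kerncraft, a check along the following lines may help. It is only a sketch: the `memory_hierarchy` list is a hand-copied, simplified stand-in for the `memory hierarchy` section of the machine file above, and the function names are hypothetical, not part of kerncraft or pycachesim.

```python
def is_power_of_two(n):
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


# Simplified copy of the cache parameters from the machine file above
# (assumption: only the fields needed for the check are reproduced here).
memory_hierarchy = [
    {"level": "L1", "cache per group": {"sets": 64, "ways": 8, "cl_size": 64}},
    {"level": "L2", "cache per group": {"sets": 512, "ways": 8, "cl_size": 64}},
    {"level": "L3", "cache per group": {"sets": 20480, "ways": 20, "cl_size": 64}},
    {"level": "MEM"},  # main memory has no cache description
]


def non_power_of_two_levels(hierarchy):
    """List (level, ways) for every cache whose associativity is not a power of two."""
    offenders = []
    for entry in hierarchy:
        cache = entry.get("cache per group")
        if cache is None:
            continue  # skip levels without a cache, such as MEM
        if not is_power_of_two(cache["ways"]):
            offenders.append((entry["level"], cache["ways"]))
    return offenders


print(non_power_of_two_levels(memory_hierarchy))
```

For the file above, only the L3 entry is reported.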

cod3monk commented 6 years ago

@sguera As a workaround, you can use:

    sets: 25600
    ways: 16

instead of

    sets: 20480
    ways: 20

The total cache size is unchanged (25600 × 16 × 64 B = 20480 × 20 × 64 B = 25 MiB), so this should yield the same results in most cases.
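
The same substitution can be derived for other cache geometries. The sketch below rounds `ways` down to the largest power of two and rescales `sets` so that the product `sets * ways` stays constant. The helper names are hypothetical and not part of kerncraft or pycachesim, and rounding down (rather than up) is an assumption chosen to match the values suggested above.

```python
def is_power_of_two(n):
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def power_of_two_workaround(sets, ways):
    """Return (new_sets, new_ways) with new_ways a power of two and
    new_sets * new_ways == sets * ways, so the cache capacity is preserved."""
    if is_power_of_two(ways):
        return sets, ways  # nothing to do
    # Largest power of two that does not exceed ways (assumption: round down).
    new_ways = 1 << (ways.bit_length() - 1)
    total_lines = sets * ways
    if total_lines % new_ways != 0:
        raise ValueError("capacity cannot be preserved with an integer number of sets")
    return total_lines // new_ways, new_ways


# L3 values from the machine file in this issue.
print(power_of_two_workaround(20480, 20))
```

This changes the modeled associativity, so simulated conflict behavior can differ from the real 20-way cache; that is the caveat behind "in most cases".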

cod3monk commented 5 years ago

Fixed with commit fbd388d