Exceeds buffer capacity

aleczhanshi commented 4 years ago

Hi there,

As I'm playing with different configurations, I've run into ERROR: couldn't map level GlobalBuffer: mapped tile size 428201 exceeds buffer capacity 65536. I've been looking through the codebase to figure out why this happens, but it is hard to work out from the code alone.

Could you briefly explain (hopefully in math) how the mapped tile size and the buffer capacity are computed from the problem shape (RSPQCKN) and arch specs (sizeKB, entries, word-bits, instances, etc.)?

Here are my problem shape = (R = 7; S = 7; P = 112; Q = 112; C = 3; K = 64; N = 1; Wstride = 2; Hstride = 2;), factors = ("R1 S1 P112 Q1 C1 K1 N1"), and arch spec = (sizeKB = 128; instances = 1; meshX = 1; word-bits = 16; block-size = 4; read_bandwidth = 16; write_bandwidth = 16;).

Thanks in advance!

angshuman-parashar commented 4 years ago

Factors are multiplicatively cumulative, so to determine the tile size at the Global buffer I'll need to know the factors at all levels inside of the Global buffer as well.

aleczhanshi commented 4 years ago

@angshuman-parashar Thanks. Below are the factors of the global buffer. Is that what we need to compute the tile size?

    {
      target = 4;
      type = "spatial";
      factors = "R1 S1 P1 Q8 C1 K2 N1";
      permutation = "QKRSPCN";
      split = 2;
    }, 
    {
      target = 4;
      type = "temporal";
      factors = "R1 S1 P112 Q1 C1 K1 N1";
      permutation = "PRSQCKN";
    },

angshuman-parashar commented 4 years ago

No, that's not enough. As you can see, that's storage level #4. I need to know the factors for levels 0, 1, 2, and 3 as well: the product of all of those factors will give you the tile size at level 4. Perhaps that explains why your buffer is overflowing?
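Written as a formula, the cumulative rule might look like the following. The notation is only illustrative and is not taken from the Timeloop source: $f^{t}_{\ell,d}$ and $f^{s}_{\ell,d}$ stand for the temporal and spatial factors of dimension $d$ at storage level $\ell$, taken as 1 when a level has no such entry, and $D^{(L)}_d$ is the extent of dimension $d$ in the tile at level $L$.

```latex
D^{(L)}_d \;=\; \prod_{\ell=0}^{L} f^{t}_{\ell,d}\, f^{s}_{\ell,d},
\qquad d \in \{R, S, P, Q, C, K, N\}
```

With only the level-4 factors known, every $D^{(4)}_d$ is underdetermined, which is why the factors for levels 0 through 3 are needed.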

aleczhanshi commented 4 years ago

@angshuman-parashar Thanks! What are the equations behind this? For example, is the tile size for level 0 the product of all factors (R, S, P, Q, C, K, N)? For upper levels, could you show me the equation to compute the tile size based on the lower levels and itself? I'm putting all the factors below. Thanks!

    {
      target = 0;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K16 N1";
      permutation = "KRSPQCN";
    }, 
    {
      target = 1;
      type = "temporal";
      factors = "R7 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 2;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 3;
      type = "spatial";
      factors = "R1 S7 P1 Q1 C1 K2 N1";
      permutation = "SKRPQCN";
      split = 0;
    }, 
    {
      target = 3;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 4;
      type = "spatial";
      factors = "R1 S1 P1 Q8 C1 K2 N1";
      permutation = "QKRSPCN";
      split = 2;
    }, 
    {
      target = 4;
      type = "temporal";
      factors = "R1 S1 P112 Q1 C1 K1 N1";
      permutation = "PRSQCKN";
    },

angshuman-parashar commented 4 years ago

First calculate each dimension as the product of all factors. E.g., multiplying over all levels (temporal + spatial) from 0 through 4, we get: R = 7, S=7, P=112, Q=8, C=1, K=64, N=1. This gives us the problem- or iteration-space tile at level 4. Next, project this problem-space into the data-spaces (i.e., tensors) to obtain the tile shapes for those spaces. E.g., weights = R*S*C*K = 3,136, outputs = N*K*Q*P = 57,344 and inputs = N*C*(S+(Q-1)*Hstride)*(R+(P-1)*Wstride) = 4,809 (assuming dilation=1), giving us a total of 65,289 entries. You can multiply that by the word size to get the capacity in bytes.
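The calculation above can be reproduced with a short script. This is a minimal sketch that follows the steps described in this reply, not Timeloop's actual implementation; the helper names (`parse_factors`, `cumulative_tile`, `dataspace_tiles`) are made up for illustration, and the factor strings are copied from the mapping posted in this thread.

```python
# Factors for storage levels 0 through 4, copied from the mapping in this thread.
# Temporal and spatial entries both contribute to the tile at level 4.
LEVEL_FACTORS = [
    "R1 S1 P1 Q1 C1 K16 N1",   # level 0, temporal
    "R7 S1 P1 Q1 C1 K1 N1",    # level 1, temporal
    "R1 S1 P1 Q1 C1 K1 N1",    # level 2, temporal
    "R1 S7 P1 Q1 C1 K2 N1",    # level 3, spatial
    "R1 S1 P1 Q1 C1 K1 N1",    # level 3, temporal
    "R1 S1 P1 Q8 C1 K2 N1",    # level 4, spatial
    "R1 S1 P112 Q1 C1 K1 N1",  # level 4, temporal
]


def parse_factors(text):
    """Turn 'R1 S7 ...' into {'R': 1, 'S': 7, ...}."""
    return {token[0]: int(token[1:]) for token in text.split()}


def cumulative_tile(factor_strings):
    """Multiply the factors of each dimension across all given levels."""
    tile = {}
    for text in factor_strings:
        for dim, factor in parse_factors(text).items():
            tile[dim] = tile.get(dim, 1) * factor
    return tile


def dataspace_tiles(t, wstride, hstride):
    """Project the iteration-space tile onto the three tensors (dilation = 1)."""
    weights = t["R"] * t["S"] * t["C"] * t["K"]
    outputs = t["N"] * t["K"] * t["Q"] * t["P"]
    inputs = (t["N"] * t["C"]
              * (t["S"] + (t["Q"] - 1) * hstride)
              * (t["R"] + (t["P"] - 1) * wstride))
    return {"Weights": weights, "Inputs": inputs, "Outputs": outputs}


tile = cumulative_tile(LEVEL_FACTORS)
sizes = dataspace_tiles(tile, wstride=2, hstride=2)
total = sum(sizes.values())
print(tile)
print(sizes, total)
```

Running it gives the same per-tensor sizes as above (3,136 weights, 57,344 outputs, 4,809 inputs) and the same total of 65,289 entries.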

Now I'm curious, because it doesn't match the error message (unless I messed up the math somewhere above). Could you please email or upload the entire .cfg (arch, mapping, everything) so that I can reproduce at my end?

aleczhanshi commented 4 years ago

@angshuman-parashar Thanks for doing the computation! I really appreciate it. The error for the set of parameters below is ERROR: couldn't map level GlobalBuffer: mapped tile size 62153 exceeds buffer capacity 32768. I've done the math and got the same result as you, 65289, but Timeloop reports 62153 instead. Not much of a difference, but any clue why this is the case?

arch : 
{
  arithmetic : 
  {
    name = "MACs";
    instances = 256;
    word-bits = 16;
    meshX = 16;
  };
  storage = ( 
    {
      name = "PsumRegFile";
      entries = 16;
      instances = 256;
      meshX = 16;
      word-bits = 16;
      read_bandwidth = 2;
      write_bandwidth = 2;
    }, 
    {
      name = "WeightRegFile";
      entries = 192;
      instances = 256;
      meshX = 16;
      word-bits = 16;
      read_bandwidth = 2;
      write_bandwidth = 2;
    }, 
    {
      name = "InputRegFile";
      entries = 12;
      instances = 256;
      meshX = 16;
      word-bits = 16;
      read_bandwidth = 2;
      write_bandwidth = 2;
    }, 
    {
      name = "DummyBuffer";
      entries = 0;
      instances = 16;
      meshX = 16;
      word-bits = 16;
    }, 
    {
      name = "GlobalBuffer";
      sizeKB = 64;
      instances = 1;
      meshX = 1;
      word-bits = 16;
      block-size = 4;
      read_bandwidth = 16;
      write_bandwidth = 16;
    }, 
    {
      name = "DRAM";
      technology = "DRAM";
      instances = 1;
      word-bits = 16;
    } );
};

problem : 
{
  R = 7;
  S = 7;
  P = 112;
  Q = 112;
  C = 3;
  K = 64;
  N = 1;
  Wstride = 2;
  Hstride = 2;
};

mapping = (
    {
      target = 0;
      type = "datatype";
      keep = [ "Outputs" ];
      bypass = [ "Weights", "Inputs" ];
    }, 
    {
      target = 1;
      type = "datatype";
      keep = [ "Weights" ];
      bypass = [ "Inputs", "Outputs" ];
    }, 
    {
      target = 2;
      type = "datatype";
      keep = [ "Inputs" ];
      bypass = [ "Weights", "Outputs" ];
    }, 
    {
      target = 3;
      type = "datatype";
      keep = [ ];
      bypass = [ "Weights", "Inputs", "Outputs" ];
    }, 
    {
      target = 4;
      type = "datatype";
      keep = [ "Inputs", "Outputs" ];
      bypass = [ "Weights" ];
    }, 
    {
      target = 5;
      type = "datatype";
      keep = [ "Weights", "Inputs", "Outputs" ];
      bypass = [ ];
    }, 
    {
      target = 0;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K16 N1";
      permutation = "KRSPQCN";
    }, 
    {
      target = 1;
      type = "temporal";
      factors = "R7 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 2;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 3;
      type = "spatial";
      factors = "R1 S7 P1 Q1 C1 K2 N1";
      permutation = "SKRPQCN";
      split = 0;
    }, 
    {
      target = 3;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 4;
      type = "spatial";
      factors = "R1 S1 P1 Q8 C1 K2 N1";
      permutation = "QKRSPCN";
      split = 2;
    }, 
    {
      target = 4;
      type = "temporal";
      factors = "R1 S1 P112 Q1 C1 K1 N1";
      permutation = "PRSQCKN";
    }, 
    {
      target = 5;
      type = "temporal";
      factors = "R1 S1 P1 Q14 C3 K1 N1";
      permutation = "CQKRSPN";
    }
);

aleczhanshi commented 4 years ago

@angshuman-parashar Another question: I assume that the permutation does not affect the tile size. Is that true?

Further, I guess that only the non-unit factors matter in the permutation as far as performance is concerned. For example, if I have R1 S1 P1 Q8 C1 K2 N1, only the relative order of Q and K affects performance because all the other factors are one. In other words, {QK}RSPCN should be the same as RSPCN{QK}, and also {QK}PCNRS. Is that correct?

angshuman-parashar commented 4 years ago

Re. your earlier question: Look at the bypass settings. Weights are being bypassed at that level. 65289 - 62153 = 3136, which is the weight tile :).
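A small sketch of how the bypass settings change the count, using the per-tensor sizes worked out earlier in this thread. The function names are hypothetical. The capacity formula (sizeKB converted to bits and divided by word-bits) is an assumption, though it is consistent with both error messages in this thread: 128 KB at 16 bits gives 65536 and 64 KB gives 32768.

```python
# Per-tensor tile sizes at GlobalBuffer, as computed earlier in this thread.
TILE_SIZES = {"Weights": 3136, "Inputs": 4809, "Outputs": 57344}

# From the mapping: target = 4 keeps Inputs and Outputs and bypasses Weights.
KEEP_AT_GLOBAL_BUFFER = ["Inputs", "Outputs"]


def mapped_tile_size(tile_sizes, keep):
    """Only tensors that are kept at a level occupy space in its buffer."""
    return sum(tile_sizes[name] for name in keep)


def capacity_in_words(size_kb, word_bits):
    """Assumed conversion: capacity in words = sizeKB * 1024 * 8 / word-bits."""
    return size_kb * 1024 * 8 // word_bits


mapped = mapped_tile_size(TILE_SIZES, KEEP_AT_GLOBAL_BUFFER)
capacity = capacity_in_words(size_kb=64, word_bits=16)
print(mapped, capacity, mapped <= capacity)
```

With Weights bypassed the mapped tile is 62153 words against a capacity of 32768, so the mapping still does not fit in a 64 KB GlobalBuffer.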

Re. your most recent question: Correct, permutation does not affect size. And correct, permutations of only non-unit factors affect performance/energy efficiency. In fact, this is something that the mapper exploits to prune the search space.
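The pruning idea can be illustrated with a toy example. This is only a sketch of the principle, not the mapper's actual code: `effective_order` is a hypothetical helper that drops every dimension whose factor is 1, since a loop with a single iteration has no effect on ordering.

```python
from itertools import permutations

# Level-4 spatial factors from the mapping in this thread.
FACTORS = {"R": 1, "S": 1, "P": 1, "Q": 8, "C": 1, "K": 2, "N": 1}


def effective_order(permutation, factors):
    """Keep only dimensions with a non-unit factor, preserving their order."""
    return "".join(dim for dim in permutation if factors[dim] > 1)


# The three permutations from the question collapse to the same order.
examples = [effective_order(p, FACTORS) for p in ("QKRSPCN", "RSPCNQK", "QKPCNRS")]

# All 7! = 5040 permutations collapse into the orderings of Q and K alone.
all_orders = {effective_order(p, FACTORS) for p in permutations("RSPQCKN")}
print(examples, sorted(all_orders))
```

For these factors only two distinct orderings remain (Q before K, or K before Q), so a search would need to evaluate 2 candidates rather than 5040.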

aleczhanshi commented 4 years ago

@angshuman-parashar Thanks! It makes a lot of sense. I really appreciate it!

agarwal-ayushi commented 2 years ago

Hi @aleczhanshi and @angshuman-parashar: I am facing a similar issue while trying to convert the mapper output map.txt file to .yaml format for timeloop-model. I am specifically working on the tutorial example: timeloop-accelergy-exercises/workspace/exercises/2020.ispass/timeloop/06-mapper-convlayer-eyeriss

The mapping is the one given in ref-output: timeloop-mapper.map.txt. Motivation for my work: I want to use sparse-opt in timeloop-model on a particular mapping to study the impact of sparsity. timeloop-model takes a map.yaml, hence this effort. I wrote a map.yaml file:

    mapping:
      - target: DRAM
        type: temporal
        factors: Q=4 M=4 C=8 P=1 R=1 S=1 N=1
        permutation: CMQPRSN
      - target: shared_glb
        type: temporal
        factors: M=4 P=56 Q=1 R=1 S=1 C=1 N=1
        permutation: QMPRSCN
      - target: shared_glb
        type: spatial
        factors: Q=14 M=1 P=1 C=1 R=1 S=1 N=1
        permutation: QMPCRSN
        split: 1
      - target: DummyBuffer
        type: temporal
        factors: Q=1 M=1 C=1 S=1 P=1 R=1 N=1
        permutation: MSCQPRN
      - target: DummyBuffer
        type: spatial
        factors: Q=1 C=4 S=3 P=1 R=1 N=1 M=1
        permutation: PRNMQSC
        split: 4
      - target: ifmap_spad
        type: temporal
        factors: Q=1 M=1 C=1 S=1 P=1 R=1 N=1
        permutation: CMQSPRN
      - target: weights_spad
        type: temporal
        factors: R=3 C=4 N=1 S=1 P=1 Q=1 M=1
        permutation: CRNSPQM
      - target: psum_spad
        type: temporal
        factors: M=16 R=1 C=1 N=1 S=1 P=1 Q=1
        permutation: MRCNSPQ

However, when I run:

    timeloop-model arch/eyeriss_like.yaml arch/components/*.yaml prob/VGG02_layer5.yaml trial_map.yaml

I get this error:

    Sparse optimization configuration complete.
    ERROR: couldn't map level psum_spad: mapped tile size 33 exceeds buffer capacity 16

I have been unable to figure out the problem in my mapping. Any help would be great. No other files have been modified.
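One reading that is consistent with these numbers, offered as an unconfirmed guess rather than a diagnosis: applying the tile-size rule from earlier in this thread to the psum_spad factors gives exactly 33 if all three tensors are counted at that level. The sketch below assumes that psum_spad is the innermost storage level, so only its own factors contribute, and that nothing is bypassed there because the map.yaml shown contains no bypass entries. The stride defaults are placeholders, since the stride terms vanish when P = Q = 1.

```python
# Factors from the psum_spad entry of the map.yaml above (M is the output-channel dimension).
PSUM_SPAD_FACTORS = {"M": 16, "R": 1, "C": 1, "N": 1, "S": 1, "P": 1, "Q": 1}


def dataspace_tiles(t, wstride=1, hstride=1):
    """Same projection as earlier in the thread, with M in place of K."""
    weights = t["R"] * t["S"] * t["C"] * t["M"]
    outputs = t["N"] * t["M"] * t["Q"] * t["P"]
    inputs = (t["N"] * t["C"]
              * (t["S"] + (t["Q"] - 1) * hstride)
              * (t["R"] + (t["P"] - 1) * wstride))
    return {"Weights": weights, "Inputs": inputs, "Outputs": outputs}


sizes = dataspace_tiles(PSUM_SPAD_FACTORS)
all_kept = sum(sizes.values())   # every tensor counted at psum_spad
outputs_only = sizes["Outputs"]  # only Outputs counted at psum_spad
print(sizes, all_kept, outputs_only)
```

Counting Weights (16), Outputs (16), and Inputs (1) together gives the 33 in the error message, while Outputs alone would be 16 and fit the reported capacity. If that reading is right, the map.txt to map.yaml conversion may have dropped the keep/bypass information for the scratchpads.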

NVlabs / timeloop

Exceeds buffer capacity #23