NVlabs / timeloop

Timeloop performs modeling, mapping and code-generation for tensor algebra workloads on various accelerator architectures.
https://timeloop.csail.mit.edu/
BSD 3-Clause "New" or "Revised" License

Regarding 3d-CNN-layer #173

Closed. jerryxucheng closed this issue 1 year ago.

jerryxucheng commented 1 year ago

Hi, thanks for providing the infrastructure.

When I try to simulate 3D CNN layers, timeloop crashes with a "core dumped" error.

The shape of the Conv3D layer is: [image not preserved]

And I wrote it in YAML as:

problem:
  shape:
    name: "3d-CNN-Layer"
    dimensions: [C,K,R,S,T,N,Q,P,F]
    coefficients:

Apart from this 3D CNN layer configuration, I'm using the eyeriss_like folder in example_designs unchanged. I also tested the provided VGG layer without changing any other YAMLs, and it works.

The command I use is:

timeloop-mapper ../../layer_shapes/CONV/resnet3d/conv1.yaml \
                arch/components/*.yaml \
                arch/eyeriss_like.yaml \
                constraints/*.yaml     \
                mapper/mapper.yaml

I run this from the eyeriss_like folder. conf1.yaml aside, nothing is new: conv1.yaml is the file pasted above, and all other YAMLs are unmodified.

Could you help figure out what's wrong with my yaml definition? Thanks a lot!

angshuman-parashar commented 1 year ago
  1. Because we work with several different repositories it is not trivial for us to immediately find the specific YAMLs you are referring to. It would be convenient and less error-prone if you could paste your inputs together into a single YAML for us to reproduce the problem.

  2. In general when you create a new problem shape you need to create a new set of constraints to describe how your hardware can or cannot map the problem. However, in this specific case your problem spec seems to be a strict superset of the Conv2D shape, and so the existing constraints should be syntactically correct. So I don't believe that to be the cause of the segfault. Nevertheless, you should walk through the constraints carefully and think about what they imply given your new problem shape.
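
One way to walk through the constraints against a new problem shape is to list which problem dimensions each constraint never mentions. The sketch below is a rough illustration, not part of timeloop: the helper names `factor_dims` and `unconstrained` are hypothetical, and it assumes that `factors` strings are whitespace-separated NAME=VALUE pairs with single-letter names, as in the eyeriss_like constraints.

```python
def factor_dims(factors):
    """Return the dimension names in a factors string such as 'N=1 C=1'."""
    return {item.split("=", 1)[0] for item in factors.split()}


def unconstrained(problem_dims, factors):
    """Problem dimensions that the factors string does not name, in problem order."""
    named = factor_dims(factors)
    return [d for d in problem_dims if d not in named]


# Dimension list from the 3D problem shape in this thread.
dims_3d = ["C", "K", "R", "S", "T", "N", "Q", "P", "F"]
# Temporal factors of the ifmap_spad constraint in eyeriss_like.
ifmap_temporal = "N=1 M=1 C=1 P=1 Q=1 R=1 S=1"

print(unconstrained(dims_3d, ifmap_temporal))  # ['K', 'T', 'F']
```

Any dimension in the output deserves a second look: the Conv2D constraints were written without it in mind, so it is worth deciding explicitly what factor it should have at that level.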

jerryxucheng commented 1 year ago

Sorry. Here are the YAML files:

problem:
shape:
name: "3d-CNN-Layer"
dimensions: [C,K,R,S,T,N,Q,P,F]
coefficients:
- name: Wstride
default: 1
- name: Hstride
default: 1
- name: Dstride
default: 1
- name: Wdilation
default: 1
- name: Hdilation
default: 1
- name: Ddilation
default: 1
data-spaces:
- name: Weights
projection:
- [ [C] ]
- [ [K] ]
- [ [R] ]
- [ [S] ]
- [ [T] ]
- name: Inputs
projection:
- [ [N] ]
- [ [C] ]
- [ [R, Wdilation], [P, Wstride] ] # SOP form: RWdilation + PWstride
- [ [S, Hdilation], [Q, Hstride] ] # SOP form: SHdilation + QHstride
- [ [T, Ddilation], [F, Dstride] ] # SOP form: TDdilation + FDstride
- name: Outputs
projection:
- [ [N] ]
- [ [K] ]
- [ [Q] ]
- [ [P] ]
- [ [F] ]
read-write: True
instance:
C: 64
K: 64
R: 1
S: 1
T: 1
N: 1
P: 256
Q: 256
F: 30
Wdilation: 1
Wstride: 1
Hdilation: 1
Hstride: 1
Ddilation: 1
Dstride: 1

architecture:
  # ============================================================
  # Architecture Description
  # ============================================================
  version: 0.3
  subtree:
    - name: system
      local:
        - name: DRAM
          class: DRAM
          attributes:
            type: LPDDR4
            width: 64
            block-size: 4
            word-bits: 16
      subtree:
        - name: eyeriss
          attributes:
            technology: 45nm
          local:
            - name: shared_glb
              class: smartbuffer_SRAM
              attributes:
                memory_depth: 16384
                memory_width: 64
                n_banks: 32
                block-size: 4
                word-bits: 16
                read_bandwidth: 16
                write_bandwidth: 16
            - name: DummyBuffer[0..13] # for better mapping
              class: regfile
              attributes:
                depth: 16
                width: 16
                word-bits: 16
                block-size: 1
                meshX: 14
          subtree:
          - name: PE[0..167]
            local:
              - name: ifmap_spad
                class: smartbuffer_RF
                attributes:
                  memory_depth: 12
                  memory_width: 16
                  block-size: 1
                  word-bits: 16
                  meshX: 14
                  read_bandwidth: 2
                  write_bandwidth: 2
              - name: weights_spad
                class: smartbuffer_RF
                attributes:
                  memory_depth: 192
                  memory_width: 16
                  block-size: 1
                  word-bits: 16
                  meshX: 14
                  read_bandwidth: 2
                  write_bandwidth: 2
              - name: psum_spad
                class: smartbuffer_RF
                attributes:
                  memory_depth: 16
                  memory_width: 16
                  update_fifo_depth: 2
                  block-size: 1
                  word-bits: 16
                  meshX: 14
                  read_bandwidth: 2
                  write_bandwidth: 2
              - name: mac
                class: intmac
                attributes:
                  datawidth: 16
                  meshX : 14

compound_components:
  version: 0.3
  classes:
  - name: smartbuffer_RF
    attributes:
      technology: 45nm
      memory_depth: 12
      memory_width: 16
      n_rdwr_ports: 2
      n_banks: 1
      n_buffets: 1
    subcomponents:
      - name: storage
        class: regfile
        attributes:
          technology: technology
          width: memory_width
          depth: memory_depth
          n_rdwr_ports: n_rdwr_ports
          n_banks: n_banks
      - name: address_generators[0..1]
        class: intadder
        attributes:
          technology: technology
          width: log(memory_depth)
    actions:
      - name: write
        arguments:
          data_delta: 0..1
          address_delta: 0..n_banks
        subcomponents:
          - name: storage
            actions:
              - name: write
                arguments:
                  data_delta: data_delta
                  address_delta: address_delta
          - name: address_generators[0]
            actions:
              - name: add
          - name: address_generators[1]
            actions:
              - name: idle
      - name: read
        arguments:
          data_delta: 0..1
          address_delta: 0..n_banks
        subcomponents:
          - name: storage
            actions:
              - name: read
                arguments:
                  data_delta: data_delta
                  address_delta: address_delta
          - name: address_generators[1]
            actions:
              - name: add
          - name: address_generators[0]
            actions:
              - name: idle
      - name: idle
        subcomponents:
          - name: storage
            actions:
              - name: idle
          - name: address_generators[0..1]
            actions:
              - name: idle

compound_components:
  version: 0.3
  classes:
  - name: smartbuffer_SRAM
    attributes:
      technology: 45nm
      memory_depth: 12
      memory_width: 16
      n_rdwr_ports: 2
      n_banks: 1
      n_buffets: 1
    subcomponents:
      - name: storage
        class: SRAM
        attributes:
          technology: technology
          width: memory_width
          depth: memory_depth
          n_rdwr_ports: n_rdwr_ports
          n_banks: n_banks
      - name: address_generators[0..1]
        class: intadder
        attributes:
          technology: technology
          width: log(memory_depth)
    actions:
      - name: write
        arguments:
          data_delta: 0..1
          address_delta: 0..n_banks
        subcomponents:
          - name: storage
            actions:
              - name: write
                arguments:
                  data_delta: data_delta
                  address_delta: address_delta
          - name: address_generators[0]
            actions:
              - name: count
          - name: address_generators[1]
            actions:
              - name: idle
      - name: read
        arguments:
          data_delta: 0..1
          address_delta: 0..n_banks
        subcomponents:
          - name: storage
            actions:
              - name: read
                arguments:
                  data_delta: data_delta
                  address_delta: address_delta
          - name: address_generators[1]
            actions:
              - name: add
          - name: address_generators[0]
            actions:
              - name: idle
      - name: idle
        subcomponents:
          - name: storage
            actions:
              - name: idle
          - name: address_generators[0..1]
            actions:
              - name: idle

architecture_constraints:
  targets:
  # certain buffer only stores certain datatypes
  - target: psum_spad
    type: bypass
    bypass: [Inputs, Weights]
    keep: [Outputs]
  - target: weights_spad
    type: bypass
    bypass: [Inputs, Outputs]
    keep: [Weights]
  - target: ifmap_spad
    type: bypass
    bypass: [Weights, Outputs]
    keep: [Inputs]
  - target: DummyBuffer
    type: bypass
    bypass: [Inputs, Outputs, Weights]
  - target: shared_glb
    type: bypass
    bypass: [Weights]
    keep: [Inputs, Outputs]
  - target: DummyBuffer
    type: spatial
    split: 4
    permutation: NPQR SCM
    factors: N=1 P=1 Q=1 R=1 S=0
  # only allow fanout of M, Q out from glb
  - target: shared_glb
    type: spatial
    split: 7
    permutation: NCPRSQM
    factors: N=1 C=1 P=1 R=1 S=1
  # one ofmap position but of different output channels
  - target: psum_spad
    type: temporal
    permutation: NCPQRS M
    factors: N=1 C=1 R=1 S=1 P=1 Q=1
  # row stationary -> 1 row at a time
  - target: weights_spad
    type: temporal
    permutation: NMPQS CR
    factors: N=1 M=1 P=1 Q=1 S=1 R=0
  - target: ifmap_spad
    type: temporal
    permutation: NMCPQRS
    factors: N=1 M=1 C=1 P=1 Q=1 R=1 S=1
  # enforce the hardware limit of the bypassing everything
  - target: DummyBuffer
    type: temporal
    factors: N=1 M=1 C=1 P=1 Q=1 R=1 S=1

mapspace_constraints:
  targets:
    # intuitive optimization to reduce map space size
    # the factors of these are 1 anyways, so the order does not really matter
    - target: DummyBuffer
      type: temporal
      permutation: NMCPQRS
    # intuitive optimization for row stationary
    # -> process a row/col of the output before going to the next one
    - target: shared_glb
      type: temporal
      permutation: QRSC PNM
      factors: Q=1 R=1 S=1 P=0
    # intuitive optimization to reduce map space size
    - target: DRAM
      type: temporal
      permutation: RSP CMNQ
      factors: R=1 S=1 P=1

mapper:
  optimization-metrics: [ delay, energy ]
  live-status: False
  num-threads: 8
  timeout: 15000
  victory-condition: 3000
  algorithm: random-pruned
  max-permutations-per-if-visit: 16

And the zip file of the eyeriss folder I use without the net: eyeriss_like.zip

Thanks a lot!

angshuman-parashar commented 1 year ago

The YAML indentation is completely off in the problem spec you pasted. Also the constraints use the variable name M for output channels, but the problem shape uses K.
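
A name mismatch like this can be caught before running the mapper by comparing the letters used in the constraints with the problem's dimension list. The following is a minimal sketch under stated assumptions: the helpers `names_in_constraint` and `unknown_names` are hypothetical, the constraints are represented as plain dicts (as a YAML parser would produce), and dimension names are single letters.

```python
def names_in_constraint(constraint):
    """Collect dimension letters from the 'permutation' and 'factors' strings."""
    names = set(constraint.get("permutation", "").replace(" ", ""))
    names |= {item.split("=", 1)[0] for item in constraint.get("factors", "").split()}
    return names


def unknown_names(problem_dims, constraints):
    """Map each target to the names it uses that are not problem dimensions."""
    report = {}
    for c in constraints:
        extra = names_in_constraint(c) - set(problem_dims)
        if extra:
            report.setdefault(c["target"], set()).update(extra)
    return report


problem_dims = ["C", "K", "R", "S", "T", "N", "Q", "P", "F"]
# Two of the constraints pasted in this thread.
constraints = [
    {"target": "psum_spad", "type": "temporal",
     "permutation": "NCPQRS M", "factors": "N=1 C=1 R=1 S=1 P=1 Q=1"},
    {"target": "shared_glb", "type": "spatial",
     "permutation": "NCPRSQM", "factors": "N=1 C=1 P=1 R=1 S=1"},
]

print(unknown_names(problem_dims, constraints))
# {'psum_spad': {'M'}, 'shared_glb': {'M'}}
```

Renaming K to M in the problem shape (or M to K in every constraint) makes the report empty.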

jerryxucheng commented 1 year ago

Sorry, I'm new to this area. The broken indentation came from the paste; the actual file is indented as shown below. After I changed K to M, the segmentation fault still occurs. Sorry for bothering you, but am I making any other mistakes here?

problem:
  shape:
    name: "3d-CNN-Layer"
    dimensions: [C,M,R,S,T,N,Q,P,F]
    coefficients:
    - name: Wstride
      default: 1
    - name: Hstride
      default: 1
    - name: Dstride
      default: 1
    - name: Wdilation
      default: 1
    - name: Hdilation
      default: 1
    - name: Ddilation
      default: 1
  data-spaces:
    - name: Weights
      projection:
      - [ [C] ]
      - [ [M] ]
      - [ [R] ]
      - [ [S] ]
      - [ [T] ]
    - name: Inputs
      projection:
      - [ [N] ]
      - [ [C] ]
      - [ [R, Wdilation], [P, Wstride] ] # SOP form: RWdilation + PWstride
      - [ [S, Hdilation], [Q, Hstride] ] # SOP form: SHdilation + QHstride
      - [ [T, Ddilation], [F, Dstride] ] # SOP form: TDdilation + FDstride
    - name: Outputs
      projection:
      - [ [N] ]
      - [ [M] ]
      - [ [Q] ]
      - [ [P] ]
      - [ [F] ]
      read-write: True
  instance:
    C: 64
    M: 64
    R: 1
    S: 1
    T: 1
    N: 1
    P: 256
    Q: 256
    F: 30
    Wdilation: 1
    Wstride: 1
    Hdilation: 1
    Hstride: 1
    Ddilation: 1
    Dstride: 1

angshuman-parashar commented 1 year ago

Could be another copy-paste error, but the key data-spaces is mis-indented in the above YAML. But that's an easy fix. In future it may be easier to just attach the complete file.
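
For reference, the nesting problem can be checked mechanically once the file is parsed. The sketch below is only an illustration: `misplaced_shape_keys` is a hypothetical helper, the dicts stand in for what a YAML parser would return, and the set of keys expected under `shape` is an assumption based on the example problem files (where `data-spaces` sits inside `shape` and `instance` sits beside it).

```python
def misplaced_shape_keys(spec):
    """Return keys found directly under 'problem' that belong under 'problem.shape'."""
    shape_keys = {"name", "dimensions", "coefficients", "data-spaces"}
    problem = spec.get("problem", {})
    return sorted(k for k in problem if k in shape_keys)


# Structure of the YAML as pasted above: data-spaces is a sibling of shape.
pasted = {
    "problem": {
        "shape": {"name": "3d-CNN-Layer", "dimensions": [], "coefficients": []},
        "data-spaces": [],
        "instance": {},
    }
}

# Structure after indenting data-spaces one level deeper.
fixed = {
    "problem": {
        "shape": {"name": "3d-CNN-Layer", "dimensions": [],
                  "coefficients": [], "data-spaces": []},
        "instance": {},
    }
}

print(misplaced_shape_keys(pasted))  # ['data-spaces']
print(misplaced_shape_keys(fixed))   # []
```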

After fixing the indent, I was able to run timeloop-mapper successfully, so unfortunately I could not reproduce the segfault. Could you help us out a little by digging deeper into what line of code caused the segfault? Rebuild timeloop using scons --d, then run it with gdb and let us know what you see.

jerryxucheng commented 1 year ago

It works after fixing the indent. The problem was caused by the M/K mismatch and this mis-indent. Silly me. Sorry for bothering you for so long, and I'm really thankful for your help.

angshuman-parashar commented 1 year ago

Please don't apologize. We should have better error messages. But anyway, glad this was resolved.