analogdevicesinc / ai8x-synthesis

Quantization and Synthesis (Device Specific Code Generation) for ADI's MAX78000 and MAX78002 Edge AI Devices
Apache License 2.0

Passthrough connections in Unet #297

Closed: kirilllzaitsev closed this issue 1 year ago

kirilllzaitsev commented 1 year ago

Based on the CamVid example and the corresponding camvid-unet-large.yaml, for a UNet with 3 concatenation operations your implementation suggests using only a single passthrough layer, at the bottleneck:

  # Layer 7: pt
  - in_offset: 0x5000
    out_offset: 0x4004
    processors: 0x00ffffffffffffff
    output_processors: 0x00ffffffffffffff
    operation: None
    write_gap: 1
    in_sequences: [5]
  # Layer 8: upconv3
  - in_offset: 0x6000
    out_offset: 0x4000
    processors: 0x00ffffffffffffff
    output_processors: 0x00ffffffffffffff
    operation: convtranspose2d
    kernel_size: 3x3
    pad: 1
    activate: None
    write_gap: 1
    in_sequences: [6]
  # Layer 9: dec3
  - out_offset: 0x2000
    in_offset: 0x4000
    processors: 0x00ffffffffffffff
    output_processors: 0x00ffffffffffffff
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU
    in_sequences: [8, 7]
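
(For reference, here is a minimal sketch of how these three layers fit together, using a toy byte-addressed memory rather than the izer's actual data model; layer names and word counts are illustrative only. Both writers use write_gap: 1 and their out_offsets differ by one 32-bit word, so their outputs end up word-interleaved in the region that dec3 reads.)

WORD = 4  # bytes per 32-bit data-memory word

def write_words(mem, out_offset, words, write_gap=0):
    # Write one word, then skip write_gap words, mimicking the write_gap setting.
    addr = out_offset
    for w in words:
        mem[addr] = w
        addr += WORD * (1 + write_gap)

mem = {}
upconv3 = [f"upconv3_{i}" for i in range(4)]  # hypothetical layer 8 output words
pt      = [f"pt_{i}"      for i in range(4)]  # hypothetical layer 7 output words

write_words(mem, 0x4000, upconv3, write_gap=1)  # layer 8: out_offset 0x4000
write_words(mem, 0x4004, pt,      write_gap=1)  # layer 7: out_offset 0x4004

# Layer 9 (dec3, in_offset 0x4000, in_sequences [8, 7]) reads this region as
# the channel-wise concatenation of both inputs:
for addr in sorted(mem):
    print(hex(addr), mem[addr])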

Could you explain the reasoning behind not adding a passthrough layer for every concatenation in this network? And in which cases do I need to use camvid-unet-large-fakept.yaml and izer/add_fake_passthrough.py instead?

Given the following UNet definition for a regression task, could you help me understand why using multiple passthrough layers does not allow it to run inference properly:

---
arch: unetmedium
dataset: customdataset

layers:
  # Layer 0: enc1
  - out_offset: 0x4000
    processors: 0x0000.0000.0000.0007
    data_format: HWC
    output_processors: 0x0f00.0000.0000.0000
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU
  # Layer 1: enc2
  - out_offset: 0x4000
    processors: 0x0f00.0000.0000.0000
    output_processors: 0x0000.0000.0000.ff00
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    max_pool: 2
    pool_stride: 2
    activate: ReLU
  # Layer 2: enc3
  - out_offset: 0x0000
    processors: 0x0000.0000.0000.00ff
    output_processors: 0xffff.ffff.0000.0000
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    max_pool: 2
    pool_stride: 2
    activate: ReLU
  # Layer 3: bneck
  - out_offset: 0x6000
    processors: 0xffff.ffff.0000.0000
    output_processors: 0xffff.ffff.ffff.ffff
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    max_pool: 2
    pool_stride: 2
    activate: ReLU
  # Layer 4: pt
  - in_offset: 0x0000
    out_offset: 0x4000
    processors: 0xffff.ffff.0000.0000
    output_processors: 0xffff.ffff.0000.0000
    operation: None
    write_gap: 1
    in_sequences: [2]
  # Layer 5: upconv3
  - in_offset: 0x6000
    out_offset: 0x4004
    processors: 0xffff.ffff.ffff.ffff
    output_processors: 0x0000.0000.ffff.ffff
    operation: convtranspose2d
    kernel_size: 3x3
    pad: 1
    activate: None
    write_gap: 1
    in_sequences: [3]
  # Layer 6: dec3
  - in_offset: 0x4000
    out_offset: 0x2000
    processors: 0xffff.ffff.ffff.ffff
    output_processors: 0x0fff.ffff.ffff.ffff
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU
    in_sequences: [5, 4]
  # Layer 7: pt
  - in_offset: 0x4000
    out_offset: 0x4000
    processors: 0x0000.0000.0000.ff00
    output_processors: 0x0000.0000.0000.ff00
    operation: None
    write_gap: 1
    in_sequences: [1]
  # Layer 8: upconv2
  - in_offset: 0x2000
    out_offset: 0x4004
    processors: 0x0fff.ffff.ffff.ffff
    output_processors: 0x0000.0000.0000.00ff
    operation: convtranspose2d
    kernel_size: 3x3
    pad: 1
    write_gap: 1
    in_sequences: [6]
    activate: None
  # Layer 9: dec2
  - out_offset: 0x2000
    in_offset: 0x4000
    processors: 0x0000.0000.0000.ffff
    output_processors: 0x0000.ffff.ffff.ffff
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU
    in_sequences: [8, 7]
  # Layer 10: pt
  - in_offset: 0x4000
    out_offset: 0x0000
    processors: 0x0f00.0000.0000.0000
    output_processors: 0x0f00.0000.0000.0000
    operation: None
    write_gap: 1
    in_sequences: [0]
    name: pt3
  # Layer 11: upconv1
  - in_offset: 0x2000
    out_offset: 0x0004
    processors: 0x0000.ffff.ffff.ffff
    output_processors: 0x00f0.0000.0000.0000
    operation: convtranspose2d
    kernel_size: 3x3
    pad: 1
    write_gap: 1
    activate: None
    in_sequences: [9]
  # Layer 12: dec1
  - in_offset: 0x0000
    out_offset: 0x4000
    processors: 0x0ff0.0000.0000.0000
    output_processors: 0x0000.ffff.ffff.ffff
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU
    in_sequences: [11, 10]
  # Layer 13: dec0
  - out_offset: 0x0000
    processors: 0x0000.ffff.ffff.ffff
    output_processors: 0x0000.0000.ffff.ffff
    operation: conv2d
    kernel_size: 3x3
    pad: 1
    activate: ReLU
  # Layer 14: conv
  - out_offset: 0x4000
    processors: 0x0000.0000.ffff.ffff
    output_processors: 0x0000.0000.0000.0001
    operation: conv2d
    kernel_size: 1x1
    pad: 0
    activate: None

MaximGorkem commented 1 year ago

The output of the "enc3" layer is used for two purposes: first, as the input to the "bottleneck" layer, and second, for the concatenation before "dec3". Since its output has 64 channels, we need a write_gap for the channel-wise concatenation. However, we did not set write_gap on enc3 directly, because its output is also read by the bottleneck layer, which expects it without a gap. So we defined a passthrough layer whose only job is to take that data and re-write it with write_gap=1 for the concatenation.
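
(A minimal sketch of this reasoning, using a toy word-indexed memory rather than the izer's actual data model; layer names, offsets, and word counts are illustrative only. The contiguous enc3 output stays readable by the bottleneck, while the passthrough produces the gapped copy that interleaves with upconv3 for dec3.)

def write(mem, offset, words, write_gap=0):
    # Write one word, then skip write_gap words (toy word-indexed memory).
    step = 1 + write_gap
    for i, w in enumerate(words):
        mem[offset + i * step] = w

def read_contiguous(mem, offset, n):
    # Contiguous read, as a layer without a read gap performs.
    return [mem.get(offset + i) for i in range(n)]

mem = {}
enc3   = [f"enc3_{i}"   for i in range(4)]   # hypothetical enc3 output words
upconv = [f"upconv_{i}" for i in range(4)]   # hypothetical upconv3 output words

write(mem, 0,  enc3,   write_gap=0)   # enc3: contiguous, as the bottleneck expects
write(mem, 17, enc3,   write_gap=1)   # passthrough: gapped copy of the same data
write(mem, 16, upconv, write_gap=1)   # upconv3: interleaves with that copy

print(read_contiguous(mem, 0, 4))     # bottleneck input: enc3_0 .. enc3_3
print(read_contiguous(mem, 16, 8))    # dec3 input: upconv/enc3 words interleaved

# Counterexample: writing enc3 itself with write_gap=1 would leave holes in the
# region that the bottleneck reads contiguously.
bad = {}
write(bad, 0, enc3, write_gap=1)
print(read_contiguous(bad, 0, 4))     # ['enc3_0', None, 'enc3_1', None]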

github-actions[bot] commented 1 year ago

This issue has been marked stale because it has been open for over 30 days with no activity. It will be closed automatically in 10 days unless a comment is added or the "Stale" label is removed.