analogdevicesinc / ai8x-synthesis

Quantization and Synthesis (Device Specific Code Generation) for ADI's MAX78000 and MAX78002 Edge AI Devices
Apache License 2.0

out_offset in yaml file #301

Closed jason850124 closed 9 months ago

jason850124 commented 10 months ago

Sorry, this is my first time using the MAX78000 EVKit, and I don't fully understand how to set the value of out_offset in the yaml file. Do I need to calculate it based on something, and is there anything I should watch out for? Thank you so much!

Best regards, Jason

rotx-maxim commented 10 months ago

Data memory (activation memory) holds both the input to a layer and its output. The input therefore has to be consumed before the output overwrites it. The easiest way to guarantee this is to ensure there is no overlap between where the input is stored and where the output is stored. 'out_offset' is a byte offset into that memory. It defaults to 0 for the input, and has to be specified for all other layers. For very simple networks, it's enough to switch back and forth between 0 and half the data memory (see, for example, networks/mnist-chw-ai85.yaml).

For networks that use residuals, the output of a layer needs to stick around to be used in a later layer, not just the very next layer. In those cases, enough memory needs to be allocated to hold the intermediate output. If you have the dimensions (and the number of channels modulo 64) then that number is relatively simple to calculate - X * Y * M (where M is the number of passes, 1 for anything between 1 and 64 channels, higher for more than 64 channels). The synthesis tool (izer) will also complain if not enough memory is allocated. There are more examples in the networks/ folder of this repository.
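As a rough illustration of the sizing rule in this reply, here is a minimal Python sketch. It is not part of the synthesis tool: the function names are hypothetical, and the factor of 4 bytes per word is an assumption based on the 32-bit word layout discussed later in this thread. The izer remains the authority on whether an allocation is valid.

```python
import math

CHANNELS_PER_PASS = 64  # from the reply: 1 pass for 1 to 64 channels
BYTES_PER_WORD = 4      # assumption: one 32-bit word per pixel per pass


def num_passes(channels):
    """Number of passes M needed for the given channel count."""
    return math.ceil(channels / CHANNELS_PER_PASS)


def output_bytes(width, height, channels):
    """X * Y * M words, converted to bytes under the assumption above."""
    return width * height * num_passes(channels) * BYTES_PER_WORD


def regions_overlap(offset_a, size_a, offset_b, size_b):
    """True if the byte ranges [a, a+size_a) and [b, b+size_b) intersect."""
    return offset_a < offset_b + size_b and offset_b < offset_a + size_a


# Hypothetical layer: 64x10x10 input at offset 0, 128x10x10 output at 0x2000.
in_offset, in_size = 0x0000, output_bytes(10, 10, 64)
out_offset, out_size = 0x2000, output_bytes(10, 10, 128)

print(hex(out_offset + out_size))  # first byte past the output region
print(regions_overlap(in_offset, in_size, out_offset, out_size))
```

A check like this only catches overlap between the two regions you give it. For residual networks, every output that must survive until a later layer would need to be checked against each intermediate layer's output as well.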

jason850124 commented 9 months ago

Hi! Rotx,

Thanks for your reply. I now understand how to set the value in the yaml file, but one thing still confuses me: the case where there are more than 64 output channels. For example, if the output shape is 128x10x10 and out_offset is 0x2000, the first 64 channels will occupy memory from 0x2000 up to 0x2000+4x10x10. Does the output for the remaining 64 channels then start right after that, at 0x2000+4x10x10?

rotx-maxim commented 9 months ago

When you use multiple "passes", the data is interleaved (in your example: pass 0/pass 1/pass 0/pass 1/...). The 32-bit word at offset 0x2000 for processors 0-3 will contain data from channels 0, 1, 2, 3. The word at 0x2004 will contain data from channels 64, 65, 66, 67. For example:

Starting at H=0 W=0
0x2000: 0xc4e000dd -- channel 0 = -35 (0xdd), channel 1 = 0, channel 2 = -32 (0xe0), channel 3 = -60 (0xc4)
0x2004: 0x4724a9c0 -- channel 64 = -64 (0xc0), channel 65 = -87 (0xa9), channel 66 = 36 (0x24), channel 67 = 71 (0x47)

H=0 W=1
0x2008: channels 0/1/2/3 again but W increased by 1
0x200c: channels 64/65/66/67 again but W increased by 1

H=0 W=2
etc.
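The interleaving described in this reply can be sketched in a few lines of Python. The helper names are hypothetical. The sketch assumes that W varies fastest and that rows follow one another without padding, which matches the H=0 W=0, W=1, W=2 sequence in the reply but is not confirmed here for H > 0. It only models the words seen by processors 0-3.

```python
BYTES_PER_WORD = 4  # each 32-bit word packs four 8-bit channel values


def word_offset(base, h, w, pass_idx, width, passes):
    """Byte offset of the word for pixel (h, w) in the given pass.

    Assumes row-major pixel order with the passes interleaved per pixel.
    """
    pixel = h * width + w
    return base + BYTES_PER_WORD * (pixel * passes + pass_idx)


def unpack_channels(word):
    """Split a 32-bit word into four signed 8-bit values, lowest byte first."""
    raw = word.to_bytes(4, "little")
    return [int.from_bytes(raw[i:i + 1], "little", signed=True) for i in range(4)]


# 128x10x10 output at out_offset 0x2000: 2 passes, width 10.
print(hex(word_offset(0x2000, 0, 0, 1, width=10, passes=2)))  # 0x2004
print(hex(word_offset(0x2000, 0, 1, 0, width=10, passes=2)))  # 0x2008
print(unpack_channels(0xC4E000DD))  # [-35, 0, -32, -60]
print(unpack_channels(0x4724A9C0))  # [-64, -87, 36, 71]
```

The unpacked values match the channel values listed in the reply, with the lowest byte of each word belonging to the lowest-numbered channel.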

jason850124 commented 9 months ago

Hi! Rotx,

Thanks for your reply. Your explanation is very clear, and I can now design my own memory layout!

Best regards, Jason