analogdevicesinc / ai8x-synthesis

Quantization and Synthesis (Device Specific Code Generation) for ADI's MAX78000 and MAX78002 Edge AI Devices
Apache License 2.0

Resnet support? #249

Closed hyunjongL closed 2 years ago

hyunjongL commented 2 years ago

Hi, I am curious whether this can support models like ResNet, where a layer adds its own output to the output of an earlier layer. I am not sure whether the synthesis covers this, or models like FCN-8 and this one (https://github.com/dvu4/CarND-Semantic-Segmentation).

rotx-maxim commented 2 years ago

There are ResNet-like sample models included with the code (for example, see models/ai85net-res-simplenet.py). Note that these are not a full "official" ResNet, as that would exceed the available parameter count on MAX78000. The README.md has a description of how residuals work in hardware; see "Residual Connections".
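Conceptually, the residual connection described in the README is an element-wise add of a block's input onto the output of its convolution path. A minimal sketch (illustrative only; the function names are mine, not the repo's API, and plain multiplications stand in for convolutions):

```python
import numpy as np

def conv_like(x, weight=1.0, bias=0.0):
    """Stand-in for a convolution layer; shapes are preserved."""
    return x * weight + bias

def residual_block(x):
    """Two stand-in 'conv' layers, then add the block input back in.
    The skip path and the main path must have matching shapes for the add."""
    y = conv_like(x, weight=0.5)
    y = conv_like(y, weight=2.0)
    return y + x  # residual (skip) connection

x = np.ones((8, 8))     # toy 8x8 feature map
out = residual_block(x)
print(out[0, 0])        # 1.0*0.5*2.0 + 1.0 = 2.0
```

On the hardware, the shape-matching constraint is the important part: the data from the earlier layer must still be available and dimensionally compatible when the add happens, which is why the sample models keep the residual paths short.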

hyunjongL commented 2 years ago

Thanks for the quick response. It is a very long document haha and I think I missed it.

Also, are there any pointers I can look into to understand the dataflow or operation flow of the 64 processing elements (or processors)? I want to know how I should optimize inputs and models for the hardware. @rotx-maxim Thanks in advance!

rotx-maxim commented 2 years ago

I don't know whether we have a comprehensive dataflow optimization guide (but as you pointed out, there's a lot of documentation available). In short, there are 64 processors and it's best to use all of them: for example, 64 channels makes better use of the hardware than 60 channels. Going beyond 64 channels, the processors run multiple passes, and again multiples of 64 are better than any other count. At the small scale, since 4 channels share a 32-bit data word, it's best to use multiples of four channels (the full 32-bit word has to be accessed either way). Last, the hardware gives the biggest speedups for full 2D convolutions with 3x3 kernels; 1x1 kernels, conv1d, and (on MAX78002) depthwise convolutions are supported, but they involve relatively more data movement per unit of compute.
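The rules of thumb above come down to simple ceiling arithmetic. A sketch (illustrative only; these helper names are mine, not part of the ai8x tooling):

```python
import math

def multi_passes(channels, processors=64):
    """Sequential passes the 64 processors need for a given channel count."""
    return math.ceil(channels / processors)

def words_accessed(channels, channels_per_word=4):
    """32-bit data words touched per pixel position: 4 channels share one
    word, so a partial group of channels still costs a full word access."""
    return math.ceil(channels / channels_per_word)

# 60 vs. 64 channels: same number of passes, so the extra 4 channels are
# effectively free compute.
print(multi_passes(60), multi_passes(64))    # 1 1
# 65 channels already forces a second pass over all processors.
print(multi_passes(65))                      # 2
# 5 channels touches as many data words as 8 channels would.
print(words_accessed(5), words_accessed(8))  # 2 2
```

In other words, channel counts that are multiples of 64 keep every processor busy on every pass, and multiples of 4 avoid paying for 32-bit word accesses that carry unused channels.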