analogdevicesinc / msdk

Software Development Kit for Analog Devices' MAX-series microcontrollers
Apache License 2.0

help finding model loader in CNN example #618

Closed nikky4D closed 1 year ago

nikky4D commented 1 year ago

I'm not very familiar with C, so can someone point out where the model is loaded in this CNN model example: Examples/MAX78002/CNN/pascalvoc-retinanetv7_3

I don't understand the inference pipeline. I would like to change the model and load my own image, but I'm confused about how to go about it with this example.

Jake-Carter commented 1 year ago

Hi @nikky4D, I would suggest starting with our AI Documentation repo.

I also recommend working through the quick-start for the MAX78002EVKIT first. There is an ML-focused section at the end which will help put things into better context, especially the video series.

To directly answer your question: The C code for the model is generated by the ai8x-synthesis tool. The model's weights are saved in a weights.h header file and are loaded from flash into the CNN accelerator by the cnn_load_weights function.
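To put cnn_load_weights in context, here is a rough sketch of the order in which a generated main.c typically calls the accelerator functions for one inference. The function bodies below are stand-ins that only record the call order so the sketch runs on a PC. The real functions come from the ai8x-synthesis output and their signatures may differ (some take arguments that are omitted here), so treat this as an outline and check it against your own generated code.

```c
#include <string.h>

/* Records the order of calls, e.g. "init>weights>..." */
static char trace[128];

static void step(const char *name)
{
    if (trace[0] != '\0') {
        strncat(trace, ">", sizeof(trace) - strlen(trace) - 1);
    }
    strncat(trace, name, sizeof(trace) - strlen(trace) - 1);
}

/* Stand-ins for the generated functions. Assumption: simplified,
 * argument-free versions used only to illustrate the call order. */
static void cnn_init(void)         { step("init"); }      /* reset accelerator state */
static void cnn_load_weights(void) { step("weights"); }   /* copy kernels from weights.h */
static void cnn_load_bias(void)    { step("bias"); }      /* copy bias values */
static void cnn_configure(void)    { step("configure"); } /* set up the layers */
static void load_input(void)       { step("input"); }     /* copy input data */
static void cnn_start(void)        { step("start"); }     /* begin inference */
static void cnn_unload(void)       { step("unload"); }    /* read back the outputs */

/* One inference pass. On hardware, the code also waits for the
 * accelerator to finish between cnn_start and cnn_unload. */
static const char *run_inference_once(void)
{
    trace[0] = '\0';
    cnn_init();
    cnn_load_weights();
    cnn_load_bias();
    cnn_configure();
    load_input();
    cnn_start();
    cnn_unload();
    return trace;
}
```

The point to take away is that the model (weights and bias) is loaded once during setup, while load_input is the step that feeds each new sample.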

I'm happy to help clarify if you have specific questions about the example.

nikky4D commented 1 year ago

Thank you @Jake-Carter. I've worked through several samples in ai8x-synthesis and ai8x-training and trained my own models as well. However, I'm still stuck on the inference pipeline.

Specifically, how do I modify main.c to take camera input and pass it through the model to get the outputs?

Jake-Carter commented 1 year ago

Happy to help!

Fundamentally, you will need to change the implementation of the load_input function. By default, the output of the ai8x-synthesis tool will usually give you a load_input function that looks something like this:

// 1-channel 28x28 data input (784 bytes / 196 32-bit words):
// CHW 28x28, channel 0
static const uint32_t input_0[] = SAMPLE_INPUT_0;

void load_input(void)
{
    // This function loads the sample data input -- replace with actual data
    memcpy32((uint32_t *)0x50400000, input_0, 196);
}

All this does is copy the "Known Answer Test" (KAT) sample input into the data memory of the CNN accelerator for its input layer.

In practice, you will need to interface with real-world sensors to collect data, reformat it if necessary to match the input the model expects, and then load it into the accelerator. For most image processing models you will probably use the input FIFO for this (see the Camera Streaming Guide for more details). Either way, where you do this does not change: you always load the input data from inside the load_input function.
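As a minimal sketch of the "reformat, then load" step, the code below packs a 28x28 single-channel frame into the 196 words that the generated snippet above copies. It rests on several assumptions that you should verify against your own model: the camera delivers unsigned 8-bit pixels, the model was trained on inputs in the signed range -128..127, and the first pixel of each group of four goes into the least significant byte of the word. The names pack_frame and load_frame are hypothetical helpers, not part of the SDK.

```c
#include <stddef.h>
#include <stdint.h>

#define IMG_PIXELS 784u              /* 28 x 28, 1 channel */
#define IMG_WORDS  (IMG_PIXELS / 4u) /* 196 32-bit words */

/* Pack four 8-bit pixels into each 32-bit word. Flipping the top bit
 * (XOR 0x80) maps unsigned 0..255 onto two's-complement -128..127.
 * Assumption: pixel 0 of each group lands in the lowest byte. */
static void pack_frame(const uint8_t *pixels, uint32_t *words, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        uint32_t w = 0;
        for (size_t b = 0; b < 4; b++) {
            w |= (uint32_t)(pixels[4 * i + b] ^ 0x80u) << (8 * b);
        }
        words[i] = w;
    }
}

/* Reformat a frame and write it to the accelerator input memory.
 * On the target, dst would be the address used by the generated
 * load_input, e.g. (volatile uint32_t *)0x50400000 in the snippet above. */
static void load_frame(const uint8_t *pixels, volatile uint32_t *dst)
{
    uint32_t words[IMG_WORDS];

    pack_frame(pixels, words, IMG_WORDS);
    for (size_t i = 0; i < IMG_WORDS; i++) {
        dst[i] = words[i];
    }
}
```

Inside load_input you would then call something like load_frame(camera_buffer, (volatile uint32_t *)0x50400000), where camera_buffer is whatever buffer your camera driver fills. Models with more channels or a different data layout will need a different packing, and FIFO-based models write to the FIFO instead of data memory.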

For generic examples on how to use the camera, see the CameraIF and ImgCapture examples.

For a CNN example that uses the camera as input, the digit-detection-demo is a good reference.

Jake-Carter commented 1 year ago

Closing this out. Let us know if you have any other questions.