analogdevicesinc / ai8x-training

Model Training for ADI's MAX78000 and MAX78002 Edge AI Devices
Apache License 2.0

Digit Recognition Demo vs SVHN model difference #326

Closed impulse1992 closed 2 weeks ago

impulse1992 commented 2 months ago

I tried to generate the C files for the digit recognition demo using the SVHN dataset. However, the C code I get after running the training, quantization, and generation scripts has different CNN parameters in the final layers than the C code shipped with the digit detection demo. Moreover, when I copy my generated C code into the digit recognition MSDK example, I get weird detections on the LCD (connected to a MAX78000FTHR).

ermanok commented 1 month ago

Could you please give more details about how you generated the 'c' files and which example code you modified?

impulse1992 commented 1 month ago

I used the following script to generate the C files. The checkpoint and YAML files came with the synthesis repository, and I used them unmodified. Finally, I copied the cnn.c, cnn.h, log, and weights files into the digit_detection_demo example project that came with the MSDK.

#!/bin/sh

DEVICE="MAX78000"
TARGET="sdk/Examples/$DEVICE/CNN"
COMMON_ARGS="--device $DEVICE --timer 0 --display-checkpoint --verbose"

python ai8xize.py --test-dir $TARGET --prefix svhn_tinierssd --checkpoint-file trained/ai85-svhn-tinierssd-qat8-q.pth.tar --config-file networks/svhn-tinierssd.yaml --device MAX78000 --compact-data --mexpress --timer 0 --display-checkpoint --verbose --overlap-data --mlator --new-kernel-loader --overwrite --no-unload

I noticed something strange in the original cnn.c file that came with the digit_detection_demo project. The last 4 layers in the original cnn.c have dimensions of 48x18x18, 48x9x9, 48x4x4, and 48x2x2, whereas the last 4 layers in the cnn.c file I generated with the script above have dimensions of 44x18x18, 44x9x9, 44x4x4, and 44x2x2. Moreover, the command recorded in the original cnn.c of digit_detection_demo (shown below) references a different YAML file, which is not provided in the repository.

// Created using ./ai8xize.py --test-dir sdk/Examples/MAX78000/CNN --prefix tinyssd_svhn_prior_unload --checkpoint-file /home/ermanokman/repos/github/egg_localization/logs/2022.04.07-190512/qat_tinierssd_svhn_q.pth.tar --config-file /home/seldauyanik/GitHub/Eta2/ai8x-training/tinyssd8x.yaml --device MAX78000 --compact-data --mexpress --timer 0 --display-checkpoint --verbose --overlap-data --mlator --new-kernel-loader --overwrite --no-unload

My objective is to use the MAX78000 for real-world object detection in aquaponic systems, and I am planning to use digit_recognition_demo and the tinierssd model as a baseline. Thanks.

impulse1992 commented 1 month ago

Hi. I just solved the issue by doing these steps.

  1. I modified "train.py" to add 2 background classes instead of 1 for object detection models (line 268). This made the dimensions in the generated cnn.c file match the original cnn.c file of the digit recognition demo, i.e. 48 x W x H instead of 44 x W x H, because the dimensions of the last 4 layers of the model depend on the number of classes.
  2. I trained the model using the modified train.py and quantized the resulting checkpoint file with the provided scripts.
  3. I modified "svhn-tinierssd.yaml" to use 48 processors instead of 44 in the last 4 layers, to match the dimensions of the model.
  4. I generated the C files with the same script, copied them into the actual digit_recognition_demo in the MSDK, and it worked.
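The link between class count and channel count in step 1 can be sketched with a little arithmetic. This is only an illustration: the figure of 4 priors per feature-map location is an assumption inferred from 44 / 11 and 48 / 12, and `class_head_channels` is a hypothetical helper, not a function from train.py or the model file.

```python
# Sketch: output channels of an SSD class-prediction layer as a function
# of the number of classes. PRIORS_PER_LOCATION = 4 is an assumption
# inferred from the 44 and 48 channel counts reported in this thread.
N_DIGITS = 10               # SVHN digit classes 0-9
PRIORS_PER_LOCATION = 4     # assumed, not confirmed by the thread


def class_head_channels(n_classes: int,
                        priors_per_location: int = PRIORS_PER_LOCATION) -> int:
    """Hypothetical helper: channels = priors per location * classes."""
    return priors_per_location * n_classes


kat_channels = class_head_channels(N_DIGITS + 1)    # digits + 1 background
demo_channels = class_head_channels(N_DIGITS + 2)   # digits + 2 extra classes

print(kat_channels, demo_channels)
```

Under that assumption, 11 classes give the 44-channel layers and 12 classes give the 48-channel layers described above.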

This may be a temporary solution. I hope you can fix this with a better one. Thanks.

The modified files are attached: train.py.txt, svhn-tinierssd8x.yaml.txt
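For anyone repeating the YAML change in step 3, the sketch below shows how a processor-enable value for 44 versus 48 contiguous processors could be computed. It assumes the network YAML expresses processor selection as a 64-bit hexadecimal bitmask with one bit per processor; which field was edited and which bit offset the real file uses are not stated in this thread, and `processor_mask` is a hypothetical helper.

```python
# Sketch: build a 64-bit hex mask enabling `n_processors` contiguous
# processors starting at bit `first`. The bitmask interpretation and the
# starting offset of 0 are assumptions, not taken from the attached YAML.
def processor_mask(n_processors: int, first: int = 0) -> str:
    if first < 0 or n_processors <= 0 or first + n_processors > 64:
        raise ValueError("mask must fit in 64 bits")
    mask = ((1 << n_processors) - 1) << first
    return f"0x{mask:016x}"


print(processor_mask(44))  # 0x00000fffffffffff
print(processor_mask(48))  # 0x0000ffffffffffff
```

The attached svhn-tinierssd8x.yaml.txt remains the authoritative record of what was actually changed.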

ermanok commented 1 month ago

Hi,

What you observe is correct. If you check the svhn_tinierssd folder in the MSDK, you can see that the model has 44 channels in the last four layers (the class prediction layers). This is the synthesized code of the known answer test (KAT) for the [model](https://github.com/analogdevicesinc/ai8x-training/blob/develop/models/ai85net-tinierssd.py) in the ai8x-training repo.

The digit detection demo in the MSDK uses a modified version of this model, and it seems you have already identified the modification. This model outputs 12 classes rather than 11. The modification was made to simplify unloading the model output into the prior class probability array. You can check the get_prior_cls function in [post_process.c](https://github.com/analogdevicesinc/msdk/blob/main/Examples/MAX78000/CNN/digit-detection-demo/post_process.c). Because each data memory block of the CNN accelerator stores 4 channels of the output data, it is easier to interpret the memory address of each prior's class predictions when the number of classes is a multiple of 4. In addition, each memory block corresponds to a specific prior of the SSD model. So we trained the model with a dummy 12th class, which does not exist in the dataset, and we never consider this class during post-processing.
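The alignment argument can be illustrated with a simplified model. The sketch assumes the class scores are laid out contiguously, prior after prior, and packed 4 channels per word; the real layout handled by get_prior_cls may differ, and both function names here are hypothetical.

```python
# Sketch: which 4-channel words hold one prior's class scores under a
# flat, prior-major channel layout (an assumption made for illustration).
CHANNELS_PER_WORD = 4  # from the thread: each memory block stores 4 channels


def words_for_prior(prior: int, n_classes: int) -> list[int]:
    """Word indices touched by the class scores of a single prior."""
    first = prior * n_classes
    last = first + n_classes - 1
    return list(range(first // CHANNELS_PER_WORD, last // CHANNELS_PER_WORD + 1))


def priors_share_words(n_priors: int, n_classes: int) -> bool:
    """True if any word mixes class scores from two different priors."""
    seen: set[int] = set()
    for prior in range(n_priors):
        words = set(words_for_prior(prior, n_classes))
        if seen & words:
            return True
        seen |= words
    return False


print(words_for_prior(1, 11))     # [2, 3, 4, 5]
print(words_for_prior(1, 12))     # [3, 4, 5]
print(priors_share_words(4, 11))  # True
print(priors_share_words(4, 12))  # False
```

In this model, 11 classes make neighbouring priors share a word, while 12 classes give every prior exactly three whole words, which is the kind of address simplification described above.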
