ashutoshraman opened this issue 4 days ago
Hi,

1- There is an example of this in the SDK: UNet-highres-demo. By default, the example captures images from the camera, loads them into the CNN accelerator to run inference, and shows the result as a mask on the TFT display. However, to better understand the data format going into and coming out of the CNN, you can enable the serial mode: if you comment out the #define USE_CAMERA in main.c, the demo uses a serial interface to load an image and to return the result. In this configuration, use the SerialLoader.py script in the Utility folder to send the image and to receive and display the mask, e.g.:

$ python SerialLoader.py image1.png COM5 115200

You need to restart the EvKit when prompted so it can communicate with the script.
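For example, a minimal host-side sketch (not part of the SDK; the input file name, output name, COM port, and baud rate below are placeholders, and it assumes Pillow is installed) that resizes an arbitrary image to the expected 352x352 RGB format and then hands it to SerialLoader.py:

```python
# prepare_and_send.py -- illustrative sketch only, not part of the SDK
import subprocess
from PIL import Image

SRC = "my_photo.jpg"   # any input image (placeholder name)
DST = "image1.png"     # 352x352 RGB file to send to the demo
PORT = "COM5"          # serial port of the EvKit (e.g. /dev/ttyUSB0 on Linux)
BAUD = "115200"

# Convert and resize to the 352x352 RGB format the demo expects
Image.open(SRC).convert("RGB").resize((352, 352)).save(DST)

# Equivalent to running: python SerialLoader.py image1.png COM5 115200
# (run this from the Utility folder, or adjust the script path)
subprocess.run(["python", "SerialLoader.py", DST, PORT, BAUD], check=True)
```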
The image to be loaded into the CNN is 352x352 with 3 bytes per pixel (RGB), i.e. 352x352x3 = 371712 bytes. It is folded by a factor of 4 in each spatial dimension and loaded into the CNN as 48 channels of 88x88 data: 48x88x88 = 371712.
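To make the folding concrete, here is a rough NumPy sketch of a fold by 4. The exact offset/channel ordering used by the demo firmware and SerialLoader.py may differ, so treat this as an illustration of the layout only:

```python
import numpy as np

def fold_image(img, fold_ratio=4):
    """Fold an HxWxC image into (H/fold)x(W/fold)x(C*fold^2).

    Illustrative only: the ordering assumed here (row-major over the
    fold offsets) may not match the demo firmware exactly.
    """
    slices = [img[i::fold_ratio, j::fold_ratio, :]
              for i in range(fold_ratio)
              for j in range(fold_ratio)]
    return np.concatenate(slices, axis=2)

img = np.zeros((352, 352, 3), dtype=np.uint8)  # RGB input image
folded = fold_image(img)                        # shape (88, 88, 48)
assert folded.size == 352 * 352 * 3 == 371712   # same number of bytes
```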
The result is 4 bytes per pixel, each byte representing the likelihood of one of the 4 output mask classes. It is read back from the CNN as 64 channels of 88x88 data: 64x88x88 = 4x352x352 = 495616 bytes.
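Similarly, a rough sketch of unfolding the 64x88x88 output back into a 352x352x4 score map and reducing it to a per-pixel class mask; the channel ordering here is again an assumption that should be checked against what SerialLoader.py does:

```python
import numpy as np

def unfold_output(folded, fold_ratio=4, num_classes=4):
    """Undo the fold: (88, 88, 64) -> (352, 352, 4).

    Assumes the same offset ordering as the fold sketch above,
    with each group of num_classes channels belonging to one offset.
    """
    h, w, _ = folded.shape
    out = np.empty((h * fold_ratio, w * fold_ratio, num_classes), dtype=folded.dtype)
    offsets = [(i, j) for i in range(fold_ratio) for j in range(fold_ratio)]
    for k, (i, j) in enumerate(offsets):
        out[i::fold_ratio, j::fold_ratio, :] = folded[:, :, k * num_classes:(k + 1) * num_classes]
    return out

cnn_out = np.zeros((88, 88, 64), dtype=np.uint8)  # raw bytes read back from the CNN
scores = unfold_output(cnn_out)                    # shape (352, 352, 4)
mask = scores.argmax(axis=-1)                      # per-pixel class index, 0..3
```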
2- This is exactly what the serial mode with the Python script does. Please see the explanation above and the README in the Utility folder.
@MaximGorkem, please respond to the other questions.
Hi, I have been using these repositories and the MAX78000 FTHR_RevA kit for a semantic segmentation task aimed at an eventual embedded-system application. I have a couple of questions about the general flow of getting the model onto the chip and running inference on it.
For on-chip inference, I have watched the videos of the KWS demo on the MAX78000, where the chip takes audio input and predicts a keyword; I can see the classification output and prediction through my serial port. Now I am working on my own semantic segmentation task and am trying to figure out how to send images (already in the correct format) to the chip, which already contains my UNet model. How do I do that? I cannot find examples of this anywhere in the documentation.
Also, is there a way to get the predicted masks back onto my computer after the chip's embedded model has run inference on an image, so that I can display comparisons with the original image and a ground-truth (GT) mask? I need this to show valid comparisons of the chip's ability to segment, and I also want to evaluate its performance with metrics other than plain accuracy, since segmentation often requires different metrics to show true efficacy.
I also wanted clarification on what the CamVid UNet is doing, given that it has 3 classes labeled with keywords. Is it still using the UNet to do pixel-wise classification (semantic segmentation) of pictures, segmenting the 3 classes plus background out of them? Are masks given as ground truth, or just labels as in a classification task? In the latter case, why use a UNet for classification?
I saw that to interleave operations, I must create a passthrough layer for the skip connections in my network YAML file. If I create this passthrough layer, do I also need to implement a passthrough layer in my actual model Python code? Are there examples of such a passthrough layer in any of the models? The UNets don't show anything like it as far as I can tell, and if I run the AIsegment large UNet with the aisegment fakept network YAML, it fails because it says the YAML doesn't match the model.
Thank you in advance, and I apologize for the numerous questions. This chip is great, and I just want to make it work for segmentation purposes.