AxisCommunications / acap-native-sdk-examples

Example code for APIs and features in AXIS Camera Application Platform (ACAP) Native SDK

Larod model input #103

Closed hulkds closed 1 year ago

hulkds commented 1 year ago

Hello Axis developers,

I am integrating my own deep learning model on an Axis camera using ACAP Native and have some questions about the Larod library.

In your object detection example, you use the MobileNet SSD model, whose input is channels-last, or NHWC (batch size, height, width, channels), and which was trained on uint8 [0, 255] images. This works well with Larod and the convertCropScaleU8yuvToRGB function in imgconverter.c.

In my case, I have a pre-trained model whose input order is channels-first, or NCHW, and which was trained on float32 [0, 1] images. I converted the model input to uint8 and used the larodSetTensorLayout function to set the input tensor layout to NCHW. The model runs on an Axis camera, but its output is not what I expect. When I tested the tflite model on my computer with NCHW input, it worked as expected. I don't know whether the problem is in the way Larod reads the input image, which is why I created this issue to hear your ideas.

Here is some information you might need:

I am looking forward to your answer!

Best,

pataxis commented 1 year ago

Hi @hulkds, thanks for reaching out to us. We'll get back to you shortly.

Corallo commented 1 year ago

Hello @hulkds

So when you say that you "converted the input model to uint8", do you mean that you quantized your whole model? Did you try the quantized version of the model on your computer as well?

hulkds commented 1 year ago

Hello @Corallo,

Yes, I have quantized my model to uint8; the output is still float32. I also tried the quantized model on my computer, and it works just fine.

By the way, I saw that my model can run on the camera with both channels-first and channels-last input, but in neither case does it give me a good output.
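
In case it helps, this is roughly how I checked the quantized model's interface on my computer (the model path is illustrative):

import tensorflow as tf

# Confirm the quantized model's interface: uint8 in, float32 out.
interpreter = tf.lite.Interpreter(model_path="model_quant.tflite")
interpreter.allocate_tensors()
print(interpreter.get_input_details()[0]["dtype"])   # expected: <class 'numpy.uint8'>
print(interpreter.get_output_details()[0]["dtype"])  # expected: <class 'numpy.float32'>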

Thanks in advance!

Corallo commented 1 year ago

Larod expects the same number of bytes either way and can't recognize in which order the pixels are given.

After you set the right channel order, do you also rearrange your input image into that format? That might be the issue. In the cv25 version of the object detector, larod also expects the image in CHW (RGB planar) format, and we have a modified version of the crop-and-scale function that can produce its output as CHW. I'd suggest trying that.

Note: Don't use the object-detection-cv25 example on an ARTPEC-7 camera, it won't work; you can just borrow that function to help you sort the input into the right order.
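
To make "RGB planar" concrete, here is a minimal numpy sketch of the reordering (shapes are illustrative; this is roughly what the modified crop-and-scale function produces):

import numpy as np

# Interleaved (HWC): R, G, B bytes alternate pixel by pixel.
interleaved = np.zeros((300, 300, 3), dtype=np.uint8)
# Planar (CHW): all R bytes first, then all G, then all B.
planar = np.ascontiguousarray(interleaved.transpose(2, 0, 1))
print(planar.shape)  # (3, 300, 300)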

hulkds commented 1 year ago

Thanks for your response!

I just tried your solution for both "RGB planar" and "RGB interleaved", but I still didn't get the result I expected. I am working on an image quality assessment task, so my model is a regression model (input: video frame, output: quality score), not a detection model. For now, my model runs on the camera, but the quality score of a blurred frame is the same as that of a normal frame. When I try it on my computer, the blurred frame gives a much lower quality score. I can provide you with the model for testing if you want.

Best,

Corallo commented 1 year ago

Okay. Does the score you get look close to what you would get by feeding random noise as input?

I can give you some extra tips to debug your application. You could try the following:

  1. First, update your firmware to the latest version (>= 11.1).
  2. For a single frame, double-check that the input you are giving to larod is not somehow corrupted. To do so, before executing the larodRunInference line, dump the content of the input address to a .bin file, and read that file with Python or any other tool to display the image. This verifies that larod gets the expected input.
  3. If (2) works as expected and you are sure there are no mistakes in your input, copy that same image.bin file and your model onto the camera and run: larod-client -p -c cpu-tflite -g <model_path> -i <input_file> -o ./output, then check whether the bytes you get from larod are the same as those you get by testing the same image.bin on your computer (a minimal sketch of the local test follows below). Bonus: run journalctl -u larod to spot extra errors.
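
A minimal sketch of the local test, assuming a quantized model with uint8 input (file names are illustrative):

import numpy as np
import tensorflow as tf

# Run the exact bytes dumped on the camera through the same tflite model.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

data = np.fromfile("image.bin", dtype=np.uint8).reshape(inp["shape"])
interpreter.set_tensor(inp["index"], data)
interpreter.invoke()

# Compare these values with the output file written by larod-client.
print(interpreter.get_tensor(out["index"]))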

Let us know how this goes

hulkds commented 1 year ago

Thank you so much for your suggestions!

I updated my firmware and also tried your second suggestion. Here is my code to write the content of the input address to an image.bin file:

while (true) {
  ...
  FILE *fptr;

  if ((fptr = fopen("/tmp/image.bin","wb")) == NULL){
      printf("Error! opening file");

      // Program exits if the file pointer returns NULL.
      exit(1);
  }

  fwrite(larodInputAddr, sizeof(uint8_t), args.width * args.height * CHANNELS, fptr);
  ...
}

When I converted the .bin to an RGB image using this binary-to-image, I got the following results:

I think I did something wrong, because my frame comes out rotated; a normal frame should look like this:

Please let me know if you have any ideas.

Corallo commented 1 year ago

Not sure what is going on here. It looks like either your input data gets distorted before arriving at the larodInputAddr variable, or your display function has a problem. Just to be extra sure, try to display your image with this:

import numpy as np
import cv2

W, H = 300, 300

# Load the raw .bin dump into a numpy array.
def read_array_from_bin(filename):
    return np.fromfile(filename, dtype=np.uint8)

img = read_array_from_bin("your_image.bin")

# If the image is RGB planar (CHW): reshape to (3, H, W), then move the
# channel axis last so OpenCV can work with it.
img = img.reshape((3, H, W))
img = np.transpose(img, (1, 2, 0))
# If the image is RGB interleaved (HWC), reshape directly instead:
# img = img.reshape((H, W, 3))

# OpenCV expects BGR, so convert from RGB.
img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)

# Display the image with cv2.
cv2.imshow("input", img)
cv2.waitKey(0)

If you still see the image deformed, run the same check on the image as soon as you get it from the video stream, before any transformation.

hulkds commented 1 year ago

I tried your code, but it doesn't work. It gives the error ValueError: cannot reshape array of size 28672 into shape (100,100,3).

On the other hand, I tried to save the image directly to a .jpg file, and it gives good images (I modified it from this code):

unsigned long jpeg_size = 0;
unsigned char* jpeg_buffer = NULL;
struct jpeg_compress_struct jpeg_conf;

// Compress the raw frame at larodInputAddr to JPEG and write it to /tmp.
set_jpeg_configuration(args.width, args.height, CHANNELS, args.quality, &jpeg_conf);
buffer_to_jpeg(larodInputAddr, &jpeg_conf, &jpeg_size, &jpeg_buffer);
jpeg_to_file("/tmp/image.jpg", jpeg_buffer, jpeg_size);
free(jpeg_buffer);

Accordingly, I think there is no problem with my input image.

By the way, you can find my .bin images here for testing.

Thank you very much!

Corallo commented 1 year ago

There is something odd if your saved file is 28672 bytes: if your image is 100x100x3, you should be saving exactly 30,000 bytes. You need to get a proper image saved as .bin to have a reproducible experiment that we can eventually verify.

Besides, the larod input address should receive an image that matches your model's input size. Is 100x100x3 your model input?
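
A quick way to sanity-check the dump size (file name illustrative):

import os

# 100 * 100 * 3 = 30000 bytes expected for a uint8 RGB image.
print(os.path.getsize("image.bin"))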

hulkds commented 1 year ago

Yes, I agree with you that my .bin should be 30,000 bytes, but I can't get that with the code above. Can you please confirm that there is no mistake in my code here?

And yes, my model input is 100x100x3 (as it works on my computer).

Corallo commented 1 year ago

Yes, it looks correct to me. Have you tried more than once? Maybe you scp'd it to your machine before it was done writing? You could add an fclose after the fwrite to make sure everything gets flushed to the file, or try padding your .bin file with zeros and see if the rest of the file is a meaningful image.

hulkds commented 1 year ago

Yes, you are right! I forgot fclose(). Thank you very much!

By the way, regarding your third suggestion about testing on my computer: do you mean I have to convert the .bin image to an RGB image and test that, or is there some way I can test directly with the .bin image?

Corallo commented 1 year ago

I meant using the .bin file directly as input, e.g. input = np.fromfile(filename, dtype=np.uint8)

hulkds commented 1 year ago

When I run this command on the camera: larod-client -p -c cpu-tflite -g <model_path> -i <input_file> -o ./output, it gives me ME"B as the output. I don't know what that means. Do you have any ideas?

Corallo commented 1 year ago

You are probably interpreting the output bytes as chars. Load the output file into a numpy array.
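
Since your model's output is float32, something like this should decode it (the file name is whatever larod-client wrote):

import numpy as np

# Decode the raw larod-client output bytes as float32 instead of chars.
scores = np.fromfile("output", dtype=np.float32)
print(scores)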