AxisCommunications / acap-computer-vision-sdk-examples

Example applications that provide developers with the tools and knowledge to use Axis Camera Application Platform (ACAP) Computer Vision solution
Apache License 2.0

Accuracy difference with tflite quant8 model #189

Closed · akash4562800 closed this issue 3 months ago

akash4562800 commented 3 months ago

Describe the bug

I am using an AXIS M4317-PLR Panoramic Camera to run an object detection model. I trained a MobileNet SSD model, exported it to TFLite quant8, and I am able to infer successfully on the Axis camera. For inference on the camera, I'm using: https://github.com/AxisCommunications/acap-computer-vision-sdk-examples/tree/main/object-detector-python.

Issue: I'm seeing a difference in detection accuracy when running the same model on the camera versus on my local machine. I understand I'm using different methods, and I want to understand how to resolve this difference in accuracy. Local inference steps:

import cv2
import numpy as np
import tensorflow as tf

# Load the quant8 TFLite model and allocate tensors.
self.inference_client = tf.lite.Interpreter(model_path=model_path)
self.inference_client.allocate_tensors()
self.input_details = self.inference_client.get_input_details()
self.output_details = self.inference_client.get_output_details()

def pre_process(self, image):
    # Resize to the 320x320 model input, convert OpenCV's BGR to RGB,
    # and add a batch dimension as uint8.
    image = cv2.resize(image, (320, 320), interpolation=cv2.INTER_AREA)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = np.expand_dims(image, axis=0)
    image = image.astype(np.uint8)
    return image

resized_image = self.pre_process(image)
self.inference_client.set_tensor(self.input_details[0]['index'], resized_image)
self.inference_client.invoke()

# Collect all output tensors by name, then unpack boxes, classes, scores.
result = {}
for detail in self.output_details:
    result[detail['name']] = self.inference_client.get_tensor(detail['index'])
bounding_boxes, classes, confidences = tuple([np.squeeze(result[key]) for key in [
    'StatefulPartitionedCall:3', 'StatefulPartitionedCall:2', 'StatefulPartitionedCall:1']])

For inference on the Axis camera, I'm using https://github.com/AxisCommunications/acap-computer-vision-sdk-examples/blob/main/object-detector-python/app/detector.py and the InferenceClient.infer method from https://github.com/AxisCommunications/acap-computer-vision-sdk/blob/main/sdk/tfserving/tf_proto_utils.py (not entirely sure).

def detect(self, image):
    image = np.expand_dims(image, axis=0)
    image = image.astype(np.uint8)
    success, result = self.inference_client.infer({'data': image}, self.model_path)
    if not success:
        return False, 0, 0, 0
    bounding_boxes, classes, confidences = tuple([np.squeeze(result[key]) for key in [
        'StatefulPartitionedCall:3', 'StatefulPartitionedCall:2', 'StatefulPartitionedCall:1']])
    return True, bounding_boxes, classes, confidences
def run_camera_source(self):
    stream_width, stream_height, stream_framerate = (800, 800, 10)
    capture_client = VideoCaptureClient(socket=self.grpc_socket,
                                        stream_width=stream_width,
                                        stream_height=stream_height,
                                        stream_framerate=stream_framerate)
    while True:
        frame = capture_client.get_frame()
        if frame is None:
            print("Error: Failed to capture frame")
            time.sleep(1)
            continue
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame_resized = cv2.resize(frame, (320, 320), interpolation=cv2.INTER_AREA)
        succeed, bounding_boxes, obj_classes, confidence = self.detect(frame_resized)

The Axis camera uses TensorFlow Serving to serve the model, and I'm guessing the issue might come from there. I have tried to mimic the pre-processing steps from detector.py, but the issue still persists. My local inference gives higher accuracy. Please help us.
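
To isolate whether the gap comes from preprocessing or from the serving path itself, one approach is to feed the identical uint8 array to both backends and diff the raw outputs. A sketch, assuming both clients are initialized as in the snippets above (with the local interpreter renamed to interpreter to avoid the name clash) and that fixed.npy is a saved 320x320x3 uint8 RGB test input:

# Run one identical input through both paths and compare the raw outputs.
fixed = np.load('fixed.npy')            # saved test input: (320, 320, 3) uint8
batch = np.expand_dims(fixed, axis=0)

# Local TFLite path
interpreter.set_tensor(input_details[0]['index'], batch)
interpreter.invoke()
local = {d['name']: interpreter.get_tensor(d['index']) for d in output_details}

# Camera (TF Serving) path
success, remote = self.inference_client.infer({'data': batch}, self.model_path)

# Element-wise comparison of the three detection outputs
for key in ['StatefulPartitionedCall:3', 'StatefulPartitionedCall:2',
            'StatefulPartitionedCall:1']:
    diff = np.abs(np.squeeze(local[key]).astype(np.float32)
                  - np.squeeze(remote[key]).astype(np.float32)).max()
    print(key, 'max abs diff:', diff)

If the outputs match on a fixed input, the difference lies in how the frames are produced, not in the model serving.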


Corallo commented 3 months ago

Hi @akash4562800

From your code snippet I see you are converting BGR to RGB; unless I misunderstood your intent, that doesn't seem correct. The VideoCaptureClient will return an RGB image. Maybe that is the reason for the performance difference? If you want to dig deeper, take a look at this discussion: https://github.com/AxisCommunications/axis-model-zoo/discussions/50 It has some guidelines on how to debug another model, but you might find it useful.

In particular, you can try the model on a fixed image, both in TensorFlow and on the camera. You can save the image as a binary file and use larod-client to test the model in isolation. Here is a guide that shows how to do it: https://developer.axis.com/computer-vision/computer-vision-on-device/test-your-model
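
For that fixed-image test, larod-client consumes the raw tensor bytes. A minimal sketch for producing such a file (test.jpg is a placeholder path; the exact larod-client invocation is covered in the linked guide):

import cv2
import numpy as np

# Preprocess one frame exactly as the model expects (320x320, RGB, uint8)
# and dump the raw bytes so the model can be tested in isolation.
frame = cv2.imread('test.jpg')                 # OpenCV loads images as BGR
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
frame = cv2.resize(frame, (320, 320), interpolation=cv2.INTER_AREA)
frame.astype(np.uint8).tofile('input.bin')     # raw HWC bytes, no header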

akash4562800 commented 3 months ago

Hi @Corallo, thanks for your prompt response. I checked the VideoCaptureClient code and found it does indeed return an RGB image. On inspection the frame looked like BGR, which is why I used cv2.cvtColor(). Although it doesn't quite make sense to me logically, that was the issue, and I'm now getting the same results locally and on the Axis camera. Thanks a lot.
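
For reference, the fix amounts to dropping the extra color conversion in the capture loop, since VideoCaptureClient already delivers RGB frames (a sketch of the corrected portion of run_camera_source posted above):

# No cv2.cvtColor needed: the frame from VideoCaptureClient is already RGB.
frame = capture_client.get_frame()
frame_resized = cv2.resize(frame, (320, 320), interpolation=cv2.INTER_AREA)
succeed, bounding_boxes, obj_classes, confidence = self.detect(frame_resized)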

I have another question: which tracker that can run on this camera would be best for tracking people? I am using a centroid-based Kalman filter tracker, but I see tracks failing many times. You can mark this complete and move it to the discussions section.