"Payload data has been discarded" when using tensorflow lite in a separate thread.

robertatdm commented 9 months ago

Describe what you want to implement and what the issue & the steps to reproduce it are:

This is hard to reproduce because I used a custom embedded board with an i.MX8M+ and a custom Linux (v6.1.24). A little advice for further debugging would be helpful.

I am trying to run a python application with two threads. The camera thread continuously grabs new images and prints out the mean value of the image pixels (just for testing).

class CameraThread(threading.Thread):
    def __init__(self):
        super().__init__()
        self.camera = None
        self.frame = None
        self.w_aq = 2160    # Width of the image acquired from the camera.
        self.h_aq = 1620
        self.running = False

    def connect_camera(self):
        self.camera = pylon.InstantCamera(pylon.TlFactory.GetInstance().CreateFirstDevice())
        if not self.camera:
            print("Could not connect to camera")

    def set_camera_parameters(self):
        self.camera.Open()
        ...
        self.camera.AcquisitionFrameRateEnable.SetValue(True)
        self.camera.AcquisitionFrameRate.SetValue(20.0)
        self.camera.DeviceLinkThroughputLimitMode.SetValue("On")
        self.camera.DeviceLinkThroughputLimit.SetValue(200000000)
        ...

    def run(self):
        self.running = True
        self.connect_camera()
        self.set_camera_parameters()

        self.frame = np.full((self.camera.Height.Value, self.camera.Width.Value, 3), 123, np.uint8)

        self.camera.StartGrabbing(pylon.GrabStrategy_LatestImageOnly)
        while self.running:
            try:
                res = self.camera.RetrieveResult(5000, pylon.TimeoutHandling_Return)
                if res.GrabSucceeded():
                    im = res.GetArray().copy()
                    print(int(im.mean()), end=' ')
                else:
                    print("WARNING: Grab didn't succeed")
                    print(res.GetErrorDescription())
                res.Release()
            except Exception as e:
                print(e)
                self.running = False

        if self.camera:
            self.camera.StopGrabbing()
            self.camera.Close()

    def stop(self):
        self.running = False

Then there is another thread which repeatedly runs a tensorflow lite model:

class InferenceThread(threading.Thread):
    def __init__(self):
        super().__init__()      

        self.running = False

        self.interpreter = tfl.Interpreter(
            "retinanet/model_int8.tflite",  
            experimental_delegates=[tfl.load_delegate("/usr/lib/libvx_delegate.so")]  
        )

        self.input_details = self.interpreter.get_input_details()   
        self.output_details = self.interpreter.get_output_details()
        self.interpreter.allocate_tensors()
        self.images_shape = self.input_details[0]['shape']

        # First time invoke (Takes much time)
        dummy_data = np.random.uniform(0.0, 255.0, self.images_shape).astype('uint8')
        self.interpreter.set_tensor(self.input_details[0]['index'], dummy_data.copy())
        self.interpreter.invoke()

    def process_image(self):
        dummy_data = np.random.uniform(0.0, 255.0, self.images_shape).astype('uint8')
        self.interpreter.set_tensor(self.input_details[0]['index'], dummy_data.copy())
        self.interpreter.invoke()

    def run(self):
        self.running = True
        while self.running:
            time.sleep(0.8)
            self.process_image()

    def stop(self):
        self.running = False

Here is the rest of the code for completeness:

import signal, threading, time
import numpy as np
from pypylon import pylon
import tflite_runtime.interpreter as tfl

if __name__ == "__main__":
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    camera_thread = CameraThread()
    inference_thread = InferenceThread()

    camera_thread.start()
    inference_thread.start()

    camera_thread.join()
    inference_thread.join()

The grabbing works for a few images, but occasionally I get an unsuccessful GrabResult with this ErrorDescription: Payload data has been discarded. Payload data can be discarded by the camera device if the available bandwidth is insufficient. I can run other computationally expensive threads without problems; the issue arises only when I use TensorFlow Lite with libvx_delegate, which uses the built-in NPU of the i.MX8M+. I have no idea how those two unrelated libraries can interfere with each other.

The test works fine on other systems (NXP i.MX8M+ evaluation board / Ubuntu Host-PC).

It would be great to know more about this error and what can potentially cause it. Thanks in advance.

Is your camera operational in Basler pylon viewer on your platform

Yes

Hardware setup & camera model(s) used

Camera: Basler dart daA3840-45uc

Connected via a single USB-micro-B cable

System: Unfortunately, this is all very customized, so there are many possible causes of error.

A custom Single Board Computer with an i.MX8MPlus and USB-3.0 interface for the camera.
A custom Board Support Package merged with the meta-freescale and meta-imx yocto layers.
- Linux kernel (v6.1.24)

Runtime information:

python: 3.10.9 (main, Dec  6 2022, 18:44:57) [GCC 11.3.0]
platform: linux/aarch64/6.1.24
pypylon: 1.9.0+pylon6.2.0 / 6.2.0.18677

robertatdm commented 9 months ago

I reduced the frame rate to 10 fps, and the rate of unsuccessful GrabResults increased. Now there are also unsuccessful results with an empty error description and others with the following description: "The current block ID must be larger than the previous block ID."

Edit: I also found this ErrorDescription in my output: "Read operation failed."

thiesmoeller commented 9 months ago

Retest with an increased transfer size. This will reduce the kernel overhead of the receive process. Set it before starting the stream:

cam.StreamGrabber.MaxTransferSize = 4 * 1024 * 1024

Depending on your kernel setup, you might have to increase the usbfs memory.

For a runtime change you could use:

sh -c 'echo 1000 > /sys/module/usbcore/parameters/usbfs_memory_mb'
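As a rough sanity check of the two settings above, the following sketch reads the current usbfs limit and compares it with an estimate of the memory needed for the queued transfers. The helper names and the sizing rule (one full payload per queued buffer) are illustrative assumptions, not pylon's actual allocation scheme.

```python
import math
from pathlib import Path

# Kernel parameter changed by the echo command above.
USBFS_PARAM = Path("/sys/module/usbcore/parameters/usbfs_memory_mb")


def read_usbfs_memory_mb(path=USBFS_PARAM):
    """Return the current usbfs memory limit in MB, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def estimate_required_mb(payload_bytes, num_buffers):
    """Assumed sizing rule: every queued buffer holds one full payload."""
    return math.ceil(payload_bytes * num_buffers / (1024 * 1024))


def limit_is_sufficient(limit_mb, required_mb):
    """A limit of 0 means 'no limit' for usbfs_memory_mb."""
    return limit_mb == 0 or limit_mb >= required_mb


if __name__ == "__main__":
    # Hypothetical numbers: an 8-bit 2160x1620 frame and 10 queued buffers.
    required = estimate_required_mb(2160 * 1620, 10)
    limit = read_usbfs_memory_mb()
    if limit is None:
        print(f"Could not read {USBFS_PARAM}; estimated need: {required} MB")
    else:
        print(f"limit={limit} MB, estimated need={required} MB, "
              f"sufficient={limit_is_sufficient(limit, required)}")
```

The check only tells you whether the limit is plausibly large enough; it does not show whether the limit is what causes the failed grabs.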

robertatdm commented 9 months ago

Thank you for your quick answer. Unfortunately, the error persists. Every few images, I still get unsuccessful GrabResults.

WARNING: Grab didn't succeed
Payload data has been discarded. Payload data can be discarded by the camera device if the available bandwidth is insufficient.
WARNING: Grab didn't succeed
The current block ID must be larger than the previous block ID.
...
...
...
WARNING: Grab didn't succeed

WARNING: Grab didn't succeed
Read operation failed.
WARNING: Grab didn't succeed
Payload data has been discarded. Payload data can be discarded by the camera device if the available bandwidth is insufficient.

I also tried other grab strategies (Background Loop), but without success. If you have another idea, please let me know. Otherwise, I think this issue is related to our Board Support Package rather than to pylon.

P.S. I can run CPU-intensive threads and it works fine, and I can run TensorFlow Lite without NPU support and it works fine. It fails only with libvx_delegate.so.

thiesmoeller commented 9 months ago

Comment from our side: As both the USB host controller and the NPU/GPU used by libvx_delegate need sufficient memory bandwidth to operate, you could do a quick check with perf stat on the DDR read/write performance counters, once when running the NPU alone and once when running the USB camera acquisition alone. The sum should be a lot smaller than the maximum DDR performance.
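The comparison suggested above is simple arithmetic once the two measurements exist. A minimal sketch follows; the function name and all numbers are placeholders, and the bandwidth figures would have to come from your own perf stat runs and the memory specification of your board.

```python
def ddr_headroom(npu_mb_s, usb_mb_s, max_ddr_mb_s):
    """Return the fraction of peak DDR bandwidth left when both loads run together.

    npu_mb_s and usb_mb_s are the read+write bandwidths measured separately;
    max_ddr_mb_s is the peak bandwidth of the memory system.
    """
    if max_ddr_mb_s <= 0:
        raise ValueError("max_ddr_mb_s must be positive")
    combined = npu_mb_s + usb_mb_s
    return 1.0 - combined / max_ddr_mb_s


if __name__ == "__main__":
    # Placeholder values only; replace them with measured numbers.
    headroom = ddr_headroom(npu_mb_s=1000.0, usb_mb_s=200.0, max_ddr_mb_s=12000.0)
    print(f"Remaining DDR headroom: {headroom:.0%}")
```

A headroom close to zero (or negative) would point toward memory bandwidth; a large headroom suggests looking elsewhere.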

robertatdm commented 9 months ago

I don't think it is related to memory bandwidth. The test works fine on an NXP i.MX8M+ evaluation board (no unsuccessful GrabResults). Same SoC, same test, but a different module and BSP.

When using a smaller model and a slower acquisition frame rate, the bandwidth-related error disappears, but the other errors remain (error codes: 3792764924 and 3791650833 (dec)). Do you know anything else that can potentially cause those errors?
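To make the different failures easier to tell apart, it can help to tally them by error code and print the codes in hexadecimal. The class below is a hypothetical helper, not part of pypylon; it assumes the grab loop passes in the numeric code (for example from `res.GetErrorCode()`) together with the description.

```python
from collections import Counter


class GrabFailureLog:
    """Tally failed grabs by error code (hypothetical helper)."""

    def __init__(self):
        self.counts = Counter()
        self.descriptions = {}

    def record(self, error_code, description=""):
        """Store one failure and return the code formatted as 0xXXXXXXXX."""
        key = f"0x{error_code:08X}"
        self.counts[key] += 1
        # Keep the first description seen for each code.
        self.descriptions.setdefault(key, description or "(empty)")
        return key

    def summary(self):
        """Return (code, count, description) tuples, most frequent first."""
        return [(key, n, self.descriptions[key])
                for key, n in self.counts.most_common()]


if __name__ == "__main__":
    log = GrabFailureLog()
    # The two decimal codes quoted above; the descriptions are left empty
    # because the post does not say which text belongs to which code.
    for code in (3792764924, 3791650833, 3791650833):
        log.record(code)
    for key, count, text in log.summary():
        print(key, count, text)
```

In the camera thread, `log.record(...)` would go into the `else` branch that currently prints the warning.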

robertatdm commented 9 months ago

The issue seems to be gone for now. Apparently, after the first time calling self.interpreter.invoke(), we should retrieve the output tensors using interpreter.get_tensor(). Otherwise, something isn't cleaned up or released properly under the hood. I didn't trace the issue any further. @thiesmoeller Thank you for your quick responses.
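A minimal sketch of the workaround described above: after every `invoke()`, read back all output tensors with `get_tensor()`. The function name is hypothetical; `interpreter`, `input_details` and `output_details` are the objects already created in `InferenceThread.__init__`.

```python
def invoke_and_fetch(interpreter, input_details, output_details, data):
    """Run one inference and read back every output tensor.

    Reading the outputs is the step that was missing in process_image();
    whether it is required by the delegate is an assumption based on the
    observation above, not on documentation.
    """
    interpreter.set_tensor(input_details[0]["index"], data)
    interpreter.invoke()
    # get_tensor() returns a copy of the tensor data for the given index.
    return [interpreter.get_tensor(detail["index"]) for detail in output_details]
```

In `process_image()`, the `set_tensor()` and `invoke()` lines would be replaced by `invoke_and_fetch(self.interpreter, self.input_details, self.output_details, dummy_data)`.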

robertatdm commented 8 months ago

Unfortunately, I have to reopen this issue. It only disappeared temporarily and came back the following week. Since then I have tried different kernel versions, kernel providers (NXP, Freescale community BSP) and GPU driver versions, without success. Instead of pylon, I cross-compiled Aravis and got some more information. Aravis uses libusb to talk to the camera. As soon as we execute the NPU process, we first receive a LIBUSB_TRANSFER_STALL and then LIBUSB_TRANSFER_ERROR messages.

As already said, both processes are lightweight in terms of memory bandwidth, memory capacity and cpu load.

I would like to know why a stall can occur and whether it is a camera issue or a host issue.

Thanks in advance.

robertatdm commented 8 months ago

Here is the output of usbmon when the error occurred:

ffff0000d7dc9f00 2364447587 S Bi:2:004:1 -115 874800 <
ffff0000d7dc9d00 2364448205 S Bi:2:004:1 -115 1024 <
ffff0000d7dc9700 2364448241 S Bi:2:004:1 -115 1024 <
ffff0000d6ae5e00 2364485779 C Bi:2:004:1 0 52 = 5533564c 00003400 2f000000 00000000 00000100 38bbcc31 59000000 09000801
ffff0000d66d1600 2364496906 C Bi:2:004:1 0 874800 = 020b0006 0a000408 00000004 0a0f0216 00110414 02180011 051d0008 0009121e
ffff0000d0c59a00 2364496927 C Bi:2:004:1 0 32 = 55335654 00002000 2f000000 00000000 00000000 30590d00 00000000 38040000
ffff0000d63ab200 2364497616 S Bi:2:004:1 -115 874800 <
ffff0000d63ab400 2364498188 S Bi:2:004:1 -115 1024 <
ffff0000d0383b00 2364498225 S Bi:2:004:1 -115 1024 <
ffff0000d6439c00 2364535781 C Bi:2:004:1 0 52 = 5533564c 00003400 30000000 00000000 00000100 b8bac734 59000000 09000801
ffff0000d6a2b200 2364546942 C Bi:2:004:1 0 874800 = 06000b0e 00030814 00090000 0012140f 00000000 16160000 020f0008 06150513
ffff0000d6a2b800 2364546966 C Bi:2:004:1 0 32 = 55335654 00002000 30000000 00000000 00000000 30590d00 00000000 38040000
ffff0000d6ae5e00 2364547687 S Bi:2:004:1 -115 874800 <
ffff0000d6ae5200 2364548353 S Bi:2:004:1 -115 1024 <
ffff0000d7de9a00 2364548387 S Bi:2:004:1 -115 1024 <
ffff0000d6a2ba00 2364585799 C Bi:2:004:1 0 52 = 5533564c 00003400 31000000 00000000 00000100 38bac237 59000000 09000801
ffff0000d6a2b600 2364591014 C Bi:2:004:1 -32 344064 = 030c000e 0c000001 00011a15 08160c0b 0c160b04 091d0c17 011b0015 000e040f
ffff0000d6a2b300 2364591094 C Bi:2:004:1 -71 0
ffff0000d6a2bb00 2364591098 C Bi:2:004:1 -71 0
ffff0000d7de9b00 2364591824 S Bi:2:004:1 -115 874800 <
ffff0000d6a2bf00 2364592189 C Bi:2:004:1 -71 0
ffff0000d6a2be00 2364592547 C Bi:2:004:1 -71 0
ffff0000d6a2bd00 2364592551 C Bi:2:004:1 -71 0
ffff0000d7fcbe00 2364592604 S Bi:2:004:1 -115 1024 <
ffff0000d7fcb600 2364592631 S Bi:2:004:1 -115 1024 <
ffff0000da3bf200 2364592947 S Bi:2:004:1 -115 874800 <
ffff0000d6a2b700 2364593491 C Bi:2:004:1 -71 0
ffff0000d11ab900 2364593645 C Bi:2:004:1 -71 0
ffff0000d632b500 2364593663 S Bi:2:004:1 -115 1024 <
ffff0000d632b400 2364593689 S Bi:2:004:1 -115 1024 <
ffff0000da257000 2364593695 C Bi:2:004:1 -71 0
ffff0000d632b800 2364593969 S Bi:2:004:1 -115 874800 <
ffff0000da257200 2364594619 C Bi:2:004:1 -71 0
ffff0000da257100 2364594729 C Bi:2:004:1 -71 0
ffff0000da257300 2364594733 C Bi:2:004:1 -71 0

I used Aravis to set up a stream of images of size 1080x810 pixels (see the requests with payload size 874800). Everything seems to work until we suddenly receive a URB callback event with -32 (EPIPE). After that the stream is broken, as we only get -71 (EPROTO) back.
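For longer captures, the first failing completion can be located automatically. The sketch below parses the leading fields of usbmon text lines like the ones above (URB tag, timestamp in microseconds, event type, address, status, length) and names the negative errno values. It only handles the bulk-transfer layout shown in this trace; the function names are illustrative.

```python
# Linux errno values as they appear (negated) in the usbmon status field.
STATUS_NAMES = {0: "OK", -32: "EPIPE", -71: "EPROTO", -115: "EINPROGRESS"}


def parse_usbmon_line(line):
    """Split one usbmon text line (bulk transfer) into its leading fields."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"not a usbmon line: {line!r}")
    tag, timestamp, event, address, status, length = fields[:6]
    status = int(status)
    return {
        "urb": tag,
        "timestamp_us": int(timestamp),
        "event": event,          # S = submission, C = callback (completion)
        "address": address,      # e.g. Bi:2:004:1 = bulk in, bus 2, device 4, endpoint 1
        "status": status,
        "status_name": STATUS_NAMES.get(status, "UNKNOWN"),
        "length": int(length),
    }


def first_failure(lines):
    """Return the first completion event with a non-zero status, or None."""
    for line in lines:
        if not line.strip():
            continue
        record = parse_usbmon_line(line)
        if record["event"] == "C" and record["status"] != 0:
            return record
    return None


if __name__ == "__main__":
    trace = [
        "ffff0000d6ae5e00 2364547687 S Bi:2:004:1 -115 874800 <",
        "ffff0000d6a2ba00 2364585799 C Bi:2:004:1 0 52 = 5533564c 00003400",
        "ffff0000d6a2b600 2364591014 C Bi:2:004:1 -32 344064 = 030c000e 0c000001",
        "ffff0000d6a2b300 2364591094 C Bi:2:004:1 -71 0",
    ]
    print(first_failure(trace))
```

Applied to the trace above, the first failure is the -32 (EPIPE) completion that delivered only 344064 of the 874800 requested bytes (1080 x 810 = 874800, one byte per pixel).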

thiesmoeller commented 8 months ago

Are you in contact with NXP?

robertatdm commented 7 months ago

Yes, see here. The error also occurs with the pylon gst-plugin. I believe it is something kernel-related. I am not sure whether we should keep this issue open. With our custom hardware and BSP, it is hard for anyone else to reproduce the error.
