IntelRealSense / librealsense

Intel® RealSense™ SDK
https://www.intelrealsense.com/
Apache License 2.0

On Android USB 3.2, the example `capture` constantly reports "bulk_transfer returned error, error: Out of memory" #4091

Closed Mad-Thanos closed 4 years ago

Mad-Thanos commented 5 years ago
| Required Info | |
|---|---|
| Camera Model | D415 |
| Firmware Version | 5.11.04 |
| Operating System & Version | MacOS 10.13.6 + Android Studio 3.4.1 |
| Kernel Version (Linux Only) | N/A |
| Platform | Android 8.1 (Linux kernel 4.5) + USB 3.1 Gen2 C-to-C cable |
| SDK Version | 2.22.0 |
| Language | Java & C++ |
| Segment | Smartphone |

Issue Description

short version:

I ran the wrappers/android/examples/capture app on an Android phone (which has a USB 3.1 compatible Type-C female port) with a D415 plugged in. The UI showed only a black screen: the text "Connect to a Realsense Camera" was gone, but no video data was rendered. At the same time, Logcat in Android Studio was constantly reporting a flood of warning messages like this one:

2019-05-29 00:01:49.276 27981-28475/com.intel.realsense.capture W/librs: bulk_transfer returned error, endpoint: �, error: Out of memory

The full log looked like this:

```
2019-05-29 00:01:45.036 27981-28098/com.intel.realsense.capture D/librs DeviceWatcher: Device: /dev/bus/usb/003/002 added successfully
2019-05-29 00:01:49.237 27981-27981/com.intel.realsense.capture I/librs: Found UVC Device vid: id- vid- 8086 pid- ad3 mi- 0 unique_id- /dev/bus/usb/003/002 path- /dev/bus/usb/003/002 susb specification- 320
2019-05-29 00:01:49.237 27981-27981/com.intel.realsense.capture I/librs: Found UVC Device vid: id- vid- 8086 pid- ad3 mi- 3 unique_id- /dev/bus/usb/003/002 path- /dev/bus/usb/003/002 susb specification- 320
2019-05-29 00:01:49.239 27981-27981/com.intel.realsense.capture I/librs: Found UVC Device vid: id- vid- 8086 pid- ad3 mi- 0 unique_id- /dev/bus/usb/003/002 path- /dev/bus/usb/003/002 susb specification- 320
2019-05-29 00:01:49.239 27981-27981/com.intel.realsense.capture I/librs: Found UVC Device vid: id- vid- 8086 pid- ad3 mi- 3 unique_id- /dev/bus/usb/003/002 path- /dev/bus/usb/003/002 susb specification- 320
2019-05-29 00:01:49.241 27981-27981/com.intel.realsense.capture D/librs capture example: try start streaming
2019-05-29 00:01:49.241 27981-27981/com.intel.realsense.capture I/librs: Found UVC Device vid: id- vid- 8086 pid- ad3 mi- 0 unique_id- /dev/bus/usb/003/002 path- /dev/bus/usb/003/002 susb specification- 320
2019-05-29 00:01:49.241 27981-27981/com.intel.realsense.capture I/librs: Found UVC Device vid: id- vid- 8086 pid- ad3 mi- 3 unique_id- /dev/bus/usb/003/002 path- /dev/bus/usb/003/002 susb specification- 320
2019-05-29 00:01:49.276 27981-28475/com.intel.realsense.capture W/librs: bulk_transfer returned error, endpoint: �, error: Out of memory
...... many many repeated warning message as the above line.....
```

My Question is: Has anybody ever seen this problem before? What should I do to fix this?

Longer version:

My setup:

"Android phone Huawei P20 + USB 3.1 Gen 2 C Male to C Male cable + Realsense D415". I've simply checked the cable: it is compliant with USB 3.1 Gen 2 (10/20 Gbps, 5 A, 100 W). I've also checked the Huawei P20's Type-C port by plugging it into my MacBook and looking at the 'USB 3.0 Bus' section of the MacBook's System Report. It showed a speed of up to 5 Gbps for the "Huawei P20 + C-to-C cable" combination. I guess that means either the Huawei P20 or my MacBook has a USB 3.1 Gen 1 (USB 3.0) interface.

What I did and what I saw:

I downloaded librealsense-2.22.0.aar and librealsense-2.22.0.zip from dl.bintray.com. After that, I imported them into the Android project under wrappers/android, making both the examples/native-example and examples/capture modules depend on them.

Then I modified the source code of examples/native-example to enumerate the device, its sensors, and the stream profiles of each sensor, and saved the result to a log file: usb3.2-P20-enumerate-log.txt. In short, I could tell that the highest profiles (the 1280x720xZ16x30Hz Depth stream and the 1920x1080xRGB8x30Hz Color stream) are available.

In the next step, I built and ran examples/capture on the Huawei P20. After I granted the USB permission to the capture app, the screen was black and the message “27981-28475/com.intel.realsense.capture W/librs: bulk_transfer returned error, endpoint: �, error: Out of memory” was flooding Logcat.

kafan1986 commented 4 years ago

> @GucciPrada As your connection type is USB 2.1, only 6fps is supported for 1280x720. If you want a higher fps such as 15fps or 30fps, you need to switch to a USB3 connection.
>
> @kafan1986 What's your Android device when you get the issue with the latest v2.30? And could you please provide the full logcat log from when the issue happened? Thanks!

My device is an Odroid N2 running Android 9 and I am using SDK v2.30. I have pasted the logs in a separate issue thread: https://github.com/IntelRealSense/librealsense/issues/5328
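As a rough back-of-the-envelope check on the USB 2 frame-rate limit quoted above, the sketch below computes the raw payload rate of a 1280x720 Z16 depth stream at several frame rates. It assumes 2 bytes per pixel for Z16 and compares against the nominal signalling rates of USB 2.0 High Speed (480 Mbit/s) and USB 3.x Gen 1 (5 Gbit/s). Usable throughput is lower than the nominal rate, and the function name `stream_mbits` is only illustrative. The 6 fps cap itself comes from the SDK's profile list, not from this arithmetic.

```python
# Nominal signalling rates in Mbit/s; usable payload throughput is lower.
USB2_HIGH_SPEED_MBITS = 480
USB3_GEN1_MBITS = 5000


def stream_mbits(width, height, bytes_per_pixel, fps):
    """Raw payload rate of one uncompressed stream, in Mbit/s."""
    return width * height * bytes_per_pixel * fps * 8 / 1_000_000


if __name__ == "__main__":
    for fps in (6, 15, 30):
        rate = stream_mbits(1280, 720, 2, fps)  # Z16 depth: 2 bytes per pixel
        print(f"1280x720 Z16 @ {fps:2d} fps: {rate:6.1f} Mbit/s, "
              f"{rate / USB2_HIGH_SPEED_MBITS:.0%} of USB 2.0 nominal, "
              f"{rate / USB3_GEN1_MBITS:.1%} of USB 3 Gen 1 nominal")
```

At 30 fps the depth stream alone is already close to the nominal USB 2.0 rate, before any color stream or protocol overhead is counted, which is consistent with higher frame rates needing a USB3 link.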

kafan1986 commented 4 years ago

@matkatz @RealSenseCustomerSupport @dorodnic @RealSense-Customer-Engineering I have tested and tried many things on the Android platform and none of them helped with the Out of memory error. I am using an Odroid N2 SBC running Android 9 over USB 3.0. At 15 or even 6 fps I get the OOM error after a few minutes and the device stalls. Depth is set to 1280x720 and colour to 1920x1080.

I have written custom Android code to recover the device using the hardware_reset API provided by the librealsense SDK, but it just keeps on trying to recover the device. I have also tried it with and without Odroid's solution for resetting the USB hub, without much success: even after a reboot, most of the time at least one of the streams is stuck. The code for this class is attached below.

Streamer.java.txt

Following https://github.com/microsoft/Azure-Kinect-Sensor-SDK/issues/485 , https://stackoverflow.com/questions/23005708/kernel-crashes-due-to-oom-error-usb-submit-urb and https://discuss.aerospike.com/t/how-to-tune-the-linux-kernel-for-memory-performance/4195 I increased the usbfs_memory_mb value and changed min_free_kbytes:

/sys/module/usbcore/parameters/usbfs_memory_mb 128    # Increased from the default 16 MB

/proc/sys/vm/min_free_kbytes 40000    # Changed to about 1% of my device's 4 GB RAM
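To put the two values above in perspective, here is a small arithmetic sketch. It assumes that `usbfs_memory_mb` caps the total buffer memory usbfs holds for in-flight transfers, and it measures that budget in whole framesets (one 1280x720 Z16 depth frame plus one 1920x1080 color frame at 2 bytes per pixel, i.e. YUYV). How the RSUSB backend actually sizes and queues its transfers is not stated in this thread, so the frameset unit and the helper names are only illustrative.

```python
MIB = 1024 * 1024


def frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def framesets_in_budget(usbfs_memory_mb, frameset_bytes):
    """How many whole framesets fit in the usbfs buffer budget."""
    return (usbfs_memory_mb * MIB) // frameset_bytes


depth = frame_bytes(1280, 720, 2)    # Z16: 2 bytes per pixel
color = frame_bytes(1920, 1080, 2)   # assumes YUYV: 2 bytes per pixel
frameset = depth + color

# 1% of 4 GiB expressed in kB, for comparison with min_free_kbytes = 40000
one_percent_of_4gib_kb = 4 * 1024 * 1024 // 100

if __name__ == "__main__":
    print("bytes per frameset:", frameset)
    print("framesets in 16 MB budget:", framesets_in_budget(16, frameset))
    print("framesets in 128 MB budget:", framesets_in_budget(128, frameset))
    print("1% of 4 GiB in kB:", one_percent_of_4gib_kb)
```

Under these assumptions the default 16 MB budget holds only a couple of framesets' worth of buffers, so a handful of transfers that are never released would be enough to exhaust it.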

Following https://www.bo-yang.net/2015/03/30/debug-kernel-space-memory-leak I looked for any memory leaks. The plots and logs are attached below.

mem_plot_2 mem_plot_1 monitor_output_localhost.txt

Part of the dmesg output from Android during the error is attached below. android_dmesg.txt

The logcat messages generated by librs are attached below. logcat.txt

I have asked this forum multiple times but haven't heard anything promising. Given the issue was first reported almost 6-7 months back and has not been resolved yet, I hope someone soon takes a look at this issue and gets this resolved.

matkatz commented 4 years ago

@alowenst01 please take a look.

kafan1986 commented 4 years ago

@alowenst01 @matkatz Was isochronous data transfer ever tried for Android? I found a repo where someone implemented the same for Android. https://github.com/Peter-St/Android-UVC-Camera

kafan1986 commented 4 years ago

UPDATE: I am running streaming depth and colour frame on android for last 36+ hours at 720p 15fps without any crash. I do get messages like the one below every few minutes, but so far there has been no additional critical error like OOM.

W/librs: control_transfer returned error, index: 768, error: Broken pipe, number: 32

Currently my code just fetches the depth and colour frames and applies some post-processing filters, nothing else. The top command shows the CPU usage at around 40-45%. Earlier I was also drawing each colour frame to the UI, and the app would usually crash within minutes.

This suggests that the data transfer is the culprit: the memory leak is not in the SDK itself but in the way RSUSB is implemented. When the CPU is not stressed, RSUSB works fine. When it is stressed, USB requests fail, and I believe the method for cancelling USB requests, which should cancel the filled URB queues held by the Linux kernel, does not properly release the queue, and this causes the Out of memory error.

As expected in such a case, a hardware reset or even a custom USB controller reset is not able to recover the device, and the only option is to reboot the entire system.
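The hypothesis above can be illustrated with a toy model. This is not the kernel's or librealsense's actual accounting. `UrbBudget` and `transfers_until_oom` are hypothetical names, and the sizes and failure rate are made up. The sketch only shows that if every failed-and-cancelled transfer keeps its buffer charged against a fixed budget, a steady trickle of failures eventually produces an out-of-memory error, while releasing on cancel does not.

```python
class UrbBudget:
    """Toy stand-in for a fixed buffer budget (hypothetical, not a real API)."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0

    def submit(self, size):
        # Charge the buffer against the budget, or fail like ENOMEM would.
        if self.used + size > self.limit:
            raise MemoryError("Out of memory")
        self.used += size

    def release(self, size):
        self.used -= size


def transfers_until_oom(limit, size, fail_every, release_on_cancel,
                        max_transfers=100_000):
    """Return how many transfers were submitted before OOM, or None if none occurs."""
    budget = UrbBudget(limit)
    for n in range(1, max_transfers + 1):
        try:
            budget.submit(size)
        except MemoryError:
            return n - 1
        failed = (n % fail_every == 0)   # every fail_every-th transfer fails
        if not failed or release_on_cancel:
            budget.release(size)         # normal completion, or a clean cancel
        # otherwise the buffer stays charged: this is the modelled leak
    return None


if __name__ == "__main__":
    mib = 1024 * 1024
    print("leaky cancel:", transfers_until_oom(16 * mib, mib, 100, False))
    print("clean cancel:", transfers_until_oom(16 * mib, mib, 100, True))
```

In this model a lower failure rate only delays the error, which would match the observation that a less stressed system runs for hours instead of minutes before hitting it.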

RealSenseCustomerSupport commented 4 years ago

@Kafan1986 Glad to see 720p 15fps works for you. We did see some stability issues with HD or FHD resolution due to Android platform limitations. So please use the working configuration as a workaround at this point. Thanks!

kafan1986 commented 4 years ago

> @kafan1986 Glad to see 720p 15fps works for you. We did see some stability issues with HD or FHD resolution due to Android platform limitations. So please use the working configuration as a workaround at this point. Thanks!

Actually, I posted a little too early. Although the camera was working, the colour frame was stuck, which I did not know at that point. Currently at 720p the device works for 4-7 hours before it eventually hits the stuck colour frame or the out of memory error. This happens within 1-1.5 hours when set up at 1080p.

RealSenseCustomerSupport commented 4 years ago

@kafan1986 So what you said previously, "I am running streaming depth and colour frame on android for last 36+ hours at 720p 15fps without any crash", is not always what you get, right? What format did you set for the color frame? Could you please try the YUYV format to see if there is any improvement?

kafan1986 commented 4 years ago

> @kafan1986 So what you said previously, "I am running streaming depth and colour frame on android for last 36+ hours at 720p 15fps without any crash", is not always what you get, right? What format did you set for the color frame? Could you please try the YUYV format to see if there is any improvement?

@RealSenseCustomerSupport What I mean is that when I made that comment, my test setup was incorrect: I was working with the depth data only. When one streams the color frame too, the system will crash quickly on Android. This happens in one of two ways. Either it fails silently, i.e. the color stream gets stuck and one needs to check visually, or programmatically by comparing each frame id with the previous frame id; or it throws an out of memory error, which leads to a frame timeout.

With a 720p colour stream this eventually happens every 5-7 hours, and at 1080p it happens within 1-2 hours. In all of the above cases, the only confirmed way to recover is to reboot the entire system. "Resetting" the device through the official API, or even resetting the USB controller on the system, does not guarantee recovery.
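The programmatic check mentioned above (comparing each frame id with the previous one) could look roughly like the following sketch. `StallDetector` and its `max_repeats` threshold are hypothetical. The frame number would come from the SDK (for example `rs2::frame::get_frame_number()` in the C++ API), and a suitable threshold depends on how often the stream is polled.

```python
class StallDetector:
    """Flags a stream as stalled when its frame number stops advancing."""

    def __init__(self, max_repeats=5):
        self.max_repeats = max_repeats  # identical readings tolerated before flagging
        self.last_number = None
        self.repeats = 0

    def update(self, frame_number):
        """Feed the latest frame number; return True once the stream looks stuck."""
        if frame_number == self.last_number:
            self.repeats += 1
        else:
            self.last_number = frame_number
            self.repeats = 0
        return self.repeats >= self.max_repeats


if __name__ == "__main__":
    detector = StallDetector(max_repeats=3)
    # Simulated color-frame numbers: the stream freezes at 3, then resumes.
    for number in [1, 2, 3, 3, 3, 3, 4]:
        print(number, detector.update(number))
```

A detector like this only reports the silent failure. As noted in the comment, recovering from it is a separate problem.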

Also, in the meantime, can you share the team's internal status on issue (DSO-13539) - [Android] Camera disconnected after streaming some duration with Android Camera Sample?

rafaelspring commented 4 years ago

@matkatz @RealSenseCustomerSupport @RealSense-Customer-Engineering @dorodnic

Sorry for the long silence. I changed my GitHub name (used to be @xtrawurst).

We have tried the latest 2.32.1 librealsense and we still get frequent loss of connection on the Samsung Galaxy Tab S4 running Android 9.

Interestingly we sometimes get a more stable connection using the "RS Camera" Android app instead of our capturing app, even if we use the same configuration (resolutions and frame rate).

Would it be possible to open-source the RS Camera app code so the community can see how it performs configuration etc?

Update: This seems to be a cable issue! Using a proper Thunderbolt 3 cable instead of anything labeled "USB" (even USB 3.2 compliant cables) makes all the difference. I couldn't reproduce any of the above problems when using a Thunderbolt 3 cable. @kafan1986 Please let us know if you are seeing the same.

Update 2: Actually it seems to be more subtle than that. While I haven't found a Thunderbolt cable that doesn't work, I have also found a USB 3.1 Gen2 compliant C-to-C cable that does work: https://www.amazon.com/AmazonBasics-Double-Braided-Nylon-Type-C/dp/B07D7RZ1VS

kafan1986 commented 4 years ago

> @matkatz @RealSenseCustomerSupport @RealSense-Customer-Engineering @dorodnic
>
> Sorry for the long silence. I changed my GitHub name (used to be @xtrawurst).
>
> We have tried the latest 2.32.1 librealsense and we still get frequent loss of connection on the Samsung Galaxy Tab S4 running Android 9.
>
> Interestingly we sometimes get a more stable connection using the "RS Camera" Android app instead of our capturing app, even if we use the same configuration (resolutions and frame rate).
>
> Would it be possible to open-source the RS Camera app code so the community can see how it performs configuration etc?
>
> Update: This seems to be a cable issue! Using a proper Thunderbolt 3 cable instead of anything labeled "USB" (even USB 3.2 compliant cables) makes all the difference. I couldn't reproduce any of the above problems when using a Thunderbolt 3 cable. @kafan1986 Please let us know if you are seeing the same.
>
> Update 2: Actually it seems to be more subtle than that. While I haven't found a Thunderbolt cable that doesn't work, I have also found a USB 3.1 Gen2 compliant C-to-C cable that does work: https://www.amazon.com/AmazonBasics-Double-Braided-Nylon-Type-C/dp/B07D7RZ1VS

The RS Camera app code is already available on GitHub. The camera app that builds with the SDK is the official RS Camera app code.

I have tested two approaches to creating a pipeline of depth and RGB data. A) Start a high-priority thread and loop to get the frameset. B) Use a Looper/Handler approach for the frameset callback.

There should not be any difference between approaches A and B, but after months of testing I can say that approach A is OK only if you are not doing much else on the Android platform. Once the CPU cores are stressed even a little, maybe due to some parallel processing, it crashes with all sorts of USB data transfer issues.

The second approach (B) is the one used by the RS Camera app. With it I can run my system somewhat stably (720p colour + depth at 15 fps) for 7-8 hours before it crashes and I need to reboot the system.

I cannot use a Thunderbolt cable because I am using the depth camera with a single board computer (SBC) running Android, and it only has USB Type-A ports. Also, a Thunderbolt cable would probably increase my setup cost given my future use case. I have used the cable below (10 Gbps) and another USB 3.0 (5 Gbps) cable; both provided the same stability and neither resolved the issue entirely. https://www.amazon.in/gp/product/B016RNC8AS/ref=ppx_yo_dt_b_asin_title_o03_s00?ie=UTF8&psc=1

RealSenseCustomerSupport commented 4 years ago

@GucciPrada Any questions about this issue?

@kafan1986 Did you try the color stream in YUYV format to see if there is any improvement? Thanks!

Mad-Thanos commented 4 years ago

Given the problem and the possible solutions mentioned in this issue, I finally decided to switch to Linux and give up on Android.

kafan1986 commented 4 years ago

> @GucciPrada Any questions about this issue?
>
> @kafan1986 Did you try the color stream in YUYV format to see if there is any improvement? Thanks!

I am using YUYV and the error is still there. At the lower frame rate, i.e. 15 FPS with 720p color + 720p depth, the error occurs every 6-7 hours and then I need to restart the entire system.

RealSenseCustomerSupport commented 4 years ago

@kafan1986 Did you try lowering the resolution, for example to 640x480, to see if there is any improvement? Thanks!

kafan1986 commented 4 years ago

> @kafan1986 Did you try lowering the resolution, for example to 640x480, to see if there is any improvement? Thanks!

I am already using the lower resolution of 720p rather than 1080p. Any lower resolution will impact my other deep learning models.

RealSenseCustomerSupport commented 4 years ago

@GucciPrada Sorry for the inconvenience on the Android platform. If a lower resolution on Android is not acceptable for you, then switching to Linux is a good choice.

@kafan1986 Sorry for the inconvenience. For the issues on the Android platform, our engineering team investigated, including an analysis of the USB trace, but we didn't find a clear clue for resolving it within librealsense. It might need investigation on the Android host side, which is out of our scope. Sorry for that.

RealSenseCustomerSupport commented 4 years ago

@GucciPrada Any other questions about this? Thanks!

Mad-Thanos commented 4 years ago

Nope, I'll close this issue for now.