KuriRobot / Kuri-Documentation

Documentation for Kuri the adorable home robot.

h264 compression #34

Closed: amalnanavati closed this issue 4 years ago

amalnanavati commented 4 years ago

I am using madmux's H264-compressed channels (/var/run/madmux/ch1.sock and /var/run/madmux/ch2.sock) and libav to decode them (broadly following this example).
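For concreteness, the receive/decode loop looks roughly like the sketch below. It assumes the socket delivers a raw Annex-B H264 byte stream (start-code-delimited NAL units) and omits most error handling:

```c
/*
 * Minimal sketch: read the H264 byte stream from a madmux socket and
 * decode it with libav. Assumes Annex-B framing; most error handling
 * omitted. Build roughly as: gcc decode.c -lavcodec -lavutil
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <libavcodec/avcodec.h>
#include <libavutil/avutil.h>

int main(void) {
    /* Connect to the madmux channel socket. */
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    strncpy(addr.sun_path, "/var/run/madmux/ch1.sock", sizeof(addr.sun_path) - 1);
    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    /* Set up the H264 parser and decoder. */
    const AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_H264);
    AVCodecParserContext *parser = av_parser_init(codec->id);
    AVCodecContext *ctx = avcodec_alloc_context3(codec);
    avcodec_open2(ctx, codec, NULL);
    AVPacket *pkt = av_packet_alloc();
    AVFrame *frame = av_frame_alloc();

    uint8_t buf[65536];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        uint8_t *data = buf;
        int remaining = (int)n;
        while (remaining > 0) {
            /* The parser splits the raw byte stream into whole packets. */
            int used = av_parser_parse2(parser, ctx, &pkt->data, &pkt->size,
                                        data, remaining,
                                        AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
            data += used;
            remaining -= used;
            if (pkt->size == 0)
                continue;
            if (avcodec_send_packet(ctx, pkt) < 0)
                continue;
            while (avcodec_receive_frame(ctx, frame) == 0) {
                /* pict_type shows whether this was an I- or P-frame. */
                printf("frame %dx%d type %c\n", frame->width, frame->height,
                       av_get_picture_type_char(frame->pict_type));
            }
        }
    }
    return 0;
}
```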

Upon some digging, I found that Channel 1 behaves as expected, with mostly P-frames and periodic I-frames. On Channel 2, however, every frame is encoded as an I-frame. Since I-frames are much larger than P-frames, this inflates the bitrate and creates unnecessary latency when sending the video stream over the network.
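For anyone who wants to reproduce this, the frame types can be read straight off the byte stream without decoding, by scanning for NAL unit headers. A short sketch, under the same Annex-B framing assumption as above:

```c
/*
 * Quick check of the frame-type pattern without a full decode: scan for
 * Annex-B start codes and classify each NAL unit.
 * nal_unit_type 5 = IDR slice (I-frame), 1 = non-IDR slice (P/B-frame).
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

void print_nal_types(const uint8_t *buf, size_t len) {
    for (size_t i = 0; i + 3 < len; i++) {
        /* Match the 3-byte start code 00 00 01 (4-byte codes end the same way). */
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            uint8_t nal_type = buf[i + 3] & 0x1f;  /* low 5 bits of NAL header */
            if (nal_type == 5)
                printf("IDR slice (I-frame)\n");
            else if (nal_type == 1)
                printf("non-IDR slice (P-frame)\n");
            i += 3;  /* skip past the start code */
        }
    }
}
```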

My ultimate goal is to drive Kuri's camera latency over the network as close to ~0.2 seconds as possible (currently we are seeing roughly 0.5 seconds). Therefore, I have a few questions:

  1. Do you know why all the frames on Channel 2 are I-frames? (Note that as mentioned in Issue #33 , my libmadmux does not contain functions to modify the resolution, so lowering the resolution on Channel 1 is not an option).
  2. Could you provide more details about how madmux-daemon encodes its video stream into the H264 format? If the code is (or can be made) open-source, can you share it?
  3. Could you provide details about what video encoding/decoding format you used for the teleoperation interface on Kuri's app?
  4. Could you provide intuition about how much latency you have seen when using Kuri's camera, and what you think the best achievable latency might be (on a fast network, and a powerful remote computer where rendering is not a bottleneck)?
po1 commented 4 years ago

Hi @amalnanavati, let me try to answer your questions, maybe out of order:

  2. Could you provide more details about how madmux-daemon encodes its video stream into the H264 format? If the code is (or can be made) open-source, can you share it?

An important point is that madmux is just a multiplexer. It does not decode/encode video. The video comes encoded in hardware from the camera module itself. The camera (ab)uses the V4L2 interface to expose more than one channel. Madmux is a wrapper around a proprietary vendor library that exposes these video channels programmatically. Additionally, madmux makes these channels available through a socket interface (hence the server-client split), which allows multiple clients to access them simultaneously.

Although the H264 encoding is proprietary and happens in the camera itself, its configuration can be altered by the running system through a camera datapath file located at /opt/geocam/config.json. I suggest you give that a look :) Unfortunately, I do not have any documentation for that file beyond what is obvious from its internal structure.

  3. Could you provide details about what video encoding/decoding format you used for the teleoperation interface on Kuri's app?

The teleop part was done using channel 2: H264 at 720p.

  4. Could you provide intuition about how much latency you have seen when using Kuri's camera, and what you think the best achievable latency might be (on a fast network, and a powerful remote computer where rendering is not a bottleneck)?

My gut tells me that video latency was never that good. Definitely not lower than 0.2 seconds. That said, this was over a WebRTC pipeline that had many elements. The camera itself should be able to give you a low-latency feed. Madmux does not introduce any significant latency. Below is a quick datapath diagram:


                                            +------------------+           +-------------+
+--------+           +--------------+       |+---------------+ |  -------> |madmux client|
| Camera |   ---->   | Linux kernel | ----->||Proprietary API| |   unix    +-------------+
+--------+    USB    +--------------+  v4l2 |+---------------+ |   socket                 
                                            |                  |           +-------------+
                                            |  madmux daemon   |  -------> |madmux client|
                                            +------------------+   unix    +-------------+
                                                                   socket            

Every step of this diagram is just shoving data around from one buffer to another. All encoding is done on the camera itself. If there is latency at the end of this pipeline, I would expect it to come from the camera itself.

  1. Do you know why all the frames on Channel 2 are I-frames? (Note that as mentioned in Issue #33 , my libmadmux does not contain functions to modify the resolution, so lowering the resolution on Channel 1 is not an option).

I do not know why all frames would be I-frames. I also seem to remember that the camera may not actually support changing resolution; mdx_set_resolution may have been a wishful addition that was never really used.

amalnanavati commented 4 years ago

Thanks so much, this is super useful! I'll get back to you in a few days after I've had time to fully go through this and further investigate our latency issues, but this answer provides a lot of indispensable insight into the camera and madmux :)

amalnanavati commented 4 years ago

I've been playing around with this for a few days, and have a few insights that I'll share here for the broader community:

1) mdx_set_resolution does work (after deleting the extra madmux libraries, as documented in #33). A hypothetical usage sketch follows this list.

2) The camera itself has a latency of several hundred milliseconds. Even when measuring on-board, end-to-end (over an HDMI cable connected to the Kuri), we got 0.17-0.3 s of latency (using ch3, MJPEG). Assuming a low rendering time, most or all of this latency comes from the camera itself.

3) H264 compression adds a latency of ~0.18 s. This was calculated by comparing the on-board end-to-end latency on ch1/ch2 to ch3, accounting for the additional time it took to decode the H264 frames.
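Regarding point 1: mdx_set_resolution, mdx_open, and mdx_register_cb are real libmadmux names, but the exact signatures, callback shape, and header path below are assumptions, so check madmux.h on your image for the actual API:

```c
/*
 * Hypothetical sketch of dropping channel 1's resolution via libmadmux.
 * The function names appear in this thread; the signatures and header
 * path are assumptions -- verify against madmux.h before using.
 */
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <madmux/madmux.h>  /* assumed header location */

/* Assumed callback shape: madmux hands us one encoded frame per call. */
static void on_frame(uint8_t *buffer, uint32_t size, void *user_data) {
    (void)buffer;
    (void)user_data;
    printf("got frame: %u bytes\n", size);
}

int main(void) {
    struct mdx_stream *stream = mdx_open("/var/run/madmux/ch1.sock");
    if (!stream)
        return 1;
    /* Request a lower resolution before consuming the stream. */
    mdx_set_resolution(stream, 640, 480);
    mdx_register_cb(stream, on_frame, NULL);
    sleep(10);  /* frames arrive on the callback while we wait */
    mdx_close(stream);
    return 0;
}
```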

Therefore, if someone's goal is strictly to lower latency, the H264 sockets are only useful if they save more than 0.18 s of network latency. However, with heavy JPEG compression we are getting less than 0.18 seconds of network latency, so we will be using ch3.

I'll post any further insights/data about the camera compression here in case it is useful to the community :)