consider other options for textured depth streaming

bmegli commented 4 years ago

The current textured depth streaming works.

However I expected that bitrate requirements would be lower.

This may be due to the hacky, suboptimal encoding of infrared in chroma.

Purely subjectively, good results require around 8 Mb/s at 848x480, 30 fps, no B-frames.

Again subjectively, I would expect around 5 Mb/s to be enough, given that:

- 4 Mb/s seemed enough for reasonable quality point clouds with HEVC Main10 encoding
- 1 Mb/s seemed enough for reasonable quality infrared with H.264 encoding
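The 5 Mb expectation is simply the sum of the two single-stream figures. As a rough sanity check, the sketch below expresses the same numbers as bits per pixel. It assumes "Mb" means Mbit/s and uses decimal megabits; the helper name `bits_per_pixel` is only illustrative.

```python
# Figures quoted in this comment; "Mb" is read here as Mbit/s (an assumption).
WIDTH, HEIGHT, FPS = 848, 480, 30

def bits_per_pixel(mbps, width=WIDTH, height=HEIGHT, fps=FPS):
    """Average number of encoded bits spent per pixel per frame."""
    return mbps * 1_000_000 / (width * height * fps)

depth_mbps = 4      # HEVC Main10 depth, "seemed enough"
infrared_mbps = 1   # H.264 infrared, "seemed enough"
combined_mbps = 8   # current textured depth stream

expected_mbps = depth_mbps + infrared_mbps
print(f"expected {expected_mbps} Mb/s -> {bits_per_pixel(expected_mbps):.2f} bpp")
print(f"observed {combined_mbps} Mb/s -> {bits_per_pixel(combined_mbps):.2f} bpp")
print(f"observed / expected = {combined_mbps / expected_mbps:.1f}x")
```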

An alternative approach would be to encode depth and infrared separately.

How this would affect quality and latency is complex.

From Intel Programmers Reference Manual for KabyLake

VDBOX Media VDBOX:

The encoding process is partitioned across host software, the GPE engine, and the MFX engine. The generation of transport layer, sequence layer, picture layer, and slice header layer must be done in the host software. GP hardware is responsible for compressing from Slice Data Layer down to all macro-block and block layers. Specifically, GPE w/ VME acceleration is for motion vector estimation, motion estimation, and code decision.

HCP HEVC Coding Pipeline:

Supports Video Command Streamer (VCS):

Shared with MFX HW pipeline, and at any one time, only one pipeline (MFX or HCP) and one operation (decoding or encoding) can be active

It seems some of the operations could run concurrently, others not. With that in mind, the simplest way to check is through an experiment/benchmark.

bmegli commented 4 years ago

Related to #2

bmegli commented 4 years ago

Before proceeding, it is important to know how much time it currently takes to encode to HEVC Main10.

This will depend on the architecture, the model and possibly even the current CPU/GPU clock, but some idea of the timings is necessary.
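A minimal sketch of how such per-frame timings could be collected. `fake_encode` is a hypothetical stand-in for the real hardware encoder call, and the buffer size simply mimics a 16-bit 848x480 depth frame; neither comes from this project.

```python
import time

def benchmark(encode_frame, frames, warmup=5):
    """Time encode_frame(frame) per call; return (min_ms, max_ms, mean_ms)."""
    for frame in frames[:warmup]:
        encode_frame(frame)  # warm-up calls are not measured
    timings_ms = []
    for frame in frames[warmup:]:
        start = time.perf_counter()
        encode_frame(frame)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return min(timings_ms), max(timings_ms), sum(timings_ms) / len(timings_ms)

# Hypothetical stand-in: replace with the real encoder call.
def fake_encode(frame):
    return bytes(len(frame) // 10)

frame = bytes(848 * 480 * 2)   # buffer the size of one 16-bit depth frame
frames = [frame] * 35          # 5 warm-up frames + 30 measured frames
min_ms, max_ms, mean_ms = benchmark(fake_encode, frames)
print(f"encode time: {min_ms:.2f}-{max_ms:.2f} ms (mean {mean_ms:.2f} ms)")
```

Reporting a min-max range rather than a single average keeps the result comparable across machines whose clocks vary during the run.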

bmegli commented 4 years ago

From depth encoding time benchmark

scenario	i7-7820hk	LPA m3-7y30
848x480 depth HEVC Main10	7-9 ms	8-10 ms
640x360 depth HEVC Main10	5-6 ms	6-7 ms
848x480 ir HEVC Main	5-8 ms	6-8 ms
848x480 ir H264	3-5 ms	3-5 ms
640x360 ir H264	3-4 ms	3-4 ms

It seems that at 848x480@30 fps it should be possible to encode depth and ir separately (HEVC + HEVC or HEVC + H264).

In fact there is enough time left for some postprocessing (if needed).
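The reasoning can be checked against the 33.3 ms frame budget at 30 fps. The sketch below uses the worst-case 848x480 timings from the table and pessimistically assumes the two encodes run one after another; whether they can overlap on the hardware is not known here.

```python
FPS = 30
frame_budget_ms = 1000.0 / FPS  # time available per frame at 30 fps

# Worst-case 848x480 encode times (ms) taken from the benchmark table.
worst_case_ms = {
    "i7-7820hk":   {"depth HEVC Main10": 9,  "ir HEVC Main": 8, "ir H264": 5},
    "LPA m3-7y30": {"depth HEVC Main10": 10, "ir HEVC Main": 8, "ir H264": 5},
}

def headroom_ms(times, ir_codec):
    """Budget left if depth and ir are encoded sequentially (pessimistic)."""
    return frame_budget_ms - (times["depth HEVC Main10"] + times[ir_codec])

for cpu, times in worst_case_ms.items():
    for ir_codec in ("ir HEVC Main", "ir H264"):
        left = headroom_ms(times, ir_codec)
        print(f"{cpu}: depth + {ir_codec}: {left:.1f} ms left of {frame_budget_ms:.1f} ms")
```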

bmegli commented 4 years ago

Also interesting:

Benchmarking Open-Source Static 3D Mesh Codecs for Immersive Media Interactive Live Streaming

Point Cloud Compression Tutorial IEEE VCIP XII.2017

bmegli commented 4 years ago

There is an Intel whitepaper:

Depth image compression by colorization for Intel® RealSense™ Depth Cameras, which describes encoding depth in RGB using the hue color space, giving about 10 and a half bits of depth precision.

The method is interesting but will not work correctly with most hardware encoders. The reason is chroma subsampling: the Intel encoding requires 4:4:4, which most hardware encoders in the wild don't support.
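To make the "10 and a half bits" figure concrete: a hue ramp of six segments with 255 steps each gives 1530 distinct 8-bit RGB colors, and log2(1530) is about 10.58. The sketch below is a generic hue ramp written for illustration, not necessarily the exact mapping from the whitepaper; all function names and the depth range are assumptions.

```python
import math

LEVELS = 6 * 255  # six hue segments of 255 steps each -> 1530 distinct colors

def level_to_rgb(n):
    """Map a level in [0, LEVELS) to a fully saturated 8-bit RGB color."""
    segment, t = divmod(n, 255)
    return [
        (255, t, 0),        # red -> yellow
        (255 - t, 255, 0),  # yellow -> green
        (0, 255, t),        # green -> cyan
        (0, 255 - t, 255),  # cyan -> blue
        (t, 0, 255),        # blue -> magenta
        (255, 0, 255 - t),  # magenta -> red
    ][segment]

# Inverse as a lookup table; only exact (uncorrupted) colors decode.
RGB_TO_LEVEL = {level_to_rgb(n): n for n in range(LEVELS)}

def depth_to_rgb(depth, d_min, d_max):
    """Quantize depth in [d_min, d_max] to a hue level (range is an assumption)."""
    n = round((depth - d_min) / (d_max - d_min) * (LEVELS - 1))
    return level_to_rgb(n)

print(f"{LEVELS} levels = {math.log2(LEVELS):.2f} bits")
rgb = depth_to_rgb(1234, d_min=0, d_max=4000)
print(rgb, "-> level", RGB_TO_LEVEL[rgb])
```

Neighbouring levels differ by a single step in one color channel, so any averaging of chroma across neighbouring pixels, as 4:2:0 subsampling does, changes the recovered color and therefore the decoded depth.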

bmegli commented 4 years ago

I am satisfied with how HVS point cloud streaming works, closing for now.

bmegli commented 4 years ago

Another interesting whitepaper by Intel:

Enabling High Quality Volumetric VOD Streaming Over Broadband and 5G

Some early comments after reading:

- the whitepaper discusses ~100 Mbps bandwidth (~10x data reduction)
  - but for long-range wireless transmission even 10 Mbps is a lot (~100x data reduction)
- motion-to-photon latency
  - this is mostly about having the data on the device
  - performing rendering locally on the device (not remotely)
  - and reacting to user motion immediately (headset)
  - this should not be confused with glass-to-glass latency of 3D sensor streaming
    - glass-to-glass may and will be higher than a few ms
    - and represents the update rate of the 3D world
- the experiments use 65k vertices
  - but even D435 at 848x480 may produce up to 848*480 = ~400k vertices per frame (!)
  - at the same time the texture resolution considered is 2k
  - this may mean low quality of 3D vs texture
- the paper is only concerned with decoding time
  - but the hard part is real-time encoding
  - decoding is typically an order of magnitude faster
  - so it looks like the paper is concerned with offline processing of the data (encoding) and only later serving it to a real-time VR framework
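The rough figures in these comments can be reproduced with a few lines. The sketch treats "65k" as 65,000 and reads the implied raw rate back from "~100 Mbps at ~10x reduction"; both are assumptions, not numbers taken from the whitepaper.

```python
width, height = 848, 480
paper_vertices = 65_000           # "65k" from the comments above (assumed exact)

d435_vertices = width * height    # at most one vertex per depth pixel
print(f"{d435_vertices} vertices per frame, "
      f"~{d435_vertices / paper_vertices:.1f}x the paper's mesh")

# Implied raw rate if ~100 Mbps corresponds to ~10x reduction (assumption).
implied_raw_mbps = 100 * 10
target_mbps = 10
print(f"{target_mbps} Mbps needs ~{implied_raw_mbps / target_mbps:.0f}x reduction")
```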

bmegli / hardware-video-streaming

consider other options for textured depth streaming #4