NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Fast way to load optical flow video #2184

Open matanjacoby opened 4 years ago

matanjacoby commented 4 years ago

Hi, we have ground-truth optical flow fields that we want to pass as metadata along with our MP4 videos (see also closed issue #1941).

The optical flow initially comes as OpenEXR files (with two float channels), and I would be glad to know whether I can somehow load it quickly, as video or in some other way. I was thinking of the VideoReader, but I'm not sure whether it can handle this. I also thought about converting the flow files to color images and then to a video. Can I do that without losing too much flow precision and still recover the flow on the GPU? My idea was: normalize the flow to [-1, 1] => convert to HSV => store the normalization constants in some known saturation pixels => convert to RGB => MP4.

Any suggestions/alternatives are highly appreciated. Thanks.
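The HSV idea can be prototyped per pixel with Python's standard `colorsys` module to see how much precision 8-bit storage alone costs. This is a simplified, hypothetical version: hue encodes direction, value encodes magnitude divided by a `max_mag` that the decoder is assumed to know, saturation is fixed at 1, and the step that stores normalization constants in reserved pixels is left out.

```python
import colorsys
import math

def flow_to_rgb8(u, v, max_mag):
    """Encode one flow vector (u, v) as an 8-bit RGB triple via HSV.

    Hypothetical scheme: hue <- direction, value <- magnitude / max_mag
    (clipped to 1), saturation fixed at 1. max_mag must be known to the decoder.
    """
    hue = (math.atan2(v, u) / (2 * math.pi)) % 1.0  # direction -> [0, 1)
    val = min(math.hypot(u, v) / max_mag, 1.0)       # magnitude -> [0, 1]
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, val)
    # Round to 8 bits, as an 8-bit RGB frame would store it.
    return tuple(round(c * 255) for c in (r, g, b))

def rgb8_to_flow(rgb, max_mag):
    """Invert flow_to_rgb8, up to 8-bit quantization error."""
    r, g, b = (c / 255 for c in rgb)
    hue, _sat, val = colorsys.rgb_to_hsv(r, g, b)
    mag = val * max_mag
    angle = hue * 2 * math.pi
    return mag * math.cos(angle), mag * math.sin(angle)

# Round-trip one vector with max_mag = 20 px and inspect the error.
print(rgb8_to_flow(flow_to_rgb8(3.0, -4.0, 20.0), 20.0))
```

Because value is max(R, G, B), pixels with small flow magnitude leave only a few integer levels to express hue, so direction precision degrades as the flow approaches zero.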

mzient commented 4 years ago

I'd store it as YCbCr; something like this should work:

len = sqrt(x*x + y*y)
Y = f(len) * 255
Cb = x / (len + eps) * 127 + 128
Cr = y / (len + eps) * 127 + 128

The function f(x) can be something like log2(x + 1), scaled so that its output stays within [0, 1] for the largest expected flow magnitude. A logarithmic f preserves high resolution for small offsets at the price of worse resolution at large offsets.

The logic behind using YCbCr this way is that abrupt changes in flow direction show up as chroma changes, which forces the codec to allocate more bandwidth to chroma at exactly those spots. The epsilon in the denominator keeps Cb and Cr close to the neutral value 128 where there is little motion, so the essentially random directions of tiny vectors don't overwhelm the codec with noise.

But that's just one of many options.
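A minimal per-pixel Python sketch of this encoding together with its inverse, useful for checking round-trip precision before any codec is involved. All constants are hypothetical: `EPS = 0.5`, a maximum expected magnitude `MAX_LEN = 64` used to normalize `log2(len + 1)` into [0, 1], and chroma scaled by 127 so that Cb and Cr stay within 0..255. The inverse recovers `len` from Y, then multiplies the centered chroma by `len + eps`.

```python
import math

EPS = 0.5       # hypothetical epsilon, in pixels
MAX_LEN = 64.0  # hypothetical largest expected flow magnitude, in pixels

def f(length):
    # log2(len + 1), normalized so that MAX_LEN maps to 1.0
    return math.log2(min(length, MAX_LEN) + 1) / math.log2(MAX_LEN + 1)

def f_inv(y):
    # Inverse of f on [0, 1]
    return 2 ** (y * math.log2(MAX_LEN + 1)) - 1

def flow_to_ycbcr(x, y):
    length = math.hypot(x, y)
    Y = round(f(length) * 255)
    Cb = round(x / (length + EPS) * 127 + 128)  # |x / (len + eps)| < 1
    Cr = round(y / (length + EPS) * 127 + 128)
    return Y, Cb, Cr

def ycbcr_to_flow(Y, Cb, Cr):
    length = f_inv(Y / 255)
    scale = (length + EPS) / 127  # undo the division by (len + eps)
    return (Cb - 128) * scale, (Cr - 128) * scale

print(ycbcr_to_flow(*flow_to_ycbcr(3.0, -4.0)))
```

The sketch works on full-resolution Cb and Cr; if the video uses 4:2:0 chroma subsampling, the direction would additionally be shared across 2x2 pixel blocks.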

matanjacoby commented 4 years ago

Thanks very much, @mzient!

I tried that with several log bases, offsets, and reduced saturation, but even before any video encoding and decoding (that is, testing flow2YCbCr followed directly by its inverse) I get considerable errors.

Do you have any other ideas? Many thanks!
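A rough worst-case bound on the 8-bit quantization error alone may account for part of these errors. It assumes rounding of Y, Cb and Cr to integers, chroma scaled by 127, f(ℓ) = log2(ℓ + 1) / log2(L + 1) with a hypothetical maximum magnitude L = 64, and ℓ ≤ L; here ℓ is the true magnitude and hats denote decoded values.

```latex
\begin{aligned}
Y &= \operatorname{round}\big(255\, f(\ell)\big),
  \quad f(\ell) = \frac{\log_2(\ell + 1)}{\log_2(L + 1)}
  \;\Rightarrow\; \big|\hat f - f(\ell)\big| \le \tfrac{1}{510} \\
\hat\ell + 1 &= (L + 1)^{\hat f}
  \;\Rightarrow\; \frac{\hat\ell + 1}{\ell + 1} \in \big[(L+1)^{-1/510},\ (L+1)^{1/510}\big]
  \approx [0.9918,\ 1.0082] \quad (L = 64) \\
C_b &= \operatorname{round}\Big(127\,\frac{x}{\ell + \varepsilon} + 128\Big),
  \quad \hat x = \frac{C_b - 128}{127}\,(\hat\ell + \varepsilon)
  \;\Rightarrow\; |\hat x - x| \le \frac{\ell + \varepsilon}{254}
  \quad \text{if } \hat\ell = \ell
\end{aligned}
```

For example, at ℓ = 50 px with ε = 0.5, rounding Y alone can shift the magnitude by up to about 0.0082 × 51 ≈ 0.42 px, and rounding Cb alone can shift the x component by up to about 50.5 / 254 ≈ 0.20 px. Both bounds grow roughly linearly with ℓ, so large flows lose the most absolute precision under these assumptions.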