NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0
5.06k stars, 615 forks

Support image_type in nvidia.dali.fn.experimental.decoders.video #5610

Closed zeruniverse closed 2 days ago

zeruniverse commented 3 weeks ago

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Should have (e.g. Adoption is possible, but the performance shortcomings make the solution inferior).

Please provide a clear description of problem this feature solves

Currently the output is RGB by default. If the original video is encoded in YUV and you convert the RGB output back to YUV, the YUV values will sometimes differ from the originals because out-of-range values are clipped.

nvidia.dali.fn.readers.video has an image_type parameter to address this problem (YCbCr). Can nvidia.dali.fn.experimental.decoders.video (and maybe nvidia.dali.fn.experimental.inputs.video) implement the image_type parameter as well?

Feature Description

Add an image_type parameter to nvidia.dali.fn.experimental.decoders.video. Its definition can be the same as in nvidia.dali.fn.readers.video.

Describe your ideal solution

Add an image_type parameter to nvidia.dali.fn.experimental.decoders.video. Its definition can be the same as in nvidia.dali.fn.readers.video.

Describe any alternatives you have considered

No response

Additional context

No response


JanuszL commented 1 week ago

Hi @zeruniverse,

Thank you for reaching out. I think this is doable. Can you tell us more about your use case? Also, regarding YUV->RGB->YUV, DALI does its best to read the color profile from the video metadata and perform the YUV->RGB conversion as accurately as possible. If you use the YUV data directly, you may lose the information about whether it is full or reduced range.

zeruniverse commented 1 week ago

We encode CT file data into HEVC's Y channel. The raw voxel data is quantized into [0, 255]. Currently, we use PyAV to decode it directly, so the Y channel does not change. If the decoded output is RGB and we convert it back to YUV, the Y channel values are capped to a range of around [16, 235].

JanuszL commented 1 week ago

Hi @zeruniverse,

Can you check whether you use a full-range color profile during the PyAV encoding? The same applies when you convert back to YUV from RGB (you can check whether the RGB values correctly correspond to your Y data).

zeruniverse commented 1 week ago

Hi Janusz,

When I encode CT data into HEVC, I use YUV420 and write the YUV data directly (av.VideoFrame.from_ndarray(yuv_array, format="yuv420p")). Thus, during PyAV decoding, I can use the same format in to_ndarray to get the same Y data. In the PyAV pipeline, I don't deal with RGB at all.

The problem here is that the color I encoded in YUV might not be representable in RGB [0-255].


zeruniverse commented 1 week ago

For example,

I can encode color with YUV [0, 0, 0], but Y from RGB is:

Y = (0.257 * R) + (0.504 * G) + (0.098 * B) + 16

If I get RGB first during decoding, the minimum Y I can get is 16 (because RGB >=0). This is different than the Y I encoded.
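The effect described above can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, it uses the formula exactly as quoted in this comment, and the inverse step assumes neutral chroma (U = V = 128) so that R = G = B. It does not reproduce what DALI or PyAV do internally.

```python
def bt601_limited_luma(r, g, b):
    """Luma from 8-bit RGB, using the limited-range formula quoted above."""
    return round(0.257 * r + 0.504 * g + 0.098 * b + 16)


def limited_gray_to_rgb(y):
    """Inverse for a neutral pixel (assumed U = V = 128), clipped to [0, 255].

    With neutral chroma R = G = B, so one value describes the pixel.
    """
    value = round((y - 16) * 255 / 219)
    return max(0, min(255, value))


def round_trip(y):
    """Decode Y to RGB, then convert the RGB value back to Y."""
    value = limited_gray_to_rgb(y)
    return bt601_limited_luma(value, value, value)


# Range of Y reachable from any 8-bit RGB input.
y_min = bt601_limited_luma(0, 0, 0)
y_max = bt601_limited_luma(255, 255, 255)

# Values stored outside [y_min, y_max] do not survive the round trip.
lost = [y for y in range(256) if round_trip(y) != y]
```

Under these assumptions, every Y value stored below `y_min` or above `y_max` comes back changed, which is why decoding straight to YUV matters when Y carries data rather than brightness.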


JanuszL commented 1 week ago

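I think you are using the BT.601 formula (https://en.wikipedia.org/wiki/Rec._601). You may want to look at different profiles to avoid this limited dynamic range (https://en.wikipedia.org/wiki/Y%E2%80%B2UV#Conversion_to/from_RGB).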
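For contrast, here is a minimal sketch of a full-range conversion, assuming the common full-range BT.601 luma weights (0.299, 0.587, 0.114) with no offset. The function names are hypothetical, and whether a given decoder actually applies this profile depends on the range flag stored in the stream metadata, which is not shown here.

```python
def full_range_luma(r, g, b):
    """Luma from 8-bit RGB using full-range BT.601 weights (no +16 offset)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def full_range_gray_to_rgb(y):
    """Inverse for a neutral pixel (assumed U = V = 128): R = G = B = Y."""
    return max(0, min(255, y))


def full_range_round_trip(y):
    """Decode Y to RGB, then convert the RGB value back to Y."""
    value = full_range_gray_to_rgb(y)
    return full_range_luma(value, value, value)


# With a full-range profile, every 8-bit Y value of a neutral pixel survives.
preserved = all(full_range_round_trip(y) == y for y in range(256))
```

This only covers neutral pixels. With non-neutral chroma, some YUV triples still fall outside the RGB cube even in full range, so decoding directly to YUV remains the only lossless path.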

zeruniverse commented 1 week ago

Thanks! I will try to encode the color profile into the HEVC metadata and try again.


awolant commented 1 week ago

Hello @zeruniverse

Thanks for creating the issue. Since nvidia.dali.fn.readers.video supports image_type, what is the reason you use nvidia.dali.fn.experimental.decoders.video?

zeruniverse commented 1 week ago

> Hello @zeruniverse
>
> thanks for creating the issue. Since nvidia.dali.fn.readers.video supports image_type what is the reason you use nvidia.dali.fn.experimental.decoders.video?

@awolant I use DALI in Triton for preprocessing. The input is a UINT8 array of shape [-1] holding the HEVC bitstream, so it is already in memory rather than in a local file. That is why I use nvidia.dali.fn.experimental.decoders.video.