NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0
5.11k stars 618 forks

The image pixels after video decoding seem to be lossy #5644

Closed chenc29xpeng closed 3 weeks ago

chenc29xpeng commented 3 weeks ago

Describe the question.

Description

I'm trying to decode an mp4 into images losslessly, and I want to know whether DALI's GPU-based video decoding is lossy. I compared the pixels of the same frame decoded by OpenCV and by DALI. The two are not identical, and the maximum difference is 43. Why is there a difference?

By the way, I think the image decoded by OpenCV is lossless, because I compared the pixel values of the frames decoded by ffmpeg-python and OpenCV, and they are exactly the same.

Can anyone help check the code or give a reasonable explanation?

Video

The video I used is the official example: https://github.com/NVIDIA/DALI_extra/blob/main/db/optical_flow/sintel_trailer/sintel_trailer_short.mp4

Code

import os
import cv2
import tempfile
import numpy as np
from PIL import Image
from typing import List, Optional
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
from nvidia.dali.plugin.pytorch import DALIGenericIterator

def mp4_to_image_opencv(input_file: str) -> List[np.ndarray]:
    """Decode every frame of input_file with OpenCV and return them as a list of BGR arrays."""
    frame_list = []
    cap = cv2.VideoCapture(input_file)
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame_list.append(frame)
    cap.release()
    return frame_list

if __name__ == "__main__":
    # dali_extra_path = os.environ["DALI_EXTRA_PATH"]
    # video_filename = os.path.join(
    #     dali_extra_path, "db/optical_flow/sintel_trailer/sintel_trailer_short.mp4"
    # )
    video_filename = 'sintel_trailer_short.mp4'
    save_dali_path = "debug_dali.png"
    save_opencv_path = "debug_opencv.png"
    save_frame_num = 10

    opencv_frame_list = mp4_to_image_opencv(video_filename)
    cv2.imwrite(save_opencv_path, opencv_frame_list[save_frame_num])
    sequence_length = 64

    @pipeline_def
    def video_pipe(file_list):
        video, label = fn.readers.video(device="gpu", file_list=file_list, sequence_length=sequence_length, file_list_frame_num=True, name="my_reader", pad_sequences=True)
        return video, label

    my_file_list_str = f"{video_filename} 0 0 20\n" # file_name label start_frame_num end_frame_num
    tf = tempfile.NamedTemporaryFile()
    tf.write(str.encode(my_file_list_str))
    tf.flush()
    pipe = video_pipe(batch_size=1, file_list=tf.name, num_threads=1, device_id=0)
    pipe.build()

    dali_iter = DALIGenericIterator([pipe], ["image", "label"], reader_name="my_reader")
    for data in dali_iter:
        label = data[0]['label'].cpu().numpy()[0][0]
        image = data[0]['image'][0]
        dali_pixel_map = image[save_frame_num].cpu().numpy().astype(np.int16) # Cast to int16 so the subtraction below cannot wrap around as it would in uint8
        cv_pixel_map = opencv_frame_list[save_frame_num][:, :, [2, 1, 0]].astype(np.int16) # BGR -> RGB
        print(np.amax(dali_pixel_map - cv_pixel_map))
        print(np.amin(dali_pixel_map - cv_pixel_map))
        pil_image = Image.fromarray(image[save_frame_num].cpu().numpy())
        pil_image.save(save_dali_path)
        break

awolant commented 3 weeks ago

Hello @chenc29xpeng,

Thanks for creating the issue. The file you linked uses the H.264 codec, and this codec is not lossless. When decoding H.264, decoders have certain liberties when it comes to the final result. That is why you see different results from different decoders.

OpenCV uses the same decoder as FFmpeg under the hood, which is why their results are the same. DALI uses the NVIDIA Video Codec SDK to decode videos, and this is a slightly different implementation.

chenc29xpeng commented 3 weeks ago

@awolant Thanks for your quick reply. But the same problem occurs when I use an H.265-encoded file. The H.265 codec is lossless, so in this case, which image is correct after decoding: the one from OpenCV or the one from DALI? h265 mp4: https://drive.google.com/file/d/191-cqQPCkpDoVITA9jujqts7lryij0f7/view?usp=drive_link

mzient commented 3 weeks ago

Hello @chenc29xpeng, H.265 isn't lossless either - there are no lossless video codecs in common use. Both H.264 and H.265 are predictive codecs - a part of the codec operates in the loop, where the previously decoded frame is used to decode the next one. That part has strictly defined output - but it's not the final result of decoding. The final RGB image is obtained by applying chroma upsampling, color space conversion and possibly some postprocessing filters. It's that part that may differ across implementations. Also, I would be very cautious about assuming that a particular implementation (especially OpenCV) is somehow the "correct" one based on popularity.
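To illustrate how the color space conversion step alone can change the RGB output, here is a minimal sketch in plain Python. It converts one limited-range 8-bit Y'CbCr sample to RGB twice, once with BT.601 luma coefficients and once with BT.709. The function name `ycbcr_to_rgb` and the sample values are hypothetical, and nothing here claims to reproduce what FFmpeg, OpenCV, or NVDEC actually do; it only shows that the same decoded sample can map to different RGB values depending on which conversion is assumed.

```python
def ycbcr_to_rgb(y, cb, cr, kr, kb):
    """Convert one limited-range 8-bit Y'CbCr sample to 8-bit RGB.

    kr and kb are the luma coefficients of the assumed color standard.
    """
    # Normalize: luma occupies 16..235, chroma occupies 16..240 centered at 128.
    yn = (y - 16) / 219.0
    pb = (cb - 128) / 224.0
    pr = (cr - 128) / 224.0
    r = yn + 2.0 * (1.0 - kr) * pr
    b = yn + 2.0 * (1.0 - kb) * pb
    g = (yn - kr * r - kb * b) / (1.0 - kr - kb)
    # Scale to 0..255, round, and clip.
    return tuple(min(255, max(0, round(255.0 * c))) for c in (r, g, b))


BT601 = (0.299, 0.114)      # (Kr, Kb) for BT.601
BT709 = (0.2126, 0.0722)    # (Kr, Kb) for BT.709

sample = (120, 90, 200)     # hypothetical (Y', Cb, Cr) values, illustration only
rgb_601 = ycbcr_to_rgb(*sample, *BT601)
rgb_709 = ycbcr_to_rgb(*sample, *BT709)
diff = tuple(a - b for a, b in zip(rgb_601, rgb_709))
print("BT.601:", rgb_601, "BT.709:", rgb_709, "difference:", diff)
```

For this saturated sample the two conversions differ by more than ten levels in the red and green channels, while a neutral gray sample (Cb = Cr = 128) comes out identical under both. Differences of this kind would therefore show up mostly in strongly colored regions.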

chenc29xpeng commented 3 weeks ago

@mzient Do you mean that it is normal for the pixel values to differ between OpenCV and DALI after decoding? Sorry, I need to confirm this conclusion.

JanuszL commented 3 weeks ago

Hi @chenc29xpeng,

it is normal for the pixel values to be different after decoding between opencv and dali? Sorry, I need to confirm this conclusion.

What the standard defines is what the raw YUV output should look like. The conversion from it to RGB (interpolation and conversion to a different color space) is subject to numerical differences, and different interpolation methods may be used to improve the perceived quality of the produced images. So it is expected that OpenCV/FFmpeg and DALI (which uses NVDEC under the hood) can yield different, yet still valid, results.
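The interpolation point can be sketched with a toy example. Video is usually stored with chroma at lower resolution than luma, so the decoder has to upsample chroma before converting to RGB. The snippet below doubles a short row of chroma samples with two different filters, sample repetition and linear interpolation. Both function names and the input row are hypothetical, and these are illustrative filters only, not the ones used by FFmpeg or NVDEC.

```python
def upsample_nearest(chroma):
    """Double the sample count by repeating every chroma sample."""
    out = []
    for c in chroma:
        out.extend([c, c])
    return out


def upsample_linear(chroma):
    """Double the sample count by averaging each sample with its right neighbour.

    The last sample is repeated at the right edge.
    """
    out = []
    last = len(chroma) - 1
    for i, c in enumerate(chroma):
        nxt = chroma[min(i + 1, last)]
        out.append(c)
        out.append((c + nxt + 1) // 2)  # rounded integer average
    return out


chroma_row = [100, 100, 180, 180]  # hypothetical chroma row with one sharp edge
nearest = upsample_nearest(chroma_row)
linear = upsample_linear(chroma_row)
diffs = [a - b for a, b in zip(nearest, linear)]
print(nearest)
print(linear)
print(diffs)
```

On this row the two filters agree on the flat stretches and disagree by 40 at the one sample next to the edge. Both outputs are legitimate reconstructions of the same decoded chroma data, which is consistent with the explanation that large per-pixel differences between decoders can appear around color edges without either result being wrong.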

chenc29xpeng commented 3 weeks ago

@JanuszL Thank you for your thoughtful explanation, I got the answer I wanted.