NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Extract motion vectors #5363

Open rvandeghen opened 8 months ago

rvandeghen commented 8 months ago

Describe the question.

Hello,

I was wondering if there is a way to obtain the motion vectors that are computed when a video is decoded on the GPU?

Thanks

JanuszL commented 8 months ago

Hi @rvandeghen,

Thank you for reaching out. As far as I understand, NVDEC doesn't expose this information, so it is not possible to retrieve the motion vectors in DALI. What you can do instead is use the optical flow operator, fn.optical_flow.
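
For reference, a minimal sketch of how the optical flow operator is typically attached to a GPU video reader (the file list and sequence length below are placeholders, not a drop-in solution):

import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali import pipeline_def

@pipeline_def
def video_flow_pipeline(files):
    # Decode short frame sequences on the GPU.
    frames = fn.readers.video(device="gpu", filenames=files,
                              sequence_length=2, dtype=types.UINT8)
    # One flow field is produced per pair of consecutive frames,
    # so the flow sequence is one element shorter than the frame sequence.
    flow = fn.optical_flow(frames, output_grid=1)
    return frames, flow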

rvandeghen commented 8 months ago

@JanuszL thanks for the reply.

Do you know how to return both the list of frames and the list of optical-flow fields? I get an error which I guess comes from the fact that len(frames) = len(OF) + 1, so the shapes mismatch.

The code I use is the following:

# Imports used by the snippets below:
import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali import pipeline_def
from nvidia.dali.plugin import pytorch

@pipeline_def
def create_video_reader_pipeline(sequence_length, files, crop_size, stride=1, shard_id=0, num_shards=1, seed=0):
    images = fn.readers.video(device="gpu",
                              filenames=files,
                              sequence_length=sequence_length,
                              normalized=False,
                              random_shuffle=False,
                              image_type=types.RGB,
                              dtype=types.UINT8,
                              initial_fill=16,
                              prefetch_queue_depth=2,
                              pad_last_batch=True,
                              name="Reader",
                              stride=stride,
                              enable_frame_num=False,
                              shard_id=shard_id,
                              num_shards=num_shards,
                              seed=seed,
                             )

    of = fn.optical_flow(images, output_grid=1)

    images = fn.crop_mirror_normalize(images,
                                      dtype=types.FLOAT,
                                      output_layout="FCHW",
                                      mean=[0.279*255, 0.452*255, 0.378*255],
                                      std=[0.188*255, 0.188*255, 0.171*255],
                                      mirror=False,  # or fn.random.coin_flip() for random horizontal flips
                                      seed=seed
                                     )

    return images, of

class VideoDataset(pytorch.DALIGenericIterator):
    def __init__(self, *kargs, **kvargs):
        super().__init__(*kargs, **kvargs)

    def __next__(self):
        out, of = super().__next__()
        # DDP is used, so there is only one pipeline per process.
        # We also need to unpack the dicts returned by DALIGenericIterator
        # into plain tensors.
        out = out[0]["data"]
        of = of[0]["data"]

        B, F, C, H, W = out.size()
        out = out.view(B*F, C, H, W)
        return out, of

device_id = 0
shard_id = 0
num_shards = 1
batch_size = 1
sequence_length = 10

crop_size=(224, 224)
stride=5

pipeline = create_video_reader_pipeline(batch_size=batch_size,
                                        sequence_length=sequence_length,
                                        num_threads=10,
                                        device_id=device_id,
                                        shard_id=shard_id,
                                        num_shards=num_shards,
                                        files=container_files,  # list of video file paths, defined elsewhere
                                        crop_size=crop_size,
                                        stride=stride,
                                        )

train_loader = VideoDataset(pipeline,
                            ["data"],
                            reader_name="Reader",
                            auto_reset=True,
                            last_batch_policy=pytorch.LastBatchPolicy.FILL
                            )

Error:

IndexError                                Traceback (most recent call last)
Cell In[40], line 22
      9 stride=5
     11 pipeline = create_video_reader_pipeline(batch_size=batch_size,
     12                                         sequence_length=sequence_length,
     13                                         num_threads=10,
   (...)
     19                                         stride=stride,
     20                                         )
---> 22 train_loader = VideoDataset(pipeline,
     23                             ["data"],
     24                             reader_name="Reader",
     25                             auto_reset=True,
     26                             last_batch_policy=pytorch.LastBatchPolicy.FILL
     27                             )

Cell In[39], line 37, in VideoDataset.__init__(self, *kargs, **kvargs)
     36 def __init__(self, *kargs, **kvargs):
---> 37     super().__init__(*kargs, **kvargs)

File ~/micromamba/envs/sn_mae/lib/python3.10/site-packages/nvidia/dali/plugin/pytorch.py:194, in DALIGenericIterator.__init__(self, pipelines, output_map, size, reader_name, auto_reset, fill_last_batch, dynamic_shape, last_batch_padded, last_batch_policy, prepare_first_batch)
    192 if self._prepare_first_batch:
    193     try:
--> 194         self._first_batch = DALIGenericIterator.__next__(self)
    195         # call to `next` sets _ever_consumed to True but if we are just calling it from
    196         # here we should set if to False again
    197         self._ever_consumed = False

File ~/micromamba/envs/sn_mae/lib/python3.10/site-packages/nvidia/dali/plugin/pytorch.py:220, in DALIGenericIterator.__next__(self)
    218 # segregate outputs into categories
    219 for j, out in enumerate(outputs[i]):
--> 220     category_outputs[self.output_map[j]] = out
    222 # Change DALI TensorLists into Tensors
    223 category_tensors = dict()

IndexError: list index out of range

JanuszL commented 8 months ago

Hi @rvandeghen,

I think your pipeline returns more outputs than the iterator consumes: it returns two outputs (the frames and the optical flow) while output_map lists only "data", hence the IndexError. Can you try:

train_loader = VideoDataset(pipeline,
                            ["images", "of"],
                            reader_name="Reader",
                            auto_reset=True,
                            last_batch_policy=pytorch.LastBatchPolicy.FILL
                            )
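
With two names in output_map, each output can then be read back under its own key inside __next__. A minimal sketch of the matching iterator (only the keys change with respect to the class above):

class VideoDataset(pytorch.DALIGenericIterator):
    def __next__(self):
        # DALIGenericIterator returns a list with one dict per pipeline,
        # keyed by the names passed in output_map.
        data = super().__next__()
        out = data[0]["images"]
        of = data[0]["of"]

        B, F, C, H, W = out.size()
        out = out.view(B * F, C, H, W)
        return out, of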

rvandeghen commented 8 months ago

Hi @JanuszL,

Indeed, the optical flow gives good results at almost no extra cost. However, I found the following information in the blog post, and I would like to know whether DALI exposes this buffer:

The Optical Flow API returns a buffer consisting of confidence levels (called cost) for each of the flow vectors to deal with these situations. The application can use this cost buffer to selectively accept or discard regions of the flow vector map.

Renaud

JanuszL commented 8 months ago

Hi @rvandeghen,

Currently, the operator doesn't ask the Optical Flow SDK to provide such values. If you have some spare time, a contribution adding this would be very welcome.

rvandeghen commented 7 months ago

Hi @JanuszL,

Do you know whether I should expect large or small changes in the output depending on the value I set for hint_grid?

I did some comparisons between the NVIDIA OF and RAFT (torchvision version: https://pytorch.org/vision/main/models/raft.html) and the output was much smoother with RAFT.

I also found that changing the value of hint_grid from 1 to 8 does not change anything in the output values.

FYI, I'm using an A100 and my OF is defined as:

of = fn.optical_flow(images,
                     hint_grid=1,  # values from 1 to 8 were tried
                     output_grid=1,
                    )
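
For what it's worth, a minimal sketch of how the two flow fields could be compared numerically, assuming both are brought to the same H x W resolution; the layouts below ((H, W, 2) for DALI's fn.optical_flow output per frame pair, (2, H, W) for torchvision's RAFT prediction) are assumptions to double-check:

import torch

def end_point_error(flow_dali, flow_raft):
    # flow_dali: (H, W, 2) tensor of per-pixel (x, y) displacements from DALI
    # flow_raft: (2, H, W) tensor, e.g. the last element of RAFT's prediction list
    flow_raft = flow_raft.permute(1, 2, 0)
    # Average end-point error: mean L2 distance between corresponding flow vectors.
    return torch.linalg.norm(flow_dali - flow_raft, dim=-1).mean()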

Is this the correct behavior? I know you are not directly involved with NVOF, so if you know someone better placed to answer my questions, please don't hesitate to point me to them.

JanuszL commented 7 months ago

Hi @rvandeghen,

To my knowledge, the behavior of NVIDIA OF depends on the driver version and the GPU available, so it is probably best to ask on the NVIDIA forum. Also, DALI doesn't use the latest OF API (upgrading it is on our to-do list but has low priority for now); you may check the relevant OpenCV interface and compare the results.