Open rvandeghen opened 8 months ago
Hi @rvandeghen,
Thank you for reaching out. As far as I understand, NVDEC doesn't expose this information, so it is impossible to do that in DALI. What you can do instead is use the `optical_flow` operator.
@JanuszL thanks for the reply.
Do you know how to return both the list of frames and the list of OF? I get an error which I guess comes from the fact that len(frames) = len(OF) + 1, so the shapes mismatch.
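As a side note, the off-by-one can be seen with dummy tensors: a minimal sketch, assuming the operator returns F - 1 flow fields for a sequence of F frames, where dropping the first frame aligns the two outputs:

```python
import torch

# Dummy shapes only; assumes fn.optical_flow yields F - 1 flow fields
# (each with 2 channels: dx, dy) for a sequence of F frames.
B, F, C, H, W = 1, 10, 3, 224, 224
frames = torch.zeros(B, F, C, H, W)
flows = torch.zeros(B, F - 1, 2, H, W)

# Pair frame t+1 with the flow computed from frame t to t+1
# by dropping frame 0.
frames_aligned = frames[:, 1:]
assert frames_aligned.shape[1] == flows.shape[1]
```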
The code I use is the following:
```python
@pipeline_def
def create_video_reader_pipeline(sequence_length, files, crop_size, stride=1,
                                 shard_id=0, num_shards=1, seed=0):
    images = fn.readers.video(device="gpu",
                              filenames=files,
                              sequence_length=sequence_length,
                              normalized=False,
                              random_shuffle=False,
                              image_type=types.RGB,
                              dtype=types.UINT8,
                              initial_fill=16,
                              prefetch_queue_depth=2,
                              pad_last_batch=True,
                              name="Reader",
                              stride=stride,
                              enable_frame_num=False,
                              shard_id=shard_id,
                              num_shards=num_shards,
                              seed=seed)
    of = fn.optical_flow(images, output_grid=1)
    images = fn.crop_mirror_normalize(images,
                                      dtype=types.FLOAT,
                                      output_layout="FCHW",
                                      mean=[0.279 * 255, 0.452 * 255, 0.378 * 255],
                                      std=[0.188 * 255, 0.188 * 255, 0.171 * 255],
                                      mirror=False,  # fn.random.coin_flip()
                                      seed=seed)
    return images, of
```
```python
class VideoDataset(pytorch.DALIGenericIterator):
    def __init__(self, *kargs, **kvargs):
        super().__init__(*kargs, **kvargs)

    def __next__(self):
        out, of = super().__next__()
        # DDP is used, so there is only one pipeline per process.
        # We also need to transform the dict returned by DALIClassificationIterator
        # into an iterable and squeeze the labels.
        out = out[0]["data"]
        of = of[0]["data"]
        B, F, C, H, W = out.size()
        out = out.view(B * F, C, H, W)
        return out, of
```
```python
device_id = 0
shard_id = 0
num_shards = 1
batch_size = 1
sequence_length = 10
crop_size = (224, 224)
stride = 5

pipeline = create_video_reader_pipeline(batch_size=batch_size,
                                        sequence_length=sequence_length,
                                        num_threads=10,
                                        device_id=device_id,
                                        shard_id=shard_id,
                                        num_shards=num_shards,
                                        files=container_files,
                                        crop_size=crop_size,
                                        stride=stride)

train_loader = VideoDataset(pipeline,
                            ["data"],
                            reader_name="Reader",
                            auto_reset=True,
                            last_batch_policy=pytorch.LastBatchPolicy.FILL)
```
Error:

```
IndexError                                Traceback (most recent call last)
Cell In[40], line 22
      9 stride=5
     11 pipeline = create_video_reader_pipeline(batch_size=batch_size,
     12                                         sequence_length=sequence_length,
     13                                         num_threads=10,
   (...)
     19                                         stride=stride,
     20                                         )
---> 22 train_loader = VideoDataset(pipeline,
     23                             ["data"],
     24                             reader_name="Reader",
     25                             auto_reset=True,
     26                             last_batch_policy=pytorch.LastBatchPolicy.FILL
     27                             )

Cell In[39], line 37, in VideoDataset.__init__(self, *kargs, **kvargs)
     36 def __init__(self, *kargs, **kvargs):
---> 37     super().__init__(*kargs, **kvargs)

File ~/micromamba/envs/sn_mae/lib/python3.10/site-packages/nvidia/dali/plugin/pytorch.py:194, in DALIGenericIterator.__init__(self, pipelines, output_map, size, reader_name, auto_reset, fill_last_batch, dynamic_shape, last_batch_padded, last_batch_policy, prepare_first_batch)
    192 if self._prepare_first_batch:
    193     try:
--> 194         self._first_batch = DALIGenericIterator.__next__(self)
    195         # call to `next` sets _ever_consumed to True but if we are just calling it from
    196         # here we should set if to False again
    197         self._ever_consumed = False

File ~/micromamba/envs/sn_mae/lib/python3.10/site-packages/nvidia/dali/plugin/pytorch.py:220, in DALIGenericIterator.__next__(self)
    218 # segregate outputs into categories
    219 for j, out in enumerate(outputs[i]):
--> 220     category_outputs[self.output_map[j]] = out
    222 # Change DALI TensorLists into Tensors
    223 category_tensors = dict()

IndexError: list index out of range
```
Hi @rvandeghen,
I think your pipeline returns more outputs than the iterator consumes. Can you try:

```python
train_loader = VideoDataset(pipeline,
                            ["images", "of"],
                            reader_name="Reader",
                            auto_reset=True,
                            last_batch_policy=pytorch.LastBatchPolicy.FILL)
```
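For context, a simplified stand-in for the indexing loop from `DALIGenericIterator.__next__` shown in the traceback (plain Python lists replacing DALI outputs, so this is only an illustration) makes clear why a one-entry `output_map` fails when the pipeline returns two outputs:

```python
# Each pipeline output j is looked up by name via output_map[j],
# so output_map needs one entry per pipeline output.
outputs = ["frames", "optical_flow"]  # the pipeline returns two outputs
output_map = ["data"]                 # but only one name was given

category_outputs = {}
try:
    for j, out in enumerate(outputs):
        category_outputs[output_map[j]] = out
except IndexError as e:
    print(type(e).__name__)  # IndexError, as in the traceback above

# With output_map = ["images", "of"], both outputs get a name.
```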
Hi @JanuszL,
Indeed, the optical flow gives good results at almost no extra cost. However, I found the following information in the blog post, and I would like to know if DALI exposes this buffer:

> The Optical Flow API returns a buffer consisting of confidence levels (called cost) for each of the flow vectors to deal with these situations. The application can use this cost buffer to selectively accept or discard regions of the flow vector map.
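To illustrate what such a cost buffer would enable, here is a sketch with hypothetical arrays (DALI does not currently expose the cost output, and the assumption that lower cost means higher confidence comes from the SDK's naming, not from DALI):

```python
import numpy as np

# Hypothetical flow map ((dx, dy) per grid cell) and per-vector cost
# buffer; both are dummy arrays for illustration only.
H, W = 4, 4
flow = np.random.randn(H, W, 2).astype(np.float32)
cost = np.random.uniform(0, 255, size=(H, W)).astype(np.float32)

# Keep only flow vectors whose cost is below a threshold
# (assuming lower cost = higher confidence); discard the rest as NaN.
threshold = 128.0
mask = cost < threshold
flow_filtered = np.where(mask[..., None], flow, np.nan)
```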
Renaud
Hi @rvandeghen,
Currently, the operator doesn't ask the Optical Flow SDK to provide such values. If you have some spare time, you could contribute a change enabling `enableOutputCost` at https://github.com/NVIDIA/DALI/blob/main/dali/operators/sequence/optical_flow/optical_flow_impl/optical_flow_impl.cc#L321
Hi @JanuszL,
Do you know if I should expect huge/small changes in the output depending on the value I set for `hint_grid`?
I did some comparisons between the NVIDIA OF and RAFT (torchvision version: https://pytorch.org/vision/main/models/raft.html) and the output was much smoother with RAFT.
I also found that changing the value of `hint_grid` from 1 to 8 does not change anything in the output values.
FYI, I'm using an A100 and my OF is defined as:

```python
of = fn.optical_flow(images,
                     hint_grid=1,  # changed from 1 to 8
                     output_grid=1)
```
Is this the correct behavior? I know that you are not directly involved with NVOF, so if you know someone relevant who could answer my questions, do not hesitate to share.
Hi @rvandeghen,
To my knowledge, the behavior of NVIDIA OF depends on the driver version and the GPU available. It is probably best to ask on the NVIDIA forum. Also, DALI doesn't use the latest OF API (upgrading it is on our to-do list but has low priority for now); you may check the relevant OpenCV interface and compare the results.
Describe the question.
Hello,
I was wondering if there is a way to obtain the motion vectors you compute when you decode a video on GPU?
Thanks