NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0
5.07k stars 615 forks

VideoReader unresponsive/hangs on variable frame rate videos. #1041

Closed Mmdixon closed 5 years ago

Mmdixon commented 5 years ago

Given a video with a variable frame rate (i.e. ffmpeg -i vfr_test.mp4 -vf vfrdet -f null - reports a non-zero value) and a Pipeline containing only a VideoReader, calling pipeline.run() hangs indefinitely. I end up having to kill the Python process, since it doesn't respond to SIGINT.

The same video with the times adjusted to constant frame rate works fine.
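
One way to screen files before handing them to VideoReader is to compare the gaps between successive frame timestamps. The following is a minimal sketch, assuming ffprobe is installed and on PATH; the helper names and the 1% tolerance are illustrative choices, not part of DALI.

```python
import subprocess


def read_frame_times(path):
    """Return frame timestamps (seconds) of the first video stream.

    Assumes the ffprobe binary from FFmpeg is installed and on PATH.
    """
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "frame=best_effort_timestamp_time",
        "-of", "csv=p=0",
        path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    tokens = (line.strip().strip(",") for line in out.splitlines())
    return sorted(float(tok) for tok in tokens if tok and tok != "N/A")


def is_constant_frame_rate(times, rel_tol=0.01):
    """True if all gaps between consecutive timestamps are nearly equal."""
    deltas = [b - a for a, b in zip(times, times[1:])]
    if not deltas:
        return True
    mean = sum(deltas) / len(deltas)
    return all(abs(d - mean) <= rel_tol * mean for d in deltas)
```

A call such as is_constant_frame_rate(read_frame_times("vfr_test.mp4")) could then gate which files are passed to the pipeline.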

>>> nvidia.dali.__version__
'0.11.0'

awolant commented 5 years ago

Hi, thanks for reporting the issue. I will double check that, but I think our current implementation is not supporting variable frame rates. I will get back to you on this.

JanuszL commented 5 years ago

Hi, Could you tell if this always reproduces on any video or only selected one? Can you provide some simple self-contained script and video sample that reproduces this problem?

Mmdixon commented 5 years ago

The video sample is a bit contrived, but it is an easy way to create a variable frame rate video with limited tools, and it should reproduce the issue. Create the video source:

#!/bin/bash
# Create a 3 second video at 60fps at constant frame rate.
# 180 frames, with a time delta of 1/60 s. Duration 3s.
ffmpeg -f lavfi -i color=c=blue:s=1280x720:d=3:r=60 \
-c:v libx264 \
-vf "format=pix_fmts=yuv420p, drawtext=fontsize=64: fontcolor=white: font=monospace: x=(w-text_w)/2: y=(h-text_h)/2: r=60: text='%{frame_num}'" \
cfr_test.mp4
# Transcode video to 25fps at variable frame rate.
# 180 frames, time deltas spaced between 1/30 s and 1/20 s. Duration ~7s.
ffmpeg -i cfr_test.mp4 -vsync vfr -vf setpts='N/(25*TB)' vfr_test.mp4

Run the Pipeline:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops

class VideoPipe(Pipeline):
    def __init__(self, device_id, filenames, batch_size=2, sequence_length=60, num_threads=2):
        super().__init__(batch_size=batch_size, num_threads=num_threads, device_id=device_id)
        self.video = ops.VideoReader(device="gpu", filenames=filenames,
                                     sequence_length=sequence_length)

    def define_graph(self):
        output = self.video(name="Reader")
        return output

if __name__ == "__main__":
    device_id = 0
    filenames = ["vfr_test.mp4"]
    pipeline = VideoPipe(device_id, filenames)

    pipeline.build()
    result = pipeline.run()
    print(result)

Prints fine with cfr_test.mp4, but hangs on vfr_test.mp4. Ctrl+C doesn't interrupt it, but Ctrl+\ terminates it.
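
Since the hung process ignores SIGINT, one defensive option is to launch the repro script as a child process with a timeout. This is only a sketch: the function name is hypothetical, and it relies on the documented behavior of subprocess.run, which kills the child when the timeout expires.

```python
import subprocess
import sys


def run_python_with_timeout(py_args, timeout_s):
    """Run `python <py_args>`; return the exit code, or None on timeout.

    subprocess.run kills the child process when the timeout expires, so a
    hung pipeline cannot block the caller forever.
    """
    cmd = [sys.executable, *py_args]
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None
```

For example, run_python_with_timeout(["run_pipeline.py"], 60) would return None if the script above hung for more than a minute (run_pipeline.py is a placeholder file name).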

awolant commented 5 years ago

Thanks for the fast and clear repro. I was able to reproduce this on my machine, and it behaves exactly as you described. As I mentioned before, we currently do not support variable frame rates in our reader. @JanuszL pointed out the exact reason for the hang you are experiencing: we rely on a constant frame rate to correctly identify each frame, so with a variable frame rate we end up waiting indefinitely for the data. Could you formulate some more detailed requirements for what you would need? Or, even better, are you willing to implement this yourself as a contribution to DALI? If you are, we are happy to provide some guidance. I think the place to start is the nvdecoder code. For now we will track this internally as DALI-951.
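
The failure mode described above can be modeled in a few lines. This is not DALI's decoder code, only a sketch that assumes a reader expecting frame n at exactly n / fps seconds; the function name and tolerance are made up for illustration.

```python
def missing_expected_timestamps(actual_times, fps, tol=1e-3):
    """List (frame index, expected timestamp) pairs with no matching frame.

    Models a reader that expects frame n at exactly n / fps seconds and
    would wait forever for any expected timestamp that never arrives.
    """
    missing = []
    for n in range(len(actual_times)):
        expected = n / fps
        if not any(abs(t - expected) <= tol for t in actual_times):
            missing.append((n, expected))
    return missing
```

With constant-rate timestamps the list is empty; with variable-rate timestamps, every entry is a frame the modeled reader would wait on indefinitely.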

JanuszL commented 5 years ago

By requirements, @awolant means: what would you expect to get when the video is VFR? When you ask for the 2nd frame, do you want to get it no matter how much time elapsed between the 1st and 2nd frames, or should we assume the video has some fixed frame rate and interpolate the 2nd frame from the real 1st and 2nd ones?

Mmdixon commented 5 years ago

I guess my first expectation would be that the process does not hang. If VFR is not supported, then maybe throw an exception?

Given that a VideoReader pulls frames by sequence_length, I would expect it to read all the frames like your first suggestion and just ignore the temporal information. The workaround for this is not so bad because you can edit the PTS/DTS timecodes of a video container quickly to get CFR without re-encoding the video and everything works.

What I would find more interesting with a VFR video is to pull frames by a time sequence, e.g. Get all the frames that occur within a 1 second interval. This would probably mean the batch tensor couldn't be dense (since different sequence lengths), but would be dense for the CFR case.
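
The interval-based grouping can be sketched on plain timestamp lists. The function below is hypothetical and assumes the timestamps are sorted in ascending order; it shows why the result is ragged for VFR input.

```python
def group_frames_by_interval(times, interval_s=1.0):
    """Group frame indices into consecutive windows of interval_s seconds.

    Returns a list of lists; for VFR input the windows may differ in length.
    `times` must be sorted in ascending order.
    """
    if not times:
        return []
    start = times[0]
    n_windows = int((times[-1] - start) // interval_s) + 1
    groups = [[] for _ in range(n_windows)]
    for idx, t in enumerate(times):
        groups[int((t - start) // interval_s)].append(idx)
    return groups
```

For constant-rate input every full window holds the same number of frames, which is what would allow a dense batch tensor in the CFR case.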

Or it could keep the fixed sequence_length and use time-based sub-sampling, e.g. grab 60 frames 1/60 s apart, with duplicate/decimate behavior to meet these requirements. So if you have a really long frame duration (longer than the sub-sample interval), the video reader will keep sampling that frame, and if you have multiple really short frame durations (shorter than the sub-sample interval), some of those frames will be skipped. This would be ideal in the context of recurrent networks (like a CNN feeding into an LSTM), as most of these architectures only work with fixed time intervals/deltas, and you won't be ignoring the temporal information. The workaround for this case requires re-encoding the video at CFR to essentially bake in the variable time differences between frames. This takes longer (although NVENC does give a nice speedup), and there is a worry about lossy [re]transcoding, plus the fact that features extracted by convolutions are very sensitive to compression artifacts.
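
The duplicate/decimate behavior described above amounts to a floor lookup at evenly spaced sample times. A minimal sketch, with a hypothetical function name and sorted timestamps assumed:

```python
from bisect import bisect_right


def subsample_indices(times, interval_s, count, start_s=None):
    """Pick `count` frame indices at a fixed time step (floor strategy).

    For each sample time, choose the last frame whose timestamp is <= that
    time, so long frames repeat and short ones may be skipped.
    `times` must be sorted in ascending order.
    """
    if not times:
        raise ValueError("times must not be empty")
    if start_s is None:
        start_s = times[0]
    picks = []
    for k in range(count):
        t = start_s + k * interval_s
        # bisect_right returns the number of frames with timestamp <= t.
        idx = bisect_right(times, t) - 1
        picks.append(max(idx, 0))
    return picks
```

With timestamps [0.0, 0.5, 0.6, 0.7, 2.0] and a 0.5 s step, frame 2 is skipped (decimated) and frame 3 is sampled twice (duplicated).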

I was thinking of VFR interpolation as picking a floor/ceiling/nearest frame strategy. But, as in your second suggestion, it might be interesting for neural networks to interpolate as a mix of pixels between the real frames by some strategy (linear, cubic, sigmoid, etc.), as that would have better differentiable properties.
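
Both ideas reduce to locating the two real frames around a requested time. The sketch below covers only the linear case and uses hypothetical helper names; the floor and ceiling strategies correspond to the left and right indices it returns.

```python
from bisect import bisect_right


def interpolation_weights(times, t):
    """Return (left index, right index, alpha) for linear blending at time t.

    The blended frame would be (1 - alpha) * frame[left] + alpha * frame[right].
    `times` must be sorted ascending; t is clamped to the covered range.
    """
    if not times:
        raise ValueError("times must not be empty")
    if t <= times[0]:
        return 0, 0, 0.0
    if t >= times[-1]:
        last = len(times) - 1
        return last, last, 0.0
    right = bisect_right(times, t)
    left = right - 1
    alpha = (t - times[left]) / (times[right] - times[left])
    return left, right, alpha


def nearest_index(times, t):
    """Nearest-frame strategy built on the same lookup (ties go left)."""
    left, right, alpha = interpolation_weights(times, t)
    return right if alpha > 0.5 else left
```

Because alpha varies continuously with t between two frames, the linear blend is the variant with the smoother behavior mentioned above, whereas floor/ceiling/nearest jump between frames.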

awolant commented 5 years ago

A proper error message was added and merged today in #1067. You can check it out in the next DALI nightly build. Closing the issue for now.

BlackPepperAPI commented 4 years ago

Hi @awolant, I'm still experiencing the hang issue. It occurs when I read .avi files. I'm using nvidia-dali version:

Version: 0.17.0
Summary: NVIDIA DALI for CUDA 9.0. Git SHA: e61c304d9f5560fff1be5c821ee140cdab104aef

The process hangs indefinitely until it is killed. It gives no error message; it just hangs forever.

Code

This is my code below, I defined my VideoPipeline as follows:

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

class VideoReaderPipeline(Pipeline):
    def __init__(self, filenames, batch_size, sequence_length, crop_size, 
                num_threads, device_id, output_layout=types.NCHW,
                random_shuffle=True, step=-1, seed=42):
        super().__init__(batch_size, num_threads, device_id, seed=seed)

        # Define video reader
        self.reader = ops.VideoReader(device = "gpu",
                filenames = filenames,
                sequence_length = sequence_length,
                normalized = False,
                random_shuffle = random_shuffle,
                image_type = types.RGB,
                dtype = types.UINT8,
                step = step,
                initial_fill = 16)

        self.cropnorm = ops.CropMirrorNormalize(device = "gpu",
                seed = seed,
                crop = crop_size,
                output_dtype = types.FLOAT,
                output_layout = types.NFCHW)

    def define_graph(self):
        """ Definition of graph-event that defines flow of video pipeline
        """
        input_vid  = self.reader(name = "Reader")
        output_vid = self.cropnorm(input_vid)

        return output_vid

Then I run the above pipeline with a simple script:

import os
import nvidia
import nvidia.dali.ops as ops
import nvidia.dali.types as types

from nvidia.dali.pipeline import Pipeline
from nvidia.dali.plugin.pytorch import DALIGenericIterator
from torchaction.dataloader import VideoReaderPipeline

def main():
     filenames = ["data/HMDB51/raw/temp/video_one.avi",]
     if not os.path.isfile(filenames[0]):
        raise FileNotFoundError("File not found: %s" % filenames[0])
     # Define video reader pipeline
     pipeline = VideoReaderPipeline(filenames, 
                batch_size = 2,
                sequence_length = 16,
                 crop_size = (224,224),
                 num_threads = 2,
                 device_id = 0,
                 random_shuffle = True,
                 step = 2)

     # Build pipeline
     pipeline.build()
     print("Building pipeline...")
     for k in range(10):
         print("Running pipeline")
         pipeout = pipeline.run()
         sequence_out = pipeout[0].as_cpu().as_array()  # [batch_size, sequence_length, channel, height, width]
         print("Result:", sequence_out.shape)  # printing result

if __name__ == "__main__":
    main()

@awolant do you mind investigating the issue here? Or I can open a separate issue if that's more convenient.

I can also upload the four .avi videos if you need them to reproduce the issue. They were taken from the HMDB51 dataset.

[12/01/2020] Update on observation

1. Passing filenames = <any of those videos> to VideoReaderPipeline hangs indefinitely.
2. Interesting observation for video_one.avi and video_four.avi: it first printed "Results.." for the first 10 iterations of pipeline.run() (so the first couple of sequences), but when we increase the number of iterations to, say, 15 or 20, it hangs again. (updated)
3. Then, passing filenames = [<two or more of those videos>] also hangs indefinitely.

JanuszL commented 4 years ago

@BlackPepperAPI - yes, please upload the files so we can run the repro locally.

JanuszL commented 4 years ago

@a-sansanwal - FYI

BlackPepperAPI commented 4 years ago

Hi @JanuszL @a-sansanwal

Here are the videos that I used for the above observation. Please cross-check and refer to my updated observation. I really appreciate you taking the time for the tedious work of testing this with me.

Inside videos.zip are four (4) videos in .avi format.

Video Zip attachment:

Attachment: videos.zip

Updates on Observation

It turns out all of those .avi videos hang indefinitely. video_one.avi hangs if I increase the number of iterations. The detailed explanation:

1. Passing filenames = "<any of those video paths>" to VideoReaderPipeline hangs indefinitely.
2. Interesting observation for filenames = "video_one.avi" or "video_four.avi": it first printed Results: np.array([N, F, C, H, W]) for the first 10 iterations of pipeline.run() (so the first 10 batches of sequences), but when we increase the number of iterations to, say, 15 or 20, it hangs indefinitely. Perhaps the video closed? (updated)
3. Then, passing filenames = ["<two or more of those videos>",] also hangs indefinitely, in the very first iteration. So unlike the hang in number 2, it didn't even print Results: ... before it hung. When I checked, it actually contains a video, at 25 fps and several seconds long. I still use the same batch_size and sequence length, FYI.

I suspect the following (I'm not an expert, so take it with a pinch of salt):

  1. The video format .avi might not be compatible or bad for data loading
  2. The frame rate must be constant

Workaround for now

In summary, the workaround more or less solves the hang issue, but I ran into another issue, described in point 4:

  1. I used the ffmpeg command on Ubuntu 18.04 to convert those nasty .avi videos to .mp4, while also converting the frame rate to a constant one with -r 30, following a similar ffmpeg command here.
  2. I found that converting the frame rate but keeping the .avi output format in ffmpeg results in a video that still doesn't work when loaded with NVIDIA DALI ops.VideoReader.
  3. The workaround successfully read all the frames from those files when I passed filenames = [<mp4 videos>]. I then went on to use plugin.pytorch.DALIGenericIterator, modifying the for-loop a bit. It also read all the frames successfully, printing Results: torch.Size[N, F, C, H, W], and finished the for-loop.
  4. Converting the frame rate and converting to .mp4 gets rid of this hang, but I hit another issue, which I commented on at [Issue #1637](https://github.com/NVIDIA/DALI/issues/1637), when I use the file_root or file_list argument for ops.VideoReader.
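
Step 1 of the workaround can be scripted for a whole dataset. The exact ffmpeg command used is not shown in the thread, so the sketch below is an assumption built only from the -r 30 option mentioned above; the function names are hypothetical, and -n makes ffmpeg exit instead of overwriting an existing output file.

```python
import subprocess
from pathlib import Path


def build_cfr_mp4_command(src, fps=30):
    """Build an ffmpeg command that re-encodes src to a constant-fps .mp4.

    The output is written next to the input with the suffix changed to .mp4.
    """
    src = Path(src)
    dst = src.with_suffix(".mp4")
    return ["ffmpeg", "-n", "-i", str(src), "-r", str(fps), str(dst)]


def convert_to_cfr_mp4(src, fps=30):
    """Run the conversion; requires ffmpeg on PATH, raises on failure."""
    subprocess.run(build_cfr_mp4_command(src, fps), check=True)
```

Note that this re-encodes the video, so the concern about lossy transcoding raised earlier in the thread applies.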

BlackPepperAPI commented 4 years ago

I used the nightly build for version 0.17.0 as suggested for the above. Thanks!

a-sansanwal commented 4 years ago

Hi @BlackPepperAPI, the videos you posted have packed B-frames, which is an ugly hack used in AVI containers. A workaround for this was already added in DALI.

Also, according to ffprobe, none of the videos you posted start from timestamp 0. DALI waits for the 1st frame (timestamp 0.03333), which is not present, causing the hang.

ffprobe -show_frames video_one.avi  | grep best_effort_timestamp | less
best_effort_timestamp=2
best_effort_timestamp_time=0.066667
best_effort_timestamp=3
best_effort_timestamp_time=0.100000
best_effort_timestamp=4
best_effort_timestamp_time=0.133333
best_effort_timestamp=5
best_effort_timestamp_time=0.166667
best_effort_timestamp=6
best_effort_timestamp_time=0.200000
best_effort_timestamp=7
best_effort_timestamp_time=0.233333
best_effort_timestamp=8
best_effort_timestamp_time=0.266667
best_effort_timestamp=9
best_effort_timestamp_time=0.300000
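
The timestamps above can be turned into a starting frame index automatically. This is a sketch with a hypothetical function name; it assumes the ffprobe text covers the first frame of the file and that the nominal frame rate (30 fps here, inferred from the 1/30 s spacing) is known.

```python
def first_frame_number(ffprobe_output, fps):
    """Estimate the index of the first available frame from ffprobe text.

    Parses `best_effort_timestamp_time=<seconds>` lines, as printed by
    `ffprobe -show_frames`, and converts the smallest value to a frame index.
    """
    key = "best_effort_timestamp_time="
    times = [
        float(line.split("=", 1)[1])
        for line in ffprobe_output.splitlines()
        if line.startswith(key) and not line.endswith("N/A")
    ]
    if not times:
        raise ValueError("no frame timestamps found")
    return round(min(times) * fps)
```

For the output shown, the smallest timestamp is 0.066667 s, which at 30 fps corresponds to frame index 2.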

@BlackPepperAPI Is the dataset huge? If it's possible, I would suggest remuxing the videos with correct timestamps or into an mp4 container, as you said you have already tried. Otherwise, I can suggest a way of getting it to run on your dataset with this one-line hack. I will also think of ways to fix this without hacks.

diff --git a/dali/operators/reader/loader/video_loader.h b/dali/operators/reader/loader/video_loader.h
index 1ed7113b..68451575 100644
--- a/dali/operators/reader/loader/video_loader.h
+++ b/dali/operators/reader/loader/video_loader.h
@@ -202,7 +202,7 @@ class VideoLoader : public Loader<GPUBackend, SequenceWrapper> {
       const auto stream = file.fmt_ctx_->streams[file.vid_stream_idx_];
       int frame_count = file.frame_count_;

-      int start_frame = 0;
+      int start_frame = 2;
       int end_frame = file.frame_count_;
       float start = file_info_[i].start_time;
       float end = file_info_[i].end_time;

a-sansanwal commented 4 years ago

> From my observation, video files that are VFR: video_two.avi, video_three.avi

Also, I noticed that none of the videos you uploaded were VFR.