NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Errors when reading .webm and converted mp4 files #2737

Open wilson1yan opened 3 years ago

wilson1yan commented 3 years ago

Hi,

I'm trying to use DALI to train on the something-something-v2 dataset. The video files are all .webm files encoded with the VP9 codec. I'm not sure whether the DALI VideoReader supports .webm files, but when I try to use it with skip_vfr_check=True (since otherwise it errors out saying VFR is not supported, even though the videos seem to be CFR with a frame rate of 12/1), it ends up eating a lot of RAM (e.g. ~100 GB of RAM for 1000 videos of ~70-90 KB each during pipeline.build()). After a while, I get the following error:

>>> singularity exec --nv docker://nvcr.io/nvidia/pytorch:21.02-py3 python example.py
Traceback (most recent call last):
  File "example.py", line 37, in <module>
    pipeline.build()
  File "/opt/conda/lib/python3.8/site-packages/nvidia/dali/pipeline.py", line 481, in build
    self._pipe.Build(self._names_and_devices)
RuntimeError: Critical error when building pipeline:
Error when constructing operator: VideoReader encountered:
std::bad_alloc
Current pipeline object is no longer valid.

I tried converting the .webm videos to fixed-frame-rate .mp4 files using this script:

import os
import os.path as osp
import argparse
import glob
import multiprocessing as mp
from tqdm import tqdm

def worker(args):
    filename, output_dir = args
    f = osp.basename(filename)
    f = osp.splitext(f)[0] + '.mp4'
    output_filename = osp.join(output_dir, f)
    # Crop width/height down to even values (required by libx264 with 4:2:0 chroma),
    # force a constant 12 fps, re-encode the video as H.264 at CRF 23, and copy the audio.
    cmd = f'ffmpeg -i "{filename}" -vf "crop=trunc(iw/2)*2:trunc(ih/2)*2" -r 12/1 -c:v libx264 -c:a copy -crf 23 "{output_filename}" >/dev/null 2>&1'
    os.system(cmd)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--input_dir', type=str, default='20bn-something-something-v2')
    parser.add_argument('-o', '--output_dir', type=str, default='converted')
    args = parser.parse_args()

    os.makedirs(args.output_dir)

    files = glob.glob(osp.join(args.input_dir, '*.webm'))
    print(f"Found {len(files)} video files")
    files = [(f, args.output_dir) for f in files]

    with mp.Pool(mp.cpu_count()) as p:
        r = list(tqdm(p.imap(worker, files), total=len(files)))

The video loader then works for some batches but eventually errors out, failing to decode specific .mp4 files. Those files do seem fine: I'm able to view them in my video players and in the browser. Below is the error I get:

>>> singularity exec --nv docker://nvcr.io/nvidia/pytorch:21.02-py3 python example.py
0 video with shape torch.Size([32, 3, 16, 224, 224])
1 video with shape torch.Size([32, 3, 16, 224, 224])
2 video with shape torch.Size([32, 3, 16, 224, 224])
3 video with shape torch.Size([32, 3, 16, 224, 224])
4 video with shape torch.Size([32, 3, 16, 224, 224])
5 video with shape torch.Size([32, 3, 16, 224, 224])
6 video with shape torch.Size([32, 3, 16, 224, 224])
7 video with shape torch.Size([32, 3, 16, 224, 224])
8 video with shape torch.Size([32, 3, 16, 224, 224])
9 video with shape torch.Size([32, 3, 16, 224, 224])
10 video with shape torch.Size([32, 3, 16, 224, 224])
11 video with shape torch.Size([32, 3, 16, 224, 224])
12 video with shape torch.Size([32, 3, 16, 224, 224])
13 video with shape torch.Size([32, 3, 16, 224, 224])
14 video with shape torch.Size([32, 3, 16, 224, 224])
/opt/dali/dali/operators/reader/nvdecoder/nvdecoder.cc:157: Unable to decode file /home/wilson/data/datasets/something-something/converted/180116.mp4
terminate called after throwing an instance of 'dali::CUDAError'
  what():  CUDA driver API error CUDA_ERROR_UNKNOWN (999):
unknown error
Aborted (core dumped)
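
(As an extra sanity check on the file named in the log, a full software decode with FFmpeg prints any decode errors and nothing on success. This is just a sketch, assuming ffmpeg is available on the host; it is independent of DALI's GPU decoder.)

import os

# Fully decode the file with FFmpeg's software decoder and discard the output;
# with -v error, only decode errors (if any) are printed.
os.system('ffmpeg -v error -i /home/wilson/data/datasets/something-something/converted/180116.mp4 -f null -')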

To produce the results above, I downloaded some videos from something-something-v2 and ran the following example.py script in NVIDIA's latest PyTorch container.

import os
import os.path as osp

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
from nvidia.dali.plugin import pytorch

class VideoPipe(Pipeline):
    def __init__(self, batch_size, num_workers, device_id, seed):
        super().__init__(batch_size, num_workers, device_id, seed)

        root = '/home/wilson/data/datasets/something-something/converted'
        files = os.listdir(root)
        files = [osp.join(root, f) for f in files]

        # GPU video reader; skip_vfr_check=True disables the variable-frame-rate check.
        self.input = ops.VideoReader(device='gpu', filenames=files,
                                     sequence_length=16, normalized=False,
                                     shard_id=0, num_shards=1, random_shuffle=True,
                                     initial_fill=batch_size, skip_vfr_check=True)
        # Crop each frame to 224x224, then transpose FHWC -> CFHW for PyTorch.
        self.crop = ops.Crop(device='gpu', crop=(224, 224))
        self.tp = ops.Transpose(device='gpu', perm=[3, 0, 1, 2])

    def define_graph(self):
        output = self.input(name='Reader')
        output = self.crop(output)
        output = self.tp(output)
        return output

batch_size = 32
num_workers = 4
device_id = 0
seed = 0

pipeline = VideoPipe(batch_size, num_workers, device_id, seed)
pipeline.build()

loader = pytorch.DALIGenericIterator(pipeline, ['video'], reader_name='Reader',
                                     last_batch_policy=pytorch.LastBatchPolicy.DROP,
                                     auto_reset=True)

for i, batch in enumerate(loader):
    print(i, 'video with shape', batch[0]['video'].shape)

I'm not too familiar with video decoding / encoding, so maybe I just used ffmpeg incorrectly, but any help on this would be appreciated!

JanuszL commented 3 years ago

Hi @wilson1yan,

Can you try the latest DALI nightly build and see if that helps with the WebM files? (We have slightly changed the underlying FFmpeg build that DALI uses under the hood for video parsing.)
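
(In case it is needed, the nightly builds install with something along the lines of pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/nightly --upgrade nvidia-dali-nightly-cuda110, but please double-check the exact command against the installation docs.)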

Regarding dali::CUDAError, please keep in mind that you are decoding the video at full resolution first and only then cropping it. So you have 32 x 16 x 2 (batch size x sequence length x double buffering) full-sized frames in flight. That could simply be too much. Please reduce your batch size and see if that helps.
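
To put a rough number on it, here is a back-of-the-envelope sketch; the 1280x720 RGB resolution is only an assumed example, so plug in the real resolution of your videos:

# Illustrative estimate of how much decoded-frame data is in flight before cropping.
batch_size, sequence_length, buffering = 32, 16, 2
height, width, channels = 720, 1280, 3          # assumed resolution, adjust to your data
frames = batch_size * sequence_length * buffering
gib = frames * height * width * channels / 2**30
print(f"{frames} full-size frames ~ {gib:.1f} GiB")   # ~2.6 GiB at 720p, more at higher resolutions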

wilson1yan commented 3 years ago

Thanks @JanuszL for replying! I installed the nightly build:

>>> pip freeze | grep dali
nvidia-dali-nightly-cuda110==1.0.0.dev20210301

and after running my example.py script I no longer get the .webm error I had before in pipeline.build(), but it now stalls during iterator initialization, when fetching the first batch of data (i.e. when calling share_outputs()). I see basically zero CPU/GPU usage while it stalls, so I don't think it's doing any computation; maybe it's waiting for something, but the I/O itself should not take that long (i.e. many minutes). If I change it to batch_size = 1, initial_fill = 1, it is able to load 2 videos but then stalls in the same way. Running the same code on UCF-101 videos (similar in size to something-something) produces no errors and runs smoothly.

Regarding your second point about the dali::CUDAError: I believe memory usage shouldn't be a problem, as I'm running on a machine with about 400 GB of RAM and 25 GB of GPU memory. I tried running the script with a batch size of 1 and it produced a more detailed error output (VERY odd, but I was only able to produce this output once; running the same script afterwards only produced output up to the unknown error):

489 video with shape torch.Size([1, 3, 16, 224, 224])
490 video with shape torch.Size([1, 3, 16, 224, 224])
491 video with shape torch.Size([1, 3, 16, 224, 224])
492 video with shape torch.Size([1, 3, 16, 224, 224])
493 video with shape torch.Size([1, 3, 16, 224, 224])
494 video with shape torch.Size([1, 3, 16, 224, 224])
/opt/dali/dali/operators/reader/nvdecoder/nvdecoder.cc:157: Unable to decode file /home/wilson/data/datasets/something-something/converted/180116.mp4
terminate called after throwing an instance of 'dali::CUDAError'
  what():  CUDA driver API error CUDA_ERROR_UNKNOWN (999):
unknown error
Traceback (most recent call last):
  File "example.py", line 47, in <module>
    for i, batch in enumerate(loader):
  File "/home/wilson/miniconda3/envs/videoclip/lib/python3.7/site-packages/nvidia/dali/plugin/pytorch.py", line 194, in __next__
    outputs = self._get_outputs()
  File "/home/wilson/miniconda3/envs/videoclip/lib/python3.7/site-packages/nvidia/dali/plugin/base_iterator.py", line 255, in _get_outputs
    outputs.append(p.share_outputs())
  File "/home/wilson/miniconda3/envs/videoclip/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 721, in share_outputs
    return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline:
Error when executing GPU operator readers__Video encountered:
[/opt/dali/dali/operators/reader/nvdecoder/cuvideodecoder.cc:166] Encountered a dynamic video format change.
Stacktrace (16 entries):
[frame 0]: /home/wilson/miniconda3/envs/videoclip/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x43898e) [0x7f0303c0b98e]
[frame 1]: /home/wilson/miniconda3/envs/videoclip/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x23e4813) [0x7f0305bb7813]
[frame 2]: /home/wilson/miniconda3/envs/videoclip/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x23e7ff7) [0x7f0305bbaff7]
[frame 3]: /usr/lib/x86_64-linux-gnu/libnvcuvid.so(+0x11680) [0x7f01f0150680]
[frame 4]: /usr/lib/x86_64-linux-gnu/libnvcuvid.so(+0x66560) [0x7f01f01a5560]
[frame 5]: /usr/lib/x86_64-linux-gnu/libnvcuvid.so(+0x4fe2b) [0x7f01f018ee2b]
[frame 6]: /usr/lib/x86_64-linux-gnu/libnvcuvid.so(+0x51e06) [0x7f01f0190e06]
[frame 7]: /usr/lib/x86_64-linux-gnu/libnvcuvid.so(+0x66937) [0x7f01f01a5937]
[frame 8]: /usr/lib/x86_64-linux-gnu/libnvcuvid.so(+0x66ef1) [0x7f01f01a5ef1]
[frame 9]: /usr/lib/x86_64-linux-gnu/libnvcuvid.so(+0x1141b) [0x7f01f015041b]
[frame 10]: /home/wilson/miniconda3/envs/videoclip/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x23e8649) [0x7f0305bbb649]
[frame 11]: /home/wilson/miniconda3/envs/videoclip/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x23e8a9f) [0x7f0305bbba9f]
[frame 12]: /home/wilson/miniconda3/envs/videoclip/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x23d577b) [0x7f0305ba877b]
[frame 13]: /home/wilson/miniconda3/envs/videoclip/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x2bb0eff) [0x7f0306383eff]
[frame 14]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f03391586db]
[frame 15]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f0338e8188f]

Current pipeline object is no longer valid.
Aborted (core dumped)

I'm not too familiar with this error, though, or with how the structure of the webm files or my script might be causing it.

JanuszL commented 3 years ago

and after running my example.py script I no longer get the .webm error I had before in pipeline.build(), but it now stalls during iterator initialization, when fetching the first batch of data (i.e. when calling share_outputs()).

This happens when the video is VFR or a frame from the requested sequence is missing. In such a case, DALI just waits for the decoder to produce the frame it is looking for (the frame with a particular timestamp), but that frame simply doesn't exist.
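
If you want to check which files really are VFR, a rough heuristic (only a sketch using ffprobe, not the exact check DALI performs) is to compare the nominal and average frame rates of the video stream; when they disagree, the file is most likely VFR:

import subprocess

def probably_vfr(path):
    # Query r_frame_rate and avg_frame_rate of the first video stream;
    # we only care whether the two values differ.
    out = subprocess.run(
        ['ffprobe', '-v', 'error', '-select_streams', 'v:0',
         '-show_entries', 'stream=r_frame_rate,avg_frame_rate',
         '-of', 'csv=p=0', path],
        capture_output=True, text=True).stdout.strip()
    a, b = out.split(',')
    return a != b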

Regarding the error: in the stack trace I see [/opt/dali/dali/operators/reader/nvdecoder/cuvideodecoder.cc:166] Encountered a dynamic video format change., which means that your dataset has videos with different codecs, and that is not supported by DALI. The problem is caused by VideoSDK, which requires a full decoder reinitialization to change the format of the decoded video. Because DALI supports extracting sequences from videos in random order (order of sequences and of videos), each sample in the batch might have a different codec, and the decoder would be reinitialized very often. We have this functionality in our backlog, but it requires non-negligible effort to get right.
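
To find the offending files, you can group the dataset by codec and profile, for example with a quick ffprobe scan (a rough sketch; it assumes ffprobe is on the PATH and uses the converted directory from your script, so adjust the path as needed):

import glob
import subprocess
from collections import defaultdict

groups = defaultdict(list)
for path in glob.glob('/home/wilson/data/datasets/something-something/converted/*.mp4'):
    # codec_name and profile of the first video stream, e.g. "h264,High"
    key = subprocess.run(
        ['ffprobe', '-v', 'error', '-select_streams', 'v:0',
         '-show_entries', 'stream=codec_name,profile', '-of', 'csv=p=0', path],
        capture_output=True, text=True).stdout.strip()
    groups[key].append(path)

for key, paths in groups.items():
    print(key, len(paths))   # the rare combinations point at the problematic files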

wilson1yan commented 3 years ago

I see, it seems a tiny fraction of the .webm videos were encoded with a different VP9 profile, and therefore ended up with a different H.264 profile when converted to .mp4. Everything runs smoothly on the mp4 files after deleting some of the bad ones, though I'm still not sure what the issue is with the webm files. Thanks for the help!

Two other questions I have are: 1) I noticed that pipeline.build() is pretty slow, probably because it takes the video reader time to read the metadata of ~200k videos. Is there any way to cache this result somewhere and load it? E.g. perhaps the data lives in some VideoReader attribute and I could write my own VideoReader that saves/loads it if given.

2) Is there a natural way to extend some of the image-only ops to videos? For example, I want to use ColorTwist on each frame of a video, but ColorTwist only accepts images.

wilson1yan commented 3 years ago

Would the answer to my second question just be a Reshape -> ImageOp -> Reshape?
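
Something like the following is what I had in mind, as an untested sketch: collapse the frame dimension into the height so ColorTwist sees plain HWC images, then restore the FHWC sequence (shapes are hard-coded to the 16-frame 224x224 crops from my pipeline above):

        # Untested sketch of Reshape -> image op -> Reshape; shapes assume the
        # 16x224x224 RGB crops produced after self.crop in the pipeline above.
        self.to_images = ops.Reshape(device='gpu', shape=[16 * 224, 224, 3], layout='HWC')
        self.color = ops.ColorTwist(device='gpu', brightness=1.2, saturation=1.2)
        self.to_sequence = ops.Reshape(device='gpu', shape=[16, 224, 224, 3], layout='FHWC')

        # and in define_graph(), after the crop:
        # output = self.to_sequence(self.color(self.to_images(output)))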

JanuszL commented 3 years ago

Hi, regarding 1), it should be doable. I would check the following files to implement this:

Regarding 2) you can check this post for more details.