Hi, thank you for all the questions. Going one by one:
Is there a limit to the number of videos that can be opened by a VideoReader? Obviously 100 frames of 4000 videos cannot fit in GPU memory, but I imagined that each video would be loaded individually only at the next() call of the DALIGenericIterator, so the frames would be loaded only when needed. Am I wrong?
DALI keeps all video files open. There is an OS limit on how many files can be open simultaneously - you can increase it as in this answer https://github.com/NVIDIA/DALI/issues/1350#issuecomment-539521606 @a-sansanwal - I think we need to limit the number of video files open at once in DALI and close files above that limit.
Moreover, for taking 100 frames of each video, is it right to have batch_size=1 and seq_length=100?
That is correct.
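For reference, a minimal sketch of that setting could look like this (using the ops API from this DALI version; the file list path is illustrative):

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops

class VideoPipe(Pipeline):
    def __init__(self, batch_size=1, num_threads=2, device_id=0):
        super(VideoPipe, self).__init__(batch_size, num_threads, device_id)
        # sequence_length=100 returns 100 frames per sample; batch_size=1
        # makes each iteration yield exactly one such sequence.
        self.reader = ops.VideoReader(device="gpu",
                                      file_list="file_list.csv",  # illustrative path
                                      sequence_length=100,
                                      random_shuffle=False)

    def define_graph(self):
        frames, labels = self.reader(name="Reader")
        return frames, labels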
RuntimeError: Critical error in pipeline: [/opt/dali/dali/operators/fused/crop_mirror_normalize.h:155] Assert on "output_layout_.is_permutation_of(input_layout_)" failed: The requested output layout is not a permutation of input layout.
Is it right to have the CropMirrorNormalize working on the input[0] element? I expect that element to be the 100 frames batch tensor, with the input[1] element being the label instead. Am I guessing right? Is something wrong in my code or in the way I am using the CropMirrorNormalize operation?
There should not be any problem running it on input[0]. @jantonguirao - can you check why the "output_layout_.is_permutation_of(input_layout_)" error appears? (I'm also not sure if we can crop over the z-axis in this case.)
This time the code runs with no error, but it seems to "stop" after loading only 3 videos. The terminal stayed "frozen" for several minutes, and I had to kill the process. So, I am wondering what is happening.
DALI doesn't support videos with VFR (variable frame rate), and the user may experience this kind of hang. DALI has some heuristics to detect this kind of input and warn the user, but they are not 100% accurate. You may want to update DALI to a nightly build and check if this is still the case (maybe there are some other issues we have fixed already). Also, if you can narrow this problem down to a particular video and share it with us, we can verify what the root cause is.
I hope that my post is comprehensible, and I apologize in advance for asking maybe too many unrelated questions at once, but I could not find any answer in the docs or in other issues here on GitHub.
We are happy to help.
Hi @JanuszL ,
thank you for your fast reply!
DALI keeps all video files open. There is an OS limit how many of them can be open simultaneously - you can increase it as in this answer #1350 (comment)
Unfortunately I don't have administrator privileges on this machine, but I have contacted the administrator, and I will check if the code runs once the limit of open files is increased for my user.
You may want to update DALI to a nightly build and check if this is still the case (maybe there are some other issues we have fixed already). Also, if you can narrow this problem down to a particular video and share it with us, we can verify what the root cause is.
Regarding this, I have installed the 0.18.0.dev20191220 nightly release, and building the Pipeline both with and without the CropMirrorNormalize operation gives the following error
Starting test...
[/opt/dali/dali/operators/reader/loader/video_loader.cc:280] File /nas/public/dataset/1260311_1976794_B_001.mp4 does not have the same resolution as previous files. (720x1280 instead of 1080x1920). Install Nvidia driver version >=396 (x86) or >=415 (Power PC) to decode multiple resolutions
[/opt/dali/dali/operators/reader/video_reader_op.h:67] Decoder reconfigure feature not supported
Traceback (most recent call last):
File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 76, in <module>
loader = DALILoader(batch_size, file_list, seq_length, [0.0, 256.0, 256.0], 0)
File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 59, in __init__
self.pipeline.build()
File "/nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 316, in build
self._pipe.Build(self._names_and_devices)
RuntimeError: Decoder reconfigure feature not supported
Might it be that the different resolution of this video is causing both the problem with the CropMirrorNormalize operation and the hang of the test script using the stable DALI release? It is strange though, because my GPU driver is version 430.26...
Thank you again for your help and time :)
I believe this is a problem that is being fixed in https://github.com/NVIDIA/DALI/pull/1591. It is about the decoder, not about CropMirrorNormalize. I recommend testing again with the nightly build when https://github.com/NVIDIA/DALI/pull/1591 is merged.
Regarding the issues with CropMirrorNormalize: there are two problems.
A bit of history: CropMirrorNormalize was initially designed to work with 2D images only (width and height) and was later extended to work with volumetric (3D) images (width, height, and depth). Volumetric images are treated differently from sequences of frames (that is, video); sequences have a layout with a number of frames, height, and width.
We specify layouts with a string like "HWC", "CHW", "FHWC", etc. Here are some examples:
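"HWC" - a single interleaved image: height, width, channels
"CHW" - a single planar image: channels, height, width
"FHWC" - a sequence of frames (video): frames, height, width, channels
"DHWC" - a volumetric image: depth, height, width, channels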
There are two issues here:
1. The default output layout is "CHW". That is an image layout, but your input layout is probably "FHWC". That is why you get the error saying that the output layout is not a permutation of the input layout. To make the layout conversion to a planar configuration, you could set the output layout to "FCHW" instead.
2. We never took into consideration using CropMirrorNormalize to modify the sequence dimension "F", so the API only allows you to specify a crop window in 3 dimensions: H (height), W (width), and D (depth).
There is another operator called Slice that allows slicing on any dimension, but its usage might require a little bit more code, and you would have to do the normalization as a separate step.
I think that your use case could be accommodated into CropMirrorNormalize by allowing to do volumetric crop on sequences as well (treating the sequence dimension as depth internally). We will take a look and come back to you with a solution.
Hey @jantonguirao and @JanuszL ,
thanks again for your replies!
There are two issues here: The default output layout is "CHW". That is an image layout but your input layout is probably "FHWC". That is why you get the error saying that the output layout is not a permutation of the input layout. To make the layout conversion to a planar configuration, you could set the output layout to "FCHW" instead.
By reinstalling the master release and specifying output_layout="FCHW", the error about the output layout not being a permutation of the input's disappears. I had taken the temporal dimension (number of frames) in the layout specification for granted, my bad, I'm sorry guys!
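For reference, the operator now looks something like this (a sketch; the crop values are illustrative):

import nvidia.dali.ops as ops
import nvidia.dali.types as types

# output_layout="FCHW" is a permutation of the "FHWC" input layout
crop = ops.CropMirrorNormalize(device="gpu",
                               crop=[256, 256],
                               output_dtype=types.FLOAT,
                               output_layout="FCHW")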
There is still the hang when loading the videos, but as @JanuszL suggested here
I believe this is a problem that is being fixed in #1591. It is about the decoder, not about CropMirrorNormalize. I recommend testing again with the nightly build when #1591 is merged.
I will wait until #1591 is merged and run the test again.
Regarding this:
We never took into consideration using CropMirrorNormalize to modify the sequence dimension "F" so the API only allows you to specify a crop window in 3 dimension: H (height), W (width) and D (depth).
from the docs I didn't get that DALI treats volumetric inputs (height, width, and depth) differently from sequences of frames (number of frames, height, and width), but it might be that I have been biased by the fact that I sometimes work with video files as 3D volumes, so I am used to treating them that way. However,
I think that your use case could be accommodated into CropMirrorNormalize by allowing to do volumetric crop on sequences as well (treating the sequence dimension as depth internally). We will take a look and come back to you with a solution.
maybe the step and stride options of the VideoReader (sketched below) are more intuitive and direct for "cropping" video files in the temporal dimension? For instance, if you look at my code, the crop_z parameter is always set to 0: I have only used ops.Uniform(range=(0.0, 0.0)) to have an _EdgeReference always equal to 0, since, as I have understood, plain float values are not accepted for specifying the crop positions for the Crop and CropMirrorNormalize operations? Anyway, I just wanted to be sure that no cropping happened in the temporal dimension; if I wanted the contrary, I would probably have used the VideoReader directly.
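Something like this is what I have in mind (just a sketch; values illustrative):

import nvidia.dali.ops as ops

# Temporal "cropping" via the reader itself instead of Crop/CropMirrorNormalize
reader = ops.VideoReader(device="gpu", file_list="file_list.csv",
                         sequence_length=100,
                         step=100,   # first frames of consecutive sequences are 100 frames apart
                         stride=1)   # distance between consecutive frames inside a sequence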
In any case, if I can give my personal opinion on the matter, extending the Crop and CropMirrorNormalize operations to the temporal dimension of videos could be a great feature for the library (if that is possible given DALI's internal implementation, of course)! Many users would appreciate it :)
Sorry for bothering you, I just wanted to give my two cents! Thanks again for your time and patience!
Anyway, I just wanted to be sure that no cropping happened in the temporal dimension; if I wanted the contrary, I would probably have used the VideoReader directly.
If no parameter is provided then no cropping will happen across the z-axis (for 3D data).
Sorry for bothering you, I just wanted to give my two cents! Thanks again for your time and patience!
This is very valuable feedback. Thank you very much for it.
Thanks @CrohnEngineer for the very valuable feedback.
I apologize if the documentation of Crop / CropMirrorNormalize was not intuitive in that regard. We are constantly revisiting and updating the documentation and any suggestions on things to improve are very much welcome.
You are right, using the VideoReader arguments to extract the relevant part of the video would be preferred. Doing that will save you the time of decoding the frames that you are not interested in.
As Janusz said, if you don't want to crop on the depth dimension you simply don't provide those arguments and the cropping will happen only on the height and width dimensions.
Hey @JanuszL and @jantonguirao ,
I'm glad my feedback helped you in any way, and thank you again for your replies and suggestions :) I just wanted to give you a quick update on my errors.
I believe this is a problem that is being fixed in #1591. It is about the decoder, not about CropMirrorNormalize. I recommend testing again with the nightly build when #1591 is merged.
I have seen that #1591 has been merged, but I couldn't test my code until the new nightly release came out today. I have increased the number of open files allowed by the OS, and fixed the CropMirrorNormalize operation as suggested by @jantonguirao (with the crop_z dimension not provided, as both of you suggested). Still, my code hangs after opening only 3 videos, regardless of whether I open 100 or 4000 videos. Using the stable release I get the error related to the decoder
Starting test...
[/opt/dali/dali/operators/reader/loader/video_loader.cc:247] File /nas/public/dataset/1260311_1976794_B_001.mp4 does not have the same resolution as previous files. (720x1280 instead of 1080x1920). Install Nvidia driver version >=396 (x86) or >=415 (Power PC) to decode multiple resolutions
[/opt/dali/dali/operators/reader/video_reader_op.h:58] Decoder reconfigure feature not supported
Traceback (most recent call last):
File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 76, in <module>
loader = DALILoader(batch_size, file_list, seq_length, [0.0, 256.0, 256.0], 0)
File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 59, in __init__
self.pipeline.build()
File "/nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 308, in build
self._pipe.Build(self._names_and_devices)
RuntimeError: Decoder reconfigure feature not supported
This happens when opening videos from the 100-file list,
Starting test...
[/opt/dali/dali/operators/reader/loader/video_loader.cc:247] File /nas/public/dataset/2090100_2005778_A_002.mp4 does not have the same resolution as previous files. (720x1280 instead of 1920x1080). Install Nvidia driver version >=396 (x86) or >=415 (Power PC) to decode multiple resolutions
[/opt/dali/dali/operators/reader/video_reader_op.h:58] Decoder reconfigure feature not supported
Traceback (most recent call last):
File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 75, in <module>
loader = DALILoader(batch_size, file_list, seq_length, [0.0, 256.0, 256.0], 0)
File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 58, in __init__
self.pipeline.build()
File "/nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 308, in build
self._pipe.Build(self._names_and_devices)
RuntimeError: Decoder reconfigure feature not supported
and this when opening videos from the 4000-file list (just to show you that the error does not depend on a single video file). Shouldn't this bug be fixed by #1591? Or maybe this fix has not been built into the nightly release yet? Am I missing something? Thanks again for your support and patience!
Edit: happy new year :) 🎉🎉🎉
@CrohnEngineer ,
Using the stable release I get the error related to the decoder ...
This is expected, as #1591 should fix the RuntimeError: Decoder reconfigure feature not supported problem.
However, the hang you see is a different problem. I suspect it may be a problem with VFR (variable frame rate) video, which DALI doesn't support. If you can narrow the problem down to one video with the nightly build and share this video, we can check what the exact cause is.
Hey @JanuszL ,
However, the hang you see is a different problem. I suspect it may be a problem with VFR (variable frame rate) video, which DALI doesn't support. If you can narrow the problem down to one video with the nightly build and share this video, we can check what the exact cause is.
I am trying to narrow down one of the videos that may cause the problem, but I am encountering some difficulties. Let me explain: one of the first questions I asked you was
Moreover, for taking 100 frames of each video, is it right to have batch_size=1 and seq_length=100? That is correct.
As the docs suggest, I have created a file_list.csv with the path of each video followed by the label associated with it. For instance, the first 5 entries of the file where I have inserted the paths of the 4000-video dataset are
/nas/public/dataset/fb_dfd_release_0.1_final/method_A/1795659/1795659_B/2059066_1795659_B_001.mp4 1
/nas/public/dataset/fb_dfd_release_0.1_final/method_A/2005778/2005778_A/2090100_2005778_A_002.mp4 1
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/1939161/1939161_A_003.mp4 0
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/1224068/1224068_H_003.mp4 0
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/700790/700790_D_001.mp4 0
Using this file list and the settings indicated above (batch_size=1 and seq_length=100), as I asked you, I imagined that each batch would contain only one element constituted by 100 frames of each video.
Therefore, the elements returned by the DALIGenericIterator would be:
100 frames of /nas/public/dataset/fb_dfd_release_0.1_final/method_A/1795659/1795659_B/2059066_1795659_B_001.mp4, and a label=1;
100 frames of /nas/public/dataset/fb_dfd_release_0.1_final/method_A/2005778/2005778_A/2090100_2005778_A_002.mp4, and again a label=1;
100 frames of /nas/public/dataset/fb_dfd_release_0.1_final/original_videos/1939161/1939161_A_003.mp4, with a label=0;
and so on. Instead, when I run the test code posted in the first comment, the iterator (which, using a debugger, I have found to be the element "hanging" in the code) returns the first three elements as positive (so label=1) before it hangs, as you can see in the picture.
I have modified the order of the elements in the list, so that the first is a negative video followed by two positives:
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/1939161/1939161_A_003.mp4 0
/nas/public/dataset/fb_dfd_release_0.1_final/method_A/2005778/2005778_A/2090100_2005778_A_002.mp4 1
/nas/public/dataset/fb_dfd_release_0.1_final/method_A/1795659/1795659_B/2059066_1795659_B_001.mp4 1
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/1224068/1224068_H_003.mp4 0
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/700790/700790_D_001.mp4 0
However, running the test code again, the iterator hangs after three videos, returning label=0 for the first two elements (as you can see again in the picture below).
Shouldn't the sequence of labels in this case be 0 1 1 (negative, positive, positive)? Could you please explain to me what is happening here? Does this mean that the iterator is picking two batches sequentially from the same video?
I'm sorry for asking (yet) another question, but without being sure how the element is created I cannot point exactly to any video.
However, if it can help, I have checked with ffmpeg the frame rate of the first 5 entries of the list, together with whether they use VFR, and the results are:
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/1939161/1939161_A_003.mp4, label=0, frame_rate=15 FPS, no VFR
/nas/public/dataset/fb_dfd_release_0.1_final/method_A/2005778/2005778_A/2090100_2005778_A_002.mp4, label=1, frame_rate=15 FPS, no VFR
/nas/public/dataset/fb_dfd_release_0.1_final/method_A/1795659/1795659_B/2059066_1795659_B_001.mp4, label=1, frame_rate=29.977946 FPS, no VFR
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/1224068/1224068_H_003.mp4, label=0, frame_rate=30 FPS, no VFR
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/700790/700790_D_001.mp4, label=0, frame_rate=29.970030 FPS, no VFR
As you can see, the frame rate can differ from video to video, but no VFR is employed (at least in the videos I have opened so far, and according to what the creators of the dataset have disclosed publicly).
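For completeness, the check I ran was along these lines (a sketch using ffprobe from the ffmpeg suite; if the nominal and average rates differ, the file is likely VFR):

import subprocess

def frame_rates(path):
    # Prints the nominal (r_frame_rate) and average (avg_frame_rate) rates
    # of the first video stream; differing values hint at VFR.
    out = subprocess.check_output([
        "ffprobe", "-v", "0", "-select_streams", "v:0",
        "-show_entries", "stream=r_frame_rate,avg_frame_rate",
        "-of", "default=noprint_wrappers=1", path])
    print(out.decode().strip())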
Thanks again for your time and support, I hope I have explained everything clearly!
@a-sansanwal - could you look into that problem?
Hi @CrohnEngineer , the hang issue which you mentioned seems very likely related to https://github.com/NVIDIA/DALI/pull/1592. It was merged yesterday so it should be in nightly soon.
Also regarding
Shouldn't the sequence of labels in this case be 0 1 1 (negative, positive, positive)? Could you please explain to me what is happening here?
That depends. If the first video has, say, 200 frames, then you will get frames 0-99 from the first video, then frames 100-199 from the first video, and then we move on to the second video.
If you want to choose only frames 0-99 from the first video, you can do something like the following in your file_list.txt, while also setting file_list_frame_num=True in VideoReader.
file.mp4 0 0 100
file1.mp4 1 0 100
file2.mp4 2 0 100
Using a file_list.txt similar to that will allow you to choose specific frames from a video. You can see the example here https://github.com/NVIDIA/DALI/pull/1612/files which demonstrates this.
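On the reader side, the only change needed would be something like this (a sketch; the path is illustrative):

import nvidia.dali.ops as ops

reader = ops.VideoReader(device="gpu",
                         file_list="file_list.txt",
                         file_list_frame_num=True,  # extra columns are start/end frame numbers
                         sequence_length=100)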
Please feel free to ask any more questions. Hope this answered your query.
Hey @a-sansanwal and @JanuszL ,
thank you for your timely replies!
If you want to choose only 0-99 frames from first video, you can do something like the following in your file_list.txt while also setting file_list_frame_num=True in VideoReader.
file.mp4 0 0 100
file1.mp4 1 0 100
file2.mp4 2 0 100
I have modified my file list to indicate the start and end frame numbers as @a-sansanwal suggested, and inserted the file_list_frame_num=True option in the VideoReader.
Anyway, using a debugger to check the code execution, I have noticed that while before the code would hang while the iterator was returning the elements, this time it hangs while creating the iterator itself. In short, before, the code would hang here
while iterator:
    item = iterator.__next__()
    for label in item[0]["label"]:
        print('Video is positive!') if label == 1 else print('Video is negative!')
inside the while loop. Now, the code hangs just after submitting the instruction for creating the DALIGenericIterator
self.dali_iterator = pytorch.DALIGenericIterator(self.pipeline,
                                                 ["file", "label"],
                                                 self.epoch_size,
                                                 auto_reset=True)
Do you have any guess about this behaviour? Anyway,
Hi @CrohnEngineer , the hang issue which you mentioned seems very likely related to #1592. It was merged yesterday so it should be in nightly soon.
I will test the code again with the next nightly release.
Thank you again for your time and responses!
pytorch.DALIGenericIterator prefetches the first batch - https://github.com/NVIDIA/DALI/blob/master/dali/python/nvidia/dali/plugin/pytorch.py#L148. So in this case the hang happens when the first batch is computed.
Hey @JanuszL and @a-sansanwal ,
Hi @CrohnEngineer , the hang issue which you mentioned seems very likely related to #1592. It was merged yesterday so it should be in nightly soon.
I have installed the last nightly release, and finally my script runs without hangs! Thank you very much for your support, help and answers during these weeks, they have been priceless!
Anyway, I'm sorry to bother you again, but I have another question regarding the CropMirrorNormalize operation (I don't know if maybe it is better to ask @jantonguirao directly). While my code now seems to run without hangs, at a certain point this error pops up
Starting test...
Loading videos at 2020-01-08 15:32:47.003687...
Video 0 is negative!
Video 1 is positive!
Video 2 is negative!
Video 3 is positive!
Video 4 is positive!
Video 5 is positive!
Video 6 is positive!
Video 7 is positive!
Video 8 is negative!
Video 9 is positive!
Video 10 is positive!
Video 11 is positive!
Video 12 is positive!
Video 13 is positive!
Video 14 is positive!
Video 15 is positive!
Video 16 is negative!
Video 17 is positive!
Video 18 is negative!
Video 19 is negative!
Video 20 is positive!
Video 21 is positive!
Video 22 is positive!
Video 23 is positive!
Video 24 is positive!
Video 25 is positive!
Video 26 is positive!
Video 27 is positive!
Video 28 is negative!
Video 29 is positive!
Video 30 is positive!
Video 31 is negative!
Video 32 is positive!
Video 33 is negative!
Video 34 is positive!
Video 35 is positive!
Video 36 is positive!
Video 37 is positive!
Video 38 is positive!
Traceback (most recent call last):
File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 85, in <module>
item = iterator.__next__()
File "/nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/plugin/pytorch.py", line 163, in __next__
outputs.append(p.share_outputs())
File "/nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 409, in share_outputs
return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline: [/opt/dali/dali/operators/crop/crop_attr.h:154] Assert on "crop_shape[dim] > 0 && crop_shape[dim] <= input_shape[dim]" failed: Crop shape for dimension 1 (256) is out of range [0, 240]
Stacktrace (15 entries):
[frame 0]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x2b56fe) [0x7f3a468c86fe]
[frame 1]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x35f898) [0x7f3a46972898]
[frame 2]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x362401) [0x7f3a46975401]
[frame 3]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x3637d9) [0x7f3a469767d9]
[frame 4]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0xf54923) [0x7f3a47567923]
[frame 5]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0xf575f7) [0x7f3a4756a5f7]
[frame 6]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0xf1c190) [0x7f3a4752f190]
[frame 7]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x7c9a4d) [0x7f3a46ddca4d]
[frame 8]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0xc32bd) [0x7f3a452a62bd]
[frame 9]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0xc3c11) [0x7f3a452a6c11]
[frame 10]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0x957c3) [0x7f3a452787c3]
[frame 11]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0x112856) [0x7f3a452f5856]
[frame 12]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0x7308b0) [0x7f3a459138b0]
[frame 13]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f3aac59d6db]
[frame 14]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f3aac2c688f]
Current pipeline object is no longer valid.
Process finished with exit code 1
Do you have any guess on what is happening? I have checked the video causing the problem and it seems to be consistent with the other ones in the list. I'm not sure, however, that I'm using the CropMirrorNormalize operation correctly. Just to recall my code in the first comment, I define the graph as
def define_graph(self):
    input = self.reader(name="Reader")
    output = self.crop(input[0], crop_pos_x=self.uniform(), crop_pos_y=self.uniform())
    return output, input[1]
The input returned by the VideoReader is a list of two elements, and I thought that input[0] would represent the video frames tensor while input[1] the label associated with them. So, I have inserted the crop operation on input[0] only, but I'm not sure if that's right.
However, if I write output = self.crop(input, crop_pos_x=self.uniform(), crop_pos_y=self.uniform()), I receive the following error
TypeError: Expected outputs of type compatible with "EdgeReference". Received output type with name "list" that does not match.
Could you explain to me why the VideoReader returns a list? What do the elements of the list represent? Is it right to call the CropMirrorNormalize operation on just one element like input[0]?
Thank you again for your time and feedback!
@CrohnEngineer - VideoReader returns frames and labels; it can also return frame numbers and timestamps. So input[0] has frames, while input[1] has labels, and so on.
Regarding the error: does the video have the same resolution as the other ones? The error sounds like you want to crop to 256 in one dimension while your video has 240 at most. For images, a resize operation is usually conducted first to make sure that the input to the crop has a certain size; however, resize doesn't support video sequences yet. For now, could you try to remove the videos with a lower resolution than your crop argument?
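For example, you could filter the file list beforehand with something along these lines (not a DALI utility, just a sketch using OpenCV):

import cv2

def large_enough(path, min_h=256, min_w=256):
    # Read the stream resolution without decoding any frames.
    cap = cv2.VideoCapture(path)
    w = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
    h = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
    cap.release()
    return h >= min_h and w >= min_w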
Hey @JanuszL ,
@CrohnEngineer - VideoReader returns frames and labels; it can also return frame numbers and timestamps. So input[0] has frames, while input[1] has labels, and so on.
I thought it was something like this, thanks for the clarification! Regarding the video
Does the video have the same resolution as the other ones? The error sounds like you want to crop to 256 in one dimension while your video has 240 at most.
You were right, the video causing the problem had a resolution of 320x240 pixels: I have reduced the crop dimension to 240x240 and it finally runs fine. Thank you really really much for your help!!!
If you don't mind, I would like to ask one last question before closing the issue. As I said, the code now runs, but at video number 67 it seems to fail to allocate memory on the GPU
Starting test...
Loading videos at 2020-01-09 13:05:50.638481...
Video 0 is negative!
Video 1 is positive!
Video 2 is negative!
Video 3 is positive!
...
Video 53 is positive!
Video 54 is positive!
Video 55 is positive!
Video 56 is negative!
Video 57 is positive!
Video 58 is negative!
Video 59 is negative!
Video 60 is positive!
Video 61 is positive!
Video 62 is positive!
Video 63 is negative!
Video 64 is positive!
Video 65 is positive!
Video 66 is positive!
Traceback (most recent call last):
  File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 86, in <module>
    item = iterator.__next__()
  File "/nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/plugin/pytorch.py", line 163, in __next__
    outputs.append(p.share_outputs())
  File "/nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 409, in share_outputs
    return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline: CUDA allocation failed
Current pipeline object is no longer valid.
Is this a memory issue (am I trying to allocate too many frames on the GPU) or is it something else? I am currently running the code on an NVIDIA Titan V with 12 GB of memory. From the start of the execution, around 8 GB of memory are allocated, which grows towards 8.9 GB, and then the execution stops.
Thank you again for your help and support!
You can try to play with the additional_decode_surfaces and initial_fill parameters of the VideoReader. Also, you can play with the prefetch_queue_depth pipeline argument.
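A sketch showing where each knob lives (the values are only examples):

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops

class TunedVideoPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        # prefetch_queue_depth is a Pipeline argument
        super(TunedVideoPipe, self).__init__(batch_size, num_threads, device_id,
                                             prefetch_queue_depth=1)
        # additional_decode_surfaces and initial_fill belong to the VideoReader
        # (initial_fill only matters with random_shuffle=True)
        self.reader = ops.VideoReader(device="gpu", file_list="file_list.csv",
                                      sequence_length=100,
                                      additional_decode_surfaces=0,
                                      initial_fill=1)

    def define_graph(self):
        frames, labels = self.reader(name="Reader")
        return frames, labels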
Hey @JanuszL ,
I have tried setting additional_decode_surfaces to 0, but nothing changed. Moreover, I have set shuffle=False, so initial_fill shouldn't be considered. In any case, I have also tried setting initial_fill=0 and prefetch_queue_depth=1, but with no success.
Anyway, I will close the issue since my original problem has been solved! Maybe I can open another one if the problem persists and is of interest to you too?
Thank you again for your help, and also @jantonguirao and @a-sansanwal ! Your support has been extremely helpful and irreplaceable! :)
@CrohnEngineer - if you have some video samples you could share that show this memory growth, we can check it on our side. Nothing but decoding video with a bigger resolution comes to my mind as a reason why memory consumption keeps growing when you run the pipeline.
Hey @JanuszL ,
the dataset is made up of 4000 videos, so I can't immediately share it with you. Anyway, it is the preliminary dataset of the Facebook Deepfake Detection Challenge; I took part of it and made a "train" directory. If you have access to this dataset, you can simply create a file_list.csv and check if it is really a memory issue or something else.
Anyway, maybe it is not important, but the strange thing to me is that the memory is not completely saturated when the error takes place: the TITAN V has 12GB of memory, but the CUDA allocation failed error happens when 9GB of it are occupied. Does it look strange to you too?
Ok, if it is https://www.kaggle.com/c/deepfake-detection-challenge/data then I can access it. I will try to reproduce your problem.
Hey @JanuszL ,
be careful, the one on Kaggle is, I think, the complete dataset of 120000 videos. Moreover, it is a little complicated in its organization (it is divided into multiple folders). For this reason I am using the one hosted on https://deepfakedetectionchallenge.ai/, because I wanted to get used to DALI before moving to a very large dataset. However, I think the preliminary dataset is no longer available. If you have enough resources and time to check it on the complete dataset, that would be awesome in any case. I think the characteristics of the videos in the two datasets are pretty similar (in terms of resolution, FR, VFR, etc...).
@CrohnEngineer - the one I see consists of 400 videos so it is fine. I think I see where the problem is. The DALI reader uses a prefetch buffer of size 2 * batch_size * prefetch_depth. For a Full HD frame (~23 MB when returned as float) this makes > 40 MB for a sequence of length 1. As DALI works as a pipeline, the VideoReader output needs to keep its own buffer for the whole sequence. So in your case the consumed memory is 2 * batch_size * prefetch_depth + batch_size, and for a sequence length of 100 this makes 23 MB * 3 * 100 ≈ 7 GB. With the code you provided I'm able to almost fully saturate the memory to 12GB (my GPU also has 12GB). The most promising optimization that comes to my mind is to fuse (reenable) resize inside the VideoReader so each output frame is not that heavy. @a-sansanwal - what do you think?
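In numbers, a back-of-the-envelope version of the computation above (float output assumed):

frame_bytes = 1920 * 1080 * 3 * 4        # one Full HD RGB frame as float32, ~23 MB
sequence_bytes = frame_bytes * 100       # sequence_length = 100
# 2 * batch_size * prefetch_depth (reader buffers) + batch_size (reader output)
total_bytes = sequence_bytes * (2 * 1 * 1 + 1)
print(total_bytes / 2**30)               # ~6.95 GB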
@CrohnEngineer - I see one incomplete implementation in DALI. Even if you ask the VideoReader for dtype=types.UINT8, it internally allocates memory for float32 data. I will fix that soon; it should reduce memory occupation 4 times (I hope).
https://github.com/NVIDIA/DALI/pull/1643 should reduce memory consumption
@JanuszL That's a good idea, we could add support for arguments like resize_x, resize_y in VideoReader and do the resize in VideoReader itself. I will add it to my to-do list.
We have a scale argument but its implementation is not there.
Hey @JanuszL ,
sorry for the delay in answering you!
@CrohnEngineer - I see one incomplete implementation in DALI. Even if you ask the VideoReader for dtype=types.UINT8, it internally allocates memory for float32 data. I will fix that soon; it should reduce memory occupation 4 times (I hope).
#1643 should reduce memory consumption
That's good to hear! I can't wait to try it out :) Regarding this
I think I see where the problem is. The DALI reader uses a prefetch buffer of size 2 * batch_size * prefetch_depth. For a Full HD frame (~23 MB when returned as float) this makes > 40 MB for a sequence of length 1. As DALI works as a pipeline, the VideoReader output needs to keep its own buffer for the whole sequence. So in your case the consumed memory is 2 * batch_size * prefetch_depth + batch_size, and for a sequence length of 100 this makes 23 MB * 3 * 100 ≈ 7 GB.
I made the same computation, and so accounted for the 7GB of memory used by the GPU for the prefetch batches and the actual elements. Still, I don't get why the memory consumption increases. Does it mean that DALI keeps all the elements fetched so far in GPU memory? Maybe it's a dumb question, but I am a little confused about how DALI uses the GPU's memory; would you mind explaining it to me briefly?
I made the same computation, and so accounted for the 7GB of memory used by the GPU for the prefetch batches and the actual elements. Still, I don't get why the memory consumption increases.
I was not able to reproduce this memory usage growth with the test code you have provided. In some cases it is possible, as DALI uses a lazy approach to memory allocation (it enlarges allocations when needed but doesn't free anything, as any free or alloc on the GPU is very time-consuming), so when a bigger image appears at the output of some random operator (like a random resize or just the decoder), additional memory needs to be allocated. But in the case of your pipeline the sizes of the images are the same, so the watermark should be reached very soon and no additional allocation should happen (I don't see that in my case).
I was not able to reproduce this memory usage growth with the test code you have provided. In some cases it is possible, as DALI uses a lazy approach to memory allocation (it enlarges allocations when needed but doesn't free anything, as any free or alloc on the GPU is very time-consuming), so when a bigger image appears at the output of some random operator (like a random resize or just the decoder), additional memory needs to be allocated. But in the case of your pipeline the sizes of the images are the same, so the watermark should be reached very soon and no additional allocation should happen (I don't see that in my case).
Ok, thank you @JanuszL for the explanation! Do you have any hint on where to search for the root of the problem?
I would start with running nvidia-smi -lms 100 --query-gpu=memory.used --format=csv in the console to see how memory utilization grows. As I said, in my case the watermark is 8636 MiB for the deepfake-detection-challenge dataset, batch size 1, sequence length 100.
Hey @JanuszL ,
I think I finally found the root of the problem.
when a bigger image at the output of some random operator (like random resize or just decoder) appear memory need to be additionally allocated.
You were right! As you were suggesting, I found out that some of the videos have a resolution greater than full HD's 1920x1080! What happened here
As I said now the code runs, but at video number 67 it seems to fail to allocate the memory for the GPU
is that the next video in the list (video number 68) has a resolution of 3840x2160 pixels; while prefetching the next batch, DALI runs out of GPU memory, and hence the CUDA allocation failed error pops up.
Reducing the sequence_length allowed me to see the allocation of the biggest frames on the GPU and the "spike" in memory consumption; until #1643 is merged, I will probably work with shorter sequences.
Speaking of this, I would like to use the stride argument of the VideoReader. If I have a video of, let's say, 10 (numbered) frames, like this [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], and I would like to have a sequence with sequence_length=5 and stride=2, this means that the resulting sequence will contain one frame every two, right? Resulting in something like this: [0, 2, 4, 6, 8]?
Thank you really really much for your help! This code had kept me busy for weeks now, without your assistance I could never make it work!
@CrohnEngineer,
If I have a video of, let's say, 10 (numbered) frames, like this [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], and I would like to have a sequence with sequence_length=5 and stride=2, this means that the resulting sequence will contain one frame every two, right? Resulting in something like this: [0, 2, 4, 6, 8]?
It should work exactly as you say. If you use a nightly build, you can enable enable_frame_num and get the actual frame number to verify that it works as you want.
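A sketch of the verification (nightly-era API; the path is illustrative):

import nvidia.dali.ops as ops

# With enable_frame_num=True the reader returns an extra output, so you can
# check which frame each returned sequence starts at.
reader = ops.VideoReader(device="gpu", file_list="file_list.csv",
                         sequence_length=5, stride=2,
                         enable_frame_num=True)
# In define_graph: frames, labels, frame_num = reader(name="Reader")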
@CrohnEngineer - it is merged and should be available in the next nightly build.
Can anybody tell me how to solve this error?
File "/home/knuvi/Desktop/Kavita/fastdvdnet-0.1/dataloaders.py", line 52, in init self.crop = CropMirrorNormalize(device="gpu", \ NameError: name 'CropMirrorNormalize' is not defined
Hi @kavita19,
Can you provide more details about the code you are trying to run? In my case:
import nvidia.dali
nvidia.dali.ops.CropMirrorNormalize()
Just works. Can you check it on your side?
Hello, thanks for your reply. I am trying to run the FastDVDnet model. First I was getting a "module not found" error for the nvidia module, so I installed NVIDIA DALI as per the requirements. Then I tried to train the model, but I am getting an error in the dataloaders.py file, in this code:
# Define crop and permute operations to apply to every sequence
self.crop = CropMirrorNormalize(device="gpu", \
                                crop=crop_size, \
                                output_layout=types.NCHW, \
                                output_dtype=types.FLOAT)
self.uniform = ops.Uniform(range=(0.0, 1.0))  # used for random crop
Hi @kavita19,
Please make sure that you have the most recent DALI version; the installation instructions can be found here. Also make sure you haven't changed anything in the fastdvdnet code. As I can see, the mentioned piece of code looks different in the official repository compared to what you provided:
# Define crop and permute operations to apply to every sequence
self.crop = ops.CropMirrorNormalize(device="gpu",
                                    crop_w=crop_size,
                                    crop_h=crop_size,
                                    output_layout='FCHW',
                                    dtype=types.DALIDataType.FLOAT)
self.uniform = ops.Uniform(range=(0.0, 1.0))  # used for random crop
vs what you provided:
# Define crop and permute operations to apply to every sequence
self.crop = CropMirrorNormalize(device="gpu", \
                                crop=crop_size, \
                                output_layout=types.NCHW, \
                                output_dtype=types.FLOAT)
self.uniform = ops.Uniform(range=(0.0, 1.0))  # used for random crop
Loading datasets ...
Traceback (most recent call last):
  File "train_fastdvdnet.py", line 214, in <module>
Hi @kavita19,
Can I install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100==1.2.0 with CUDA 11.4 (my PC)? Because I have issues with the DALI installation.
Yes, it should work. Make sure that you have the latest pip version installed: pip install --upgrade pip.
File "/home/knuvi/Desktop/Kavita/fastdvdnet/dataloaders.py", line 99, in init step=temp_stride) TypeError: init() got an unexpected keyword argument 'sequence_length'
Seems like an error not related to DALI; class VideoReaderPipeline(Pipeline): is part of the fastdvdnet code. I would double-check that your source code is not corrupted.
Thanks. After solving this error, my training of FastDVDnet started, but I am getting ZeroDivisionError: division by zero after the first epoch.
[epoch 1][3981/4000] loss: 12.6643 PSNR_train: 0.0000
[epoch 1][3991/4000] loss: 13.8910 PSNR_train: 0.0000
Traceback (most recent call last):
  File "train_fastdvdnet.py", line 212, in <module>
Hi @kavita19,
I guess your validation dataset is empty (this is probably the only reason why len(dataset_val) is 0). Could you check it?
Hi @JanuszL,
I already gave the validation path, and in the validation folder I kept my own image sequences as per the GitHub (fastdvdnet) reference, but it still gives the same error message [ZeroDivisionError].
Hi @kavita19,
In that case, I would extract the code part that creates dataset_val and check its len manually. Also, there may be some errors/warnings you have missed. Or the fastdvdnet code doesn't work with the recent DALI version and you need to ask its author for help.
Ok thank you so much for your help. I will check this.
`video_path = "demo_video_5sec.mp4"
fps = 2
list1 = [] def video_reader(path, fps):
cap = cv2.VideoCapture(video_path)
cap.set(cv2.CAP_PROP_FPS, fps)
i = 0
while cap.isOpened():
ret, frame = cap.read()
if ret:
#image = cv2.resize(frame)
#cv2.imshow("image", frame)
mesh_points = face_mesh(frame)
#print(mesh_points)
frame_dict = {}
if mesh_points is not None:
frame_dict[i]= mesh_points
#video_dict.append(frame_dict)
list1.append(mesh_points)
i = i+1
#if cv2.waitKey(25) & 0xff == ord('q'):
#break
else:
break
cap.release()
video_reader(video_path,fps)
NameError: name 'video_reader' is not defined`
I am getting this NameError; can you please help me solve it?
Hi @0Rutuja28-97,
Can you provide more details regarding the script you run? It looks like it uses OpenCV and not DALI.
Thank you for your response. The issue is resolved now.
Hi everybody,
I'm opening an issue since I am encountering several problems writing a Pipeline for loading video files. I'm not sure whether DALI is the best tool for my task, nor if I am using it properly, so I will start by explaining my goal. I have a huge dataset consisting of hundreds of thousands of videos, and I would like to use DALI's VideoReader to build a PyTorch DataLoader since, according to the documentation, DALI's VideoReader uses
NVIDIA GPU's hardware-accelerated video decoding, so I would like to speed up and eventually parallelize the training of a CNN, using the GPU for the data loading operations. I took the Video Super Resolution example (https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/examples/video/superres_pytorch/README.html) and wrote my personal DALILoader. I created a file_list.csv where I have written all the paths and the labels of the videos (my task is a simple binary classification), and a simple test script: I simply want to load 100 frames of each video, then crop them randomly in the height and width dimensions. As a first test, I didn't want to use the whole dataset, so I used just a portion of it (we are talking about 4000/5000 videos in any case), but when I ran the code I encountered three major errors. I report them in "discovery order": after encountering the first one, I simplified my code, reducing the task complexity too, to do a little debugging. I have DALI 0.16.0 installed, and I am running the code on an Ubuntu machine with an E5-2630 CPU, 128GB of RAM, and a single NVIDIA Quadro P6000 GPU.
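The core of the pipeline is roughly the following (a sketch: the define_graph is the one I discuss later in this issue, while the surrounding code is illustrative):

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

class VideoReaderPipeline(Pipeline):
    def __init__(self, batch_size, sequence_length, file_list, crop_size, device_id):
        super(VideoReaderPipeline, self).__init__(batch_size, num_threads=2,
                                                  device_id=device_id)
        self.reader = ops.VideoReader(device="gpu", file_list=file_list,
                                      sequence_length=sequence_length,
                                      random_shuffle=False)
        self.crop = ops.CropMirrorNormalize(device="gpu", crop=crop_size,
                                            output_dtype=types.FLOAT)
        self.uniform = ops.Uniform(range=(0.0, 1.0))

    def define_graph(self):
        input = self.reader(name="Reader")
        output = self.crop(input[0], crop_pos_x=self.uniform(), crop_pos_y=self.uniform())
        return output, input[1]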
The first error appears by simply running the script above as it is:
My first question therefore is: is there a limit to the number of videos that can be opened by a VideoReader? Obviously 100 frames of 4000 videos cannot fit in GPU memory, but I imagined that each video would be loaded individually only at the next() call of the DALIGenericIterator, so the frames would be loaded only when needed. Am I wrong? Moreover, for taking 100 frames of each video, is it right to have batch_size=1 and seq_length=100?
As a second experiment, I reduced the number of videos to 100. This time it seems that DALI is able to load the videos, but I got another error instead:
I am probably using the crop operation wrong, so: is it right to have the CropMirrorNormalize working on the input[0] element? I expect that element to be the 100 frames batch tensor, with the input[1] element being the label instead. Am I guessing right? Is something wrong in my code or in the way I am using the CropMirrorNormalize operation?
Finally, as a last experiment I removed the CropMirrorNormalize operation and built the pipeline using the VideoReader only. This time the code runs with no error, but it seems to "stop" after loading only 3 videos. The terminal stayed "frozen" for several minutes, and I had to kill the process. So, I am wondering what is happening.
I hope that my post is comprehensible, and I apologize in advance for asking maybe too many unrelated questions at once, but I could not find any answer in the docs or in other issues here on GitHub.
Thank you in advance!