Hi, thank you for all the questions. Going one by one:
Is there a limit to the number of videos that can be opened by a VideoReader? Obviously 100 frames of 4000 videos cannot fit in GPU memory, but I imagined that each video would be loaded individually only at the next() call of the DALIGenericIterator, so the frames would be loaded only when needed. Am I wrong?
DALI keeps all video files open. There is an OS limit on how many files can be open simultaneously - you can increase it as in this answer https://github.com/NVIDIA/DALI/issues/1350#issuecomment-539521606 @a-sansanwal - I think we need to limit the number of video files open at once in DALI and close files above that limit.
Moreover, for taking 100 frames of each video, is it right to have batch_size=1 and seq_length=100?
That is correct.
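For reference, a minimal sketch of that setting could look like this (using the ops API from this DALI version; the file list path is illustrative):

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops

class VideoPipe(Pipeline):
    def __init__(self, batch_size=1, num_threads=2, device_id=0):
        super(VideoPipe, self).__init__(batch_size, num_threads, device_id)
        # sequence_length=100 returns 100 frames per sample; batch_size=1
        # makes each iteration yield exactly one such sequence.
        self.reader = ops.VideoReader(device="gpu",
                                      file_list="file_list.csv",  # illustrative path
                                      sequence_length=100,
                                      random_shuffle=False)

    def define_graph(self):
        frames, labels = self.reader(name="Reader")
        return frames, labels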
RuntimeError: Critical error in pipeline: [/opt/dali/dali/operators/fused/crop_mirror_normalize.h:155] Assert on "output_layout_.is_permutation_of(input_layout_)" failed: The requested output layout is not a permutation of input layout.
Is it right to have the CropMirrorNormalize working on the input[0] element? I expect that element to be the 100 frames batch tensor, with the input[1] element being the label instead. Am I guessing right? Is something wrong in my code or in the way I am using the CropMirrorNormalize operation?
There should not be any problem running it on input[0]. @jantonguirao - can you check why the "output_layout_.is_permutation_of(input_layout_)" error appears? (I'm also not sure if we can crop over the z-axis in this case.)
This time the code runs with no error, but it seems to "stop" after loading only 3 videos. The terminal stayed "frozen" for several minutes, and I had to kill the process. So, I am wondering what is happening.
DALI doesn't support videos with VFR (variable frame rate), and the user may experience this kind of hang. DALI has some heuristics to detect this kind of input and warn the user, but they are not 100% accurate. You may want to update DALI to a nightly build and check if this is still the case (maybe there are some other issues we have fixed already). Also, if you can narrow this problem down to a particular video and share it with us, we can verify what the root cause is.
I hope that my post is comprehensible, and I apologize in advance for asking maybe too many unrelated questions at once, but I could not find any answer in the docs or in other issues here on GitHub.
We are happy to help.
Hi @JanuszL ,
thank you for your fast reply!
DALI keeps all video files open. There is an OS limit how many of them can be open simultaneously - you can increase it as in this answer #1350 (comment)
Unfortunately I don't have administrator privileges on this machine, but I have contacted the administrator, and I will check if the code runs once the limit of open files is increased for my user.
You may want to update DALI to a nightly build and check if this is still the case (maybe there are some other issues we have fixed already). Also, if you can narrow this problem down to a particular video and share it with us, we can verify what the root cause is.
Regarding this, I have installed the 0.18.0.dev20191220 nightly release, and building the Pipeline both with and without the CropMirrorNormalize operation gives the following error
Starting test...
[/opt/dali/dali/operators/reader/loader/video_loader.cc:280] File /nas/public/dataset/1260311_1976794_B_001.mp4 does not have the same resolution as previous files. (720x1280 instead of 1080x1920). Install Nvidia driver version >=396 (x86) or >=415 (Power PC) to decode multiple resolutions
[/opt/dali/dali/operators/reader/video_reader_op.h:67] Decoder reconfigure feature not supported
Traceback (most recent call last):
File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 76, in <module>
loader = DALILoader(batch_size, file_list, seq_length, [0.0, 256.0, 256.0], 0)
File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 59, in __init__
self.pipeline.build()
File "/nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 316, in build
self._pipe.Build(self._names_and_devices)
RuntimeError: Decoder reconfigure feature not supported
Might it be that the different resolution of this video is causing both the problem with the CropMirrorNormalize operation and the hang of the test script using the stable DALI release? It is strange though, because my GPU driver is version 430.26...
Thank you again for your help and time :)
I believe this is a problem that is being fixed in https://github.com/NVIDIA/DALI/pull/1591. It is about the decoder, not about CropMirrorNormalize. I recommend testing again with the nightly build when https://github.com/NVIDIA/DALI/pull/1591 is merged.
Regarding the issues with CropMirrorNormalize: there are two problems.
A bit of history: CropMirrorNormalize was initially designed to work with 2D images only (width and height) and was later extended to work with volumetric (3D) images (width, height, and depth). Volumetric images are treated differently from sequences of frames (that is, video); sequences have a layout with a number of frames, height, and width.
We specify layouts with a string like "HWC", "CHW", "FHWC", etc. Here are some examples:
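"HWC" - a single interleaved image: height, width, channels
"CHW" - a single planar image: channels, height, width
"FHWC" - a sequence of frames (video): frames, height, width, channels
"DHWC" - a volumetric image: depth, height, width, channels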
There are two issues here:
1. The default output layout is "CHW". That is an image layout, but your input layout is probably "FHWC". That is why you get the error saying that the output layout is not a permutation of the input layout. To make the layout conversion to a planar configuration, you could set the output layout to "FCHW" instead.
2. We never took into consideration using CropMirrorNormalize to modify the sequence dimension "F", so the API only allows you to specify a crop window in 3 dimensions: H (height), W (width), and D (depth).
There is another operator called Slice that allows slicing on any dimension, but its usage might require a little bit more code, and you would have to do the normalization as a separate step.
I think that your use case could be accommodated into CropMirrorNormalize by allowing to do volumetric crop on sequences as well (treating the sequence dimension as depth internally). We will take a look and come back to you with a solution.
Hey @jantonguirao and @JanuszL ,
thanks again for your replies!
There are two issues here: The default output layout is "CHW". That is an image layout but your input layout is probably "FHWC". That is why you get the error saying that the output layout is not a permutation of the input layout. To make the layout conversion to a planar configuration, you could set the output layout to "FCHW" instead.
By reinstalling the master release and specifying output_layout="FCHW", the error about the output layout not being a permutation of the input's disappears. I had taken the temporal dimension (number of frames) in the layout specification for granted, my bad, I'm sorry guys!
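For reference, the operator now looks something like this (a sketch; the crop values are illustrative):

import nvidia.dali.ops as ops
import nvidia.dali.types as types

# output_layout="FCHW" is a permutation of the "FHWC" input layout
crop = ops.CropMirrorNormalize(device="gpu",
                               crop=[256, 256],
                               output_dtype=types.FLOAT,
                               output_layout="FCHW")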
There is still the hang when loading the videos, but as @JanuszL suggested here
I believe this is a problem that is being fixed in #1591. It is about the decoder, not about CropMirrorNormalize. I recommend testing again with the nightly build when #1591 is merged.
I will wait until #1591 is merged and run the test again.
Regarding this:
We never took into consideration using CropMirrorNormalize to modify the sequence dimension "F" so the API only allows you to specify a crop window in 3 dimension: H (height), W (width) and D (depth).
from the docs I didn't get that DALI treats volumetric inputs (height, width, and depth) differently from sequences of frames (number of frames, height, and width), but it might be that I have been biased by the fact that I sometimes work with video files as 3D volumes, so I am used to treating them that way. However,
I think that your use case could be accommodated into CropMirrorNormalize by allowing to do volumetric crop on sequences as well (treating the sequence dimension as depth internally). We will take a look and come back to you with a solution.
maybe the step and stride options of the VideoReader (sketched below) are more intuitive and direct for "cropping" video files in the temporal dimension? For instance, if you look at my code, the crop_z parameter is always set to 0: I have only used ops.Uniform(range=(0.0, 0.0)) to have an _EdgeReference always equal to 0, since, as I have understood, plain float values are not accepted for specifying the crop positions for the Crop and CropMirrorNormalize operations? Anyway, I just wanted to be sure that no cropping happened in the temporal dimension; if I wanted the contrary, I would probably have used the VideoReader directly.
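Something like this is what I have in mind (just a sketch; values illustrative):

import nvidia.dali.ops as ops

# Temporal "cropping" via the reader itself instead of Crop/CropMirrorNormalize
reader = ops.VideoReader(device="gpu", file_list="file_list.csv",
                         sequence_length=100,
                         step=100,   # first frames of consecutive sequences are 100 frames apart
                         stride=1)   # distance between consecutive frames inside a sequence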
In any case, if I can give my personal opinion on the matter, extending the Crop and CropMirrorNormalize operations to the temporal dimension of videos could be a great feature for the library (if that is possible given DALI's internal implementation, of course)! Many users would appreciate it :)
Sorry for bothering you, I just wanted to give my two cents! Thanks again for your time and patience!
Anyway, I just wanted to be sure that no cropping happened in the temporal dimension; if I wanted the contrary, I would probably have used the VideoReader directly.
If no parameter is provided then no cropping will happen across the z-axis (for 3D data).
Sorry for bothering you, I just wanted to give my two cents! Thanks again for your time and patience!
This is very valuable feedback. Thank you very much for it.
Thanks @CrohnEngineer for the very valuable feedback.
I apologize if the documentation of Crop / CropMirrorNormalize was not intuitive in that regard. We are constantly revisiting and updating the documentation and any suggestions on things to improve are very much welcome.
You are right, using the VideoReader arguments to extract the relevant part of the video would be preferred. Doing that will save you the time of decoding the frames that you are not interested in.
As Janusz said, if you don't want to crop on the depth dimension you simply don't provide those arguments and the cropping will happen only on the height and width dimensions.
Hey @JanuszL and @jantonguirao ,
I'm glad my feedback helped you in any way, and thank you again for your replies and suggestions :) I just wanted to give you a quick update on my errors.
I believe this is a problem that is being fixed in #1591. It is about the decoder, not about CropMirrorNormalize. I recommend testing again with the nightly build when #1591 is merged.
I have seen that #1591 has been merged, but I couldn't test my code until the new nightly release came out today. I have increased the number of open files allowed by the OS, and fixed the CropMirrorNormalize operation as suggested by @jantonguirao (with the crop_z dimension not provided, as both of you suggested). Still, my code hangs after opening only 3 videos, regardless of whether I open 100 or 4000 videos. Using the stable release I get the error related to the decoder
Starting test...
[/opt/dali/dali/operators/reader/loader/video_loader.cc:247] File /nas/public/dataset/1260311_1976794_B_001.mp4 does not have the same resolution as previous files. (720x1280 instead of 1080x1920). Install Nvidia driver version >=396 (x86) or >=415 (Power PC) to decode multiple resolutions
[/opt/dali/dali/operators/reader/video_reader_op.h:58] Decoder reconfigure feature not supported
Traceback (most recent call last):
File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 76, in <module>
loader = DALILoader(batch_size, file_list, seq_length, [0.0, 256.0, 256.0], 0)
File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 59, in __init__
self.pipeline.build()
File "/nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 308, in build
self._pipe.Build(self._names_and_devices)
RuntimeError: Decoder reconfigure feature not supported
This happens when opening videos from the 100-file list,
Starting test...
[/opt/dali/dali/operators/reader/loader/video_loader.cc:247] File /nas/public/dataset/2090100_2005778_A_002.mp4 does not have the same resolution as previous files. (720x1280 instead of 1920x1080). Install Nvidia driver version >=396 (x86) or >=415 (Power PC) to decode multiple resolutions
[/opt/dali/dali/operators/reader/video_reader_op.h:58] Decoder reconfigure feature not supported
Traceback (most recent call last):
File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 75, in <module>
loader = DALILoader(batch_size, file_list, seq_length, [0.0, 256.0, 256.0], 0)
File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 58, in __init__
self.pipeline.build()
File "/nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 308, in build
self._pipe.Build(self._names_and_devices)
RuntimeError: Decoder reconfigure feature not supported
and this when opening videos from the 4000-file list (just to show you that the error does not depend on a single video file). Shouldn't this bug be fixed by #1591? Or maybe this fix has not been built into the nightly release yet? Am I missing something? Thanks again for your support and patience!
Edit: happy new year :) 🎉🎉🎉
@CrohnEngineer ,
Using the stable release I get the error related to the decoder ...
This is expected, as #1591 should fix the RuntimeError: Decoder reconfigure feature not supported problem.
However, the hang you see is a different problem. I suspect it may be a problem with VFR (variable frame rate) video, which DALI doesn't support. If you can narrow the problem down to one video with the nightly build and share this video, we can check what the exact cause is.
Hey @JanuszL ,
However, the hang you see is a different problem. I suspect it may be a problem with VFR (variable frame rate) video, which DALI doesn't support. If you can narrow the problem down to one video with the nightly build and share this video, we can check what the exact cause is.
I am trying to narrow down one of the videos that may cause the problem, but I am encountering some difficulties. Let me explain: one of the first questions I asked you was
Moreover, for taking 100 frames of each video, is it right to have batch_size=1 and seq_length=100? That is correct.
As the docs suggest, I have created a file_list.csv with the path of each video followed by the label associated with it. For instance, the first 5 entries of the file where I have inserted the paths of the 4000-video dataset are
/nas/public/dataset/fb_dfd_release_0.1_final/method_A/1795659/1795659_B/2059066_1795659_B_001.mp4 1
/nas/public/dataset/fb_dfd_release_0.1_final/method_A/2005778/2005778_A/2090100_2005778_A_002.mp4 1
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/1939161/1939161_A_003.mp4 0
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/1224068/1224068_H_003.mp4 0
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/700790/700790_D_001.mp4 0
Using this file list and the settings indicated above (batch_size=1 and seq_length=100), as I asked you, I imagined that each batch would contain only one element constituted by 100 frames of each video.
Therefore, the elements returned by the DALIGenericIterator would be:
100 frames of /nas/public/dataset/fb_dfd_release_0.1_final/method_A/1795659/1795659_B/2059066_1795659_B_001.mp4, and a label=1;
100 frames of /nas/public/dataset/fb_dfd_release_0.1_final/method_A/2005778/2005778_A/2090100_2005778_A_002.mp4, and again a label=1;
100 frames of /nas/public/dataset/fb_dfd_release_0.1_final/original_videos/1939161/1939161_A_003.mp4, with a label=0;
and so on. Instead, when I run the test code posted in the first comment, the iterator (which, using a debugger, I have found to be the element "hanging" in the code) returns the first three elements as positive (so label=1) before it hangs, as you can see in the picture.
I have modified the order of the elements in the list, so that the first is a negative video followed by two positives:
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/1939161/1939161_A_003.mp4 0
/nas/public/dataset/fb_dfd_release_0.1_final/method_A/2005778/2005778_A/2090100_2005778_A_002.mp4 1
/nas/public/dataset/fb_dfd_release_0.1_final/method_A/1795659/1795659_B/2059066_1795659_B_001.mp4 1
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/1224068/1224068_H_003.mp4 0
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/700790/700790_D_001.mp4 0
However, running the test code again, the iterator hangs after three videos, returning label=0 for the first two elements (as you can see again in the picture below).
Shouldn't the sequence of labels in this case be 0 1 1 (negative, positive, positive)? Could you please explain to me what is happening here? Does this mean that the iterator is picking two batches sequentially from the same video?
I'm sorry for asking (yet) another question, but without being sure how the element is created I cannot point exactly to any video.
However, if it can help, I have checked with ffmpeg the frame rate of the first 5 entries of the list, together with whether they use VFR, and the results are:
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/1939161/1939161_A_003.mp4, label=0, frame_rate=15 FPS, no VFR
/nas/public/dataset/fb_dfd_release_0.1_final/method_A/2005778/2005778_A/2090100_2005778_A_002.mp4, label=1, frame_rate=15 FPS, no VFR
/nas/public/dataset/fb_dfd_release_0.1_final/method_A/1795659/1795659_B/2059066_1795659_B_001.mp4, label=1, frame_rate=29.977946 FPS, no VFR
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/1224068/1224068_H_003.mp4, label=0, frame_rate=30 FPS, no VFR
/nas/public/dataset/fb_dfd_release_0.1_final/original_videos/700790/700790_D_001.mp4, label=0, frame_rate=29.970030 FPS, no VFR
As you can see, the frame rate can differ from video to video, but no VFR is employed (at least in the videos I have opened so far, and according to what the creators of the dataset have disclosed publicly).
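For completeness, the check I ran was along these lines (a sketch using ffprobe from the ffmpeg suite; if the nominal and average rates differ, the file is likely VFR):

import subprocess

def frame_rates(path):
    # Prints the nominal (r_frame_rate) and average (avg_frame_rate) rates
    # of the first video stream; differing values hint at VFR.
    out = subprocess.check_output([
        "ffprobe", "-v", "0", "-select_streams", "v:0",
        "-show_entries", "stream=r_frame_rate,avg_frame_rate",
        "-of", "default=noprint_wrappers=1", path])
    print(out.decode().strip())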
Thanks again for your time and support, I hope I have explained everything clearly!
@a-sansanwal - could you look into that problem?
Hi @CrohnEngineer , the hang issue which you mentioned seems very likely related to https://github.com/NVIDIA/DALI/pull/1592. It was merged yesterday so it should be in nightly soon.
Also regarding
Shouldn't the sequence of labels in this case be 0 1 1 (negative, positive, positive)? Could you please explain to me what is happening here?
That depends. If the first video has, say, 200 frames, then you will get frames 0-99 from the first video, then frames 100-199 from the first video, and then we move on to the second video.
If you want to choose only frames 0-99 from the first video, you can do something like the following in your file_list.txt, while also setting file_list_frame_num=True in VideoReader.
file.mp4 0 0 100
file1.mp4 1 0 100
file2.mp4 2 0 100
Using a file_list.txt similar to that will allow you to choose specific frames from a video. You can see the example here https://github.com/NVIDIA/DALI/pull/1612/files which demonstrates this.
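On the reader side, the only change needed would be something like this (a sketch; the path is illustrative):

import nvidia.dali.ops as ops

reader = ops.VideoReader(device="gpu",
                         file_list="file_list.txt",
                         file_list_frame_num=True,  # extra columns are start/end frame numbers
                         sequence_length=100)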
Please feel free to ask any more questions. Hope this answered your query.
Hey @a-sansanwal and @JanuszL ,
thank you for your timely replies!
If you want to choose only 0-99 frames from first video, you can do something like the following in your file_list.txt while also setting file_list_frame_num=True in VideoReader.
file.mp4 0 0 100
file1.mp4 1 0 100
file2.mp4 2 0 100
I have modified my file list to indicate the start and end frame numbers as @a-sansanwal suggested, and inserted the file_list_frame_num=True option in the VideoReader.
Anyway, using a debugger to check the code execution, I have noticed that while before the code would hang while the iterator was returning the elements, this time it hangs while creating the iterator itself. In short, before, the code would hang here
while iterator:
    item = iterator.__next__()
    for label in item[0]["label"]:
        print('Video is positive!') if label == 1 else print('Video is negative!')
inside the while loop. Now, the code hangs just after submitting the instruction for creating the DALIGenericIterator
self.dali_iterator = pytorch.DALIGenericIterator(self.pipeline,
                                                 ["file", "label"],
                                                 self.epoch_size,
                                                 auto_reset=True)
Do you have any guess about this behaviour? Anyway,
Hi @CrohnEngineer , the hang issue which you mentioned seems very likely related to #1592. It was merged yesterday so it should be in nightly soon.
I will test the code again with the next nightly release.
Thank you again for your time and responses!
pytorch.DALIGenericIterator prefetches the first batch - https://github.com/NVIDIA/DALI/blob/master/dali/python/nvidia/dali/plugin/pytorch.py#L148. So in this case the hang happens when the first batch is computed.
Hey @JanuszL and @a-sansanwal ,
Hi @CrohnEngineer , the hang issue which you mentioned seems very likely related to #1592. It was merged yesterday so it should be in nightly soon.
I have installed the last nightly release, and finally my script runs without hangs! Thank you very much for your support, help and answers during these weeks, they have been priceless!
Anyway, I'm sorry to bother you again, but I have another question regarding the CropMirrorNormalize operation (I don't know if maybe it is better to ask @jantonguirao directly). While my code now seems to run without hangs, at a certain point this error pops up
Starting test...
Loading videos at 2020-01-08 15:32:47.003687...
Video 0 is negative!
Video 1 is positive!
Video 2 is negative!
Video 3 is positive!
Video 4 is positive!
Video 5 is positive!
Video 6 is positive!
Video 7 is positive!
Video 8 is negative!
Video 9 is positive!
Video 10 is positive!
Video 11 is positive!
Video 12 is positive!
Video 13 is positive!
Video 14 is positive!
Video 15 is positive!
Video 16 is negative!
Video 17 is positive!
Video 18 is negative!
Video 19 is negative!
Video 20 is positive!
Video 21 is positive!
Video 22 is positive!
Video 23 is positive!
Video 24 is positive!
Video 25 is positive!
Video 26 is positive!
Video 27 is positive!
Video 28 is negative!
Video 29 is positive!
Video 30 is positive!
Video 31 is negative!
Video 32 is positive!
Video 33 is negative!
Video 34 is positive!
Video 35 is positive!
Video 36 is positive!
Video 37 is positive!
Video 38 is positive!
Traceback (most recent call last):
File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 85, in <module>
item = iterator.__next__()
File "/nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/plugin/pytorch.py", line 163, in __next__
outputs.append(p.share_outputs())
File "/nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 409, in share_outputs
return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline: [/opt/dali/dali/operators/crop/crop_attr.h:154] Assert on "crop_shape[dim] > 0 && crop_shape[dim] <= input_shape[dim]" failed: Crop shape for dimension 1 (256) is out of range [0, 240]
Stacktrace (15 entries):
[frame 0]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x2b56fe) [0x7f3a468c86fe]
[frame 1]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x35f898) [0x7f3a46972898]
[frame 2]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x362401) [0x7f3a46975401]
[frame 3]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x3637d9) [0x7f3a469767d9]
[frame 4]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0xf54923) [0x7f3a47567923]
[frame 5]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0xf575f7) [0x7f3a4756a5f7]
[frame 6]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0xf1c190) [0x7f3a4752f190]
[frame 7]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x7c9a4d) [0x7f3a46ddca4d]
[frame 8]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0xc32bd) [0x7f3a452a62bd]
[frame 9]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0xc3c11) [0x7f3a452a6c11]
[frame 10]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0x957c3) [0x7f3a452787c3]
[frame 11]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0x112856) [0x7f3a452f5856]
[frame 12]: /nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0x7308b0) [0x7f3a459138b0]
[frame 13]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f3aac59d6db]
[frame 14]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f3aac2c688f]
Current pipeline object is no longer valid.
Process finished with exit code 1
Do you have any guess on what is happening? I have checked the video causing the problem and it seems to be consistent with the other ones in the list. I'm not sure, however, that I'm using the CropMirrorNormalize operation correctly. Just to recall my code in the first comment, I define the graph as
def define_graph(self):
    input = self.reader(name="Reader")
    output = self.crop(input[0], crop_pos_x=self.uniform(), crop_pos_y=self.uniform())
    return output, input[1]
The input returned by the VideoReader is a list of two elements, and I thought that input[0] would represent the video frames tensor while input[1] the label associated with them. So, I have inserted the crop operation on input[0] only, but I'm not sure if that's right.
However, if I write output = self.crop(input, crop_pos_x=self.uniform(), crop_pos_y=self.uniform()), I receive the following error
TypeError: Expected outputs of type compatible with "EdgeReference". Received output type with name "list" that does not match.
Could you explain to me why the VideoReader returns a list? What do the elements of the list represent? Is it right to call the CropMirrorNormalize operation on just one element like input[0]?
Thank you again for your time and feedback!
@CrohnEngineer - VideoReader returns frames and labels; it can also return frame numbers and timestamps. So input[0] has frames, while input[1] has labels, and so on.
Regarding the error: does the video have the same resolution as the other ones? The error sounds like you want to crop to 256 in one dimension while your video has 240 at most. For images, a resize operation is usually conducted first to make sure that the input to the crop has a certain size; however, resize doesn't support video sequences yet. For now, could you try to remove the videos with a lower resolution than your crop argument?
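For example, you could filter the file list beforehand with something along these lines (not a DALI utility, just a sketch using OpenCV):

import cv2

def large_enough(path, min_h=256, min_w=256):
    # Read the stream resolution without decoding any frames.
    cap = cv2.VideoCapture(path)
    w = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
    h = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
    cap.release()
    return h >= min_h and w >= min_w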
Hey @JanuszL ,
@CrohnEngineer - VideoReader returns frames and labels; it can also return frame numbers and timestamps. So input[0] has frames, while input[1] has labels, and so on.
I thought it was something like this, thanks for the clarification! Regarding the video
Does the video have the same resolution as the other ones? The error sounds like you want to crop to 256 in one dimension while your video has 240 at most.
You were right, the video causing the problem had a resolution of 320x240 pixels: I have reduced the crop dimension to 240x240 and it finally runs fine. Thank you really really much for your help!!!
If you don't mind, I would like to ask one last question before closing the issue. As I said, the code now runs, but at video number 67 it seems to fail to allocate memory on the GPU
Starting test...
Loading videos at 2020-01-09 13:05:50.638481...
Video 0 is negative!
Video 1 is positive!
Video 2 is negative!
Video 3 is positive!
...
Video 53 is positive!
Video 54 is positive!
Video 55 is positive!
Video 56 is negative!
Video 57 is positive!
Video 58 is negative!
Video 59 is negative!
Video 60 is positive!
Video 61 is positive!
Video 62 is positive!
Video 63 is negative!
Video 64 is positive!
Video 65 is positive!
Video 66 is positive!
Traceback (most recent call last):
  File "/nas/home/ecannas/deepfakedetection/code/utilities/DALILoader.py", line 86, in <module>
    item = iterator.__next__()
  File "/nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/plugin/pytorch.py", line 163, in __next__
    outputs.append(p.share_outputs())
  File "/nas/home/ecannas/miniconda3/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 409, in share_outputs
    return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline: CUDA allocation failed
Current pipeline object is no longer valid.
Is this a memory issue (am I trying to allocate too many frames on the GPU) or is it something else? I am currently running the code on an NVIDIA Titan V with 12 GB of memory. From the start of the execution, around 8 GB of memory are allocated, which grows towards 8.9 GB, and then the execution stops.
Thank you again for your help and support!
You can try to play with the additional_decode_surfaces and initial_fill parameters of the VideoReader. Also, you can play with the prefetch_queue_depth pipeline argument.
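A sketch showing where each knob lives (the values are only examples):

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops

class TunedVideoPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        # prefetch_queue_depth is a Pipeline argument
        super(TunedVideoPipe, self).__init__(batch_size, num_threads, device_id,
                                             prefetch_queue_depth=1)
        # additional_decode_surfaces and initial_fill belong to the VideoReader
        # (initial_fill only matters with random_shuffle=True)
        self.reader = ops.VideoReader(device="gpu", file_list="file_list.csv",
                                      sequence_length=100,
                                      additional_decode_surfaces=0,
                                      initial_fill=1)

    def define_graph(self):
        frames, labels = self.reader(name="Reader")
        return frames, labels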
Hey @JanuszL ,
I have tried setting additional_decode_surfaces to 0, but nothing changed. Moreover, I have set shuffle=False, so initial_fill shouldn't be considered. In any case, I have also tried setting initial_fill=0 and prefetch_queue_depth=1, but with no success.
Anyway, I will close the issue since my original problem has been solved! Maybe I can open another one if the problem persists and is of interest to you too?
Thank you again for your help, and also @jantonguirao and @a-sansanwal ! Your support has been extremely helpful and irreplaceable! :)
@CrohnEngineer - if you have some video samples you could share that show this memory growth, we can check it on our side. Nothing but decoding video with a bigger resolution comes to my mind as a reason why memory consumption keeps growing when you run the pipeline.
Hey @JanuszL ,
the dataset is made up of 4000 videos, so I can't immediately share it with you. Anyway, it is the preliminary dataset of the Facebook Deepfake Detection Challenge; I took part of it and made a "train" directory. If you have access to this dataset, you can simply create a file_list.csv and check if it is really a memory issue or something else.
Anyway, maybe it is not important, but the strange thing to me is that the memory is not completely saturated when the error takes place: the TITAN V has 12GB of memory, but the CUDA allocation failed error happens when 9GB of it are occupied. Does it look strange to you too?
Ok, if it is https://www.kaggle.com/c/deepfake-detection-challenge/data then I can access it. I will try to reproduce your problem.
Hey @JanuszL ,
be careful, the one on Kaggle is, I think, the complete dataset of 120000 videos. Moreover, it is a little complicated in its organization (it is divided into multiple folders). For this reason I am using the one hosted on https://deepfakedetectionchallenge.ai/, because I wanted to get used to DALI before moving to a very large dataset. However, I think the preliminary dataset is no longer available. If you have enough resources and time to check it on the complete dataset, that would be awesome in any case. I think the characteristics of the videos in the two datasets are pretty similar (in terms of resolution, FR, VFR, etc...).
@CrohnEngineer - the one I see consists of 400 videos so it is fine. I think I see where the problem is. The DALI reader uses a prefetch buffer of size 2 * batch_size * prefetch_depth. For a Full HD frame (~23 MB when returned as float) this makes > 40 MB for a sequence of length 1. As DALI works as a pipeline, the VideoReader output needs to keep its own buffer for the whole sequence. So in your case the consumed memory is 2 * batch_size * prefetch_depth + batch_size, and for a sequence length of 100 this makes 23 MB * 3 * 100 ≈ 7 GB. With the code you provided I'm able to almost fully saturate the memory to 12GB (my GPU also has 12GB). The most promising optimization that comes to my mind is to fuse (reenable) resize inside the VideoReader so each output frame is not that heavy. @a-sansanwal - what do you think?
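In numbers, a back-of-the-envelope version of the computation above (float output assumed):

frame_bytes = 1920 * 1080 * 3 * 4        # one Full HD RGB frame as float32, ~23 MB
sequence_bytes = frame_bytes * 100       # sequence_length = 100
# 2 * batch_size * prefetch_depth (reader buffers) + batch_size (reader output)
total_bytes = sequence_bytes * (2 * 1 * 1 + 1)
print(total_bytes / 2**30)               # ~6.95 GB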
@CrohnEngineer - I see one incomplete implementation in DALI. Even if you ask the VideoReader for dtype=types.UINT8, it internally allocates memory for float32 data. I will fix that soon; it should reduce memory occupation 4 times (I hope).
https://github.com/NVIDIA/DALI/pull/1643 should reduce memory consumption
@JanuszL That's a good idea, we could add support for arguments like resize_x, resize_y in VideoReader and do the resize in VideoReader itself. I will add it to my to-do list.
We have a scale argument but its implementation is not there.
Hey @JanuszL ,
sorry for the delay in answering you!
@CrohnEngineer - I see one incomplete implementation in DALI. Even if you ask the VideoReader for dtype=types.UINT8, it internally allocates memory for float32 data. I will fix that soon; it should reduce memory occupation 4 times (I hope).
#1643 should reduce memory consumption
That's good to hear! I can't wait to try it out :) Regarding this
I think I see where the problem is. The DALI reader uses a prefetch buffer of size 2 * batch_size * prefetch_depth. For a Full HD frame (~23 MB when returned as float) this makes > 40 MB for a sequence of length 1. As DALI works as a pipeline, the VideoReader output needs to keep its own buffer for the whole sequence. So in your case the consumed memory is 2 * batch_size * prefetch_depth + batch_size, and for a sequence length of 100 this makes 23 MB * 3 * 100 ≈ 7 GB.
I made the same computation, and so accounted for the 7GB of memory used by the GPU for the prefetch batches and the actual elements. Still, I don't get why the memory consumption increases. Does it mean that DALI keeps all the elements fetched so far in GPU memory? Maybe it's a dumb question, but I am a little confused about how DALI uses the GPU's memory; would you mind explaining it to me briefly?
I made the same computation, and so accounted for the 7GB of memory used by the GPU for the prefetch batches and the actual elements. Still, I don't get why the memory consumption increases.
I was not able to reproduce this memory usage growth with the test code you have provided. In some cases it is possible, as DALI uses a lazy approach to memory allocation (it enlarges allocations when needed but doesn't free anything, as any free or alloc on the GPU is very time-consuming), so when a bigger image appears at the output of some random operator (like a random resize or just the decoder), additional memory needs to be allocated. But in the case of your pipeline the sizes of the images are the same, so the watermark should be reached very soon and no additional allocation should happen (I don't see that in my case).
I was not able to reproduce this memory usage growth with the test code you have provided. In some cases it is possible, as DALI uses a lazy approach to memory allocation (it enlarges allocations when needed but doesn't free anything, as any free or alloc on the GPU is very time-consuming), so when a bigger image appears at the output of some random operator (like a random resize or just the decoder), additional memory needs to be allocated. But in the case of your pipeline the sizes of the images are the same, so the watermark should be reached very soon and no additional allocation should happen (I don't see that in my case).
Ok, thank you @JanuszL for the explanation! Do you have any hint on where to search for the root of the problem?
I would start with running nvidia-smi -lms 100 --query-gpu=memory.used --format=csv in the console to see how memory utilization grows. As I said, in my case the watermark is 8636 MiB for the deepfake-detection-challenge dataset, batch size 1, sequence length 100.
Hey @JanuszL ,
I think I finally found the root of the problem.
when a bigger image at the output of some random operator (like random resize or just decoder) appear memory need to be additionally allocated.
You were right! As you were suggesting, I found out that some of the videos have a resolution greater than full HD's 1920x1080! What happened here
As I said now the code runs, but at video number 67 it seems to fail to allocate the memory for the GPU
is that the next video in the list (video number 68) has a resolution of 3840x2160 pixels; while prefetching the next batch, DALI runs out of GPU memory, and hence the CUDA allocation failed error pops up.
Reducing the sequence_length allowed me to see the allocation of the biggest frames on the GPU and the "spike" in memory consumption; until #1643 is merged, I will probably work with shorter sequences.
Speaking of this, I would like to use the stride argument of the VideoReader. If I have a video of, let's say, 10 (numbered) frames, like this [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], and I would like to have a sequence with sequence_length=5 and stride=2, this means that the resulting sequence will contain one frame every two, right? Resulting in something like this: [0, 2, 4, 6, 8]?
Thank you really really much for your help! This code had kept me busy for weeks now, without your assistance I could never make it work!
@CrohnEngineer,
If I have a video of, let's say, 10 (numbered) frames, like this [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], and I would like to have a sequence with sequence_length=5 and stride=2, this means that the resulting sequence will contain one frame every two, right? Resulting in something like this: [0, 2, 4, 6, 8]?
It should work exactly as you say. If you use a nightly build, you can enable enable_frame_num and get the actual frame number to verify that it works as you want.
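A sketch of the verification (nightly-era API; the path is illustrative):

import nvidia.dali.ops as ops

# With enable_frame_num=True the reader returns an extra output, so you can
# check which frame each returned sequence starts at.
reader = ops.VideoReader(device="gpu", file_list="file_list.csv",
                         sequence_length=5, stride=2,
                         enable_frame_num=True)
# In define_graph: frames, labels, frame_num = reader(name="Reader")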
@CrohnEngineer - it is merged and should be available in the next nightly build.
Can anybody tell me how to solve this error?
File "/home/knuvi/Desktop/Kavita/fastdvdnet-0.1/dataloaders.py", line 52, in init self.crop = CropMirrorNormalize(device="gpu", \ NameError: name 'CropMirrorNormalize' is not defined
Hi @kavita19,
Can you provide more details about the code you are trying to run? In my case:
import nvidia.dali
nvidia.dali.ops.CropMirrorNormalize()
Just works. Can you check it on your side?
Hello, thanks for your reply. I am trying to run the FastDVDnet model. First I was getting a "module not found" error for the nvidia module, so I installed NVIDIA DALI as per the requirements. Then I tried to train the model, but I am getting an error in the dataloaders.py file, in this code:
# Define crop and permute operations to apply to every sequence
self.crop = CropMirrorNormalize(device="gpu", \
                                crop=crop_size, \
                                output_layout=types.NCHW, \
                                output_dtype=types.FLOAT)
self.uniform = ops.Uniform(range=(0.0, 1.0))  # used for random crop
Hi @kavita19,
Please make sure that you have the most recent DALI version; the installation instructions can be found here. Also make sure you haven't changed anything in the fastdvdnet code. As I can see, the mentioned piece of code looks different in the official repository compared to what you provided:
# Define crop and permute operations to apply to every sequence
self.crop = ops.CropMirrorNormalize(device="gpu",
                                    crop_w=crop_size,
                                    crop_h=crop_size,
                                    output_layout='FCHW',
                                    dtype=types.DALIDataType.FLOAT)
self.uniform = ops.Uniform(range=(0.0, 1.0))  # used for random crop
vs what you provided:
# Define crop and permute operations to apply to every sequence
self.crop = CropMirrorNormalize(device="gpu", \
                                crop=crop_size, \
                                output_layout=types.NCHW, \
                                output_dtype=types.FLOAT)
self.uniform = ops.Uniform(range=(0.0, 1.0))  # used for random crop
Loading datasets ...
Traceback (most recent call last):
  File "train_fastdvdnet.py", line 214, in <module>
Hi @kavita19,
Can I install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100==1.2.0 with CUDA 11.4 (my PC)? Because I have issues with the DALI installation.
Yes, it should work. Make sure that you have the latest pip version installed: pip install --upgrade pip.
File "/home/knuvi/Desktop/Kavita/fastdvdnet/dataloaders.py", line 99, in init step=temp_stride) TypeError: init() got an unexpected keyword argument 'sequence_length'
Seems like an error not related to DALI; class VideoReaderPipeline(Pipeline): is part of the fastdvdnet code. I would double-check that your source code is not corrupted.
Thanks. After solving this error, my training of FastDVDnet started, but I am getting ZeroDivisionError: division by zero after the first epoch.
[epoch 1][3981/4000] loss: 12.6643 PSNR_train: 0.0000
[epoch 1][3991/4000] loss: 13.8910 PSNR_train: 0.0000
Traceback (most recent call last):
  File "train_fastdvdnet.py", line 212, in <module>
Hi @kavita19,
I guess your validation dataset is empty (this is probably the only reason why len(dataset_val) is 0). Could you check it?
Hi @JanuszL,
I already gave the validation path, and in the validation folder I kept my own image sequences as per the GitHub (fastdvdnet) reference, but it still gives the same error message [ZeroDivisionError].
Hi @kavita19,
In that case, I would extract the code part that creates dataset_val and check its len manually. Also, there may be some errors/warnings you have missed. Or the fastdvdnet code doesn't work with the recent DALI version and you need to ask its author for help.
Ok thank you so much for your help. I will check this.
`video_path = "demo_video_5sec.mp4"
fps = 2
list1 = [] def video_reader(path, fps):
cap = cv2.VideoCapture(video_path)
cap.set(cv2.CAP_PROP_FPS, fps)
i = 0
while cap.isOpened():
ret, frame = cap.read()
if ret:
#image = cv2.resize(frame)
#cv2.imshow("image", frame)
mesh_points = face_mesh(frame)
#print(mesh_points)
frame_dict = {}
if mesh_points is not None:
frame_dict[i]= mesh_points
#video_dict.append(frame_dict)
list1.append(mesh_points)
i = i+1
#if cv2.waitKey(25) & 0xff == ord('q'):
#break
else:
break
cap.release()
video_reader(video_path,fps)
NameError: name 'video_reader' is not defined`
I am getting this NameError; can you please help me solve it?
Hi @0Rutuja28-97,
Can you provide more details regarding the script you run? It looks like it uses OpenCV and not DALI.
Thank you for your response. The issue is resolved now.
Hi everybody,
I'm opening an issue since I am encountering several problems writing a Pipeline for loading video files. I'm not sure whether DALI is the best tool for my task, nor if I am using it properly, so I will start by explaining my goal. I have a huge dataset consisting of hundreds of thousands of videos, and I would like to use DALI's VideoReader to build a PyTorch DataLoader since, according to the documentation, DALI's VideoReader uses
NVIDIA GPU's hardware-accelerated video decoding, so I would like to speed up and eventually parallelize the training of a CNN, using the GPU for the data loading operations. I took the Video Super Resolution example (https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/examples/video/superres_pytorch/README.html) and wrote my personal DALILoader. I created a file_list.csv where I have written all the paths and the labels of the videos (my task is a simple binary classification), and a simple test script: I simply want to load 100 frames of each video, then crop them randomly in the height and width dimensions. As a first test, I didn't want to use the whole dataset, so I used just a portion of it (we are talking about 4000/5000 videos in any case), but when I ran the code I encountered three major errors. I report them in "discovery order": after encountering the first one, I simplified my code, reducing the task complexity too, to do a little debugging. I have DALI 0.16.0 installed, and I am running the code on an Ubuntu machine with an E5-2630 CPU, 128GB of RAM, and a single NVIDIA Quadro P6000 GPU.
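The core of the pipeline is roughly the following (a sketch: the define_graph is the one I discuss later in this issue, while the surrounding code is illustrative):

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

class VideoReaderPipeline(Pipeline):
    def __init__(self, batch_size, sequence_length, file_list, crop_size, device_id):
        super(VideoReaderPipeline, self).__init__(batch_size, num_threads=2,
                                                  device_id=device_id)
        self.reader = ops.VideoReader(device="gpu", file_list=file_list,
                                      sequence_length=sequence_length,
                                      random_shuffle=False)
        self.crop = ops.CropMirrorNormalize(device="gpu", crop=crop_size,
                                            output_dtype=types.FLOAT)
        self.uniform = ops.Uniform(range=(0.0, 1.0))

    def define_graph(self):
        input = self.reader(name="Reader")
        output = self.crop(input[0], crop_pos_x=self.uniform(), crop_pos_y=self.uniform())
        return output, input[1]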
The first error appears by simply running the script above as it is:
My first question therefore is: is there a limit to the number of videos that can be opened by a VideoReader? Obviously 100 frames of 4000 videos cannot fit in GPU memory, but I imagined that each video would be loaded individually only at the next() call of the DALIGenericIterator, so the frames would be loaded only when needed. Am I wrong? Moreover, for taking 100 frames of each video, is it right to have batch_size=1 and seq_length=100?
As a second experiment, I reduced the number of videos to 100. This time it seems that DALI is able to load the videos, but I got another error instead:
I am probably using the crop operation wrong, so: is it right to have the CropMirrorNormalize working on the input[0] element? I expect that element to be the 100 frames batch tensor, with the input[1] element being the label instead. Am I guessing right? Is something wrong in my code or in the way I am using the CropMirrorNormalize operation?
Finally, as a last experiment I removed the CropMirrorNormalize operation and built the pipeline using the VideoReader only. This time the code runs with no error, but it seems to "stop" after loading only 3 videos. The terminal stayed "frozen" for several minutes, and I had to kill the process. So, I am wondering what is happening.
I hope that my post is comprehensible, and I apologize in advance for asking maybe too many unrelated questions at once, but I could not find any answer in the docs or in other issues here on GitHub.
Thank you in advance!