syb0rg opened this issue 4 years ago
Hi, I have tried it on my setup with DALI 0.18.0 and I get:
Can you try to run any of our video examples using data from the DALI_extra repository? @a-sansanwal, do you have any suggestions? Does any plotting or decoding issue come to mind?
Sure thing. Using the same basic example on this video, I get this for the first frame:
Could you provide more detailed info about the GPU model, driver, etc.? It looks like a problem with NVDEC itself rather than with DALI.
Yup! This is being run on an AWS p2.xlarge instance with the Deep Learning AMI (Ubuntu 16.04) Version 24.3 (ami-05931d11d2bf831c3).
```
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
$ nvidia-smi
Tue Feb 4 15:36:07 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   41C    P8    31W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
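To collect just the GPU model and driver version for a report like this one, a minimal sketch (not part of DALI) can call `nvidia-smi` in its CSV query mode. Only the `--query-gpu` and `--format` options come from `nvidia-smi` itself; the helper names `parse_gpu_query` and `query_gpus` are hypothetical.

```python
import subprocess
from typing import List, Tuple


def parse_gpu_query(output: str) -> List[Tuple[str, str]]:
    """Parse `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader`
    output into (gpu_name, driver_version) pairs, one per GPU."""
    gpus = []
    for line in output.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line looks like "Tesla K80, 440.33.01"; split on the last comma.
        name, driver = line.rsplit(",", 1)
        gpus.append((name.strip(), driver.strip()))
    return gpus


def query_gpus() -> List[Tuple[str, str]]:
    """Run nvidia-smi (assumed to be on PATH) and return the parsed GPU list."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_gpu_query(result.stdout)
```

Calling `query_gpus()` on a machine like the one above should yield a single `("Tesla K80", "440.33.01")` pair; parsing is kept separate so it can be checked without a GPU.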
Could you also check how this code works for you? It shows the luma and chroma planes separately:

```python
import numpy as np
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types
from matplotlib import pyplot as plt
import matplotlib.gridspec as gridspec

# Input videos; this path is the one used elsewhere in this thread, replace with your own files.
video_files = ["/home/ubuntu/data/train_sample_videos/bmehkyanbj.mp4"]


class VideoPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data, shuffle):
        super(VideoPipe, self).__init__(batch_size, num_threads, device_id, seed=16)
        # Decode on the GPU and keep frames in YCbCr so luma and chroma can be inspected.
        self.input = ops.VideoReader(device="gpu", filenames=data, sequence_length=2,
                                     shard_id=0, num_shards=1,
                                     random_shuffle=shuffle, initial_fill=1,
                                     image_type=types.YCbCr)

    def define_graph(self):
        output = self.input(name="Reader")
        return output


pipe = VideoPipe(batch_size=2, num_threads=2, device_id=0, data=video_files, shuffle=False)
pipe.build()
pipe_out = pipe.run()
# Shape: (batch, sequence_length, H, W, 3)
sequences_out = pipe_out[0].as_cpu().as_array()

fig = plt.figure(figsize=(16, 12))
gs = gridspec.GridSpec(3, 1)
# Plot channel 0 (Y), 1 (Cb) and 2 (Cr) of the first frame of the first sequence.
for i in range(3):
    plt.subplot(gs[i])
    plt.axis("off")
    plt.imshow(np.transpose(sequences_out[0][0], [2, 0, 1])[i])
plt.show()
```
Done with cfr_test.mp4:
With my original video:
@syb0rg, YCbCr looks good. I will check whether anything can go wrong in the conversion to RGB and get back to you.
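One way to check the YCbCr output independently of DALI's own RGB path is to convert it by hand. A minimal NumPy sketch, assuming full-range ITU-R BT.601 coefficients (the JPEG/JFIF convention); the range and matrix that DALI or NVDEC actually use may differ (for example limited-range BT.601 or BT.709), so a small color shift against DALI's RGB output would not by itself indicate a bug. The function name `ycbcr_to_rgb` is hypothetical.

```python
import numpy as np


def ycbcr_to_rgb(frame: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) uint8 YCbCr frame to uint8 RGB.

    Assumes full-range BT.601 (JFIF): Y, Cb and Cr all span 0..255 and
    the chroma channels are centred on 128.
    """
    ycc = frame.astype(np.float32)
    y = ycc[..., 0]
    cb = ycc[..., 1] - 128.0
    cr = ycc[..., 2] - 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    rgb = np.stack([r, g, b], axis=-1)
    # Round and clip back into the displayable 8-bit range.
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)
```

With the pipeline output above, `plt.imshow(ycbcr_to_rgb(sequences_out[0][0]))` would show the first frame of the first sequence as RGB.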
It looks like this issue may be specific to when `file_list` is used as a parameter in place of `filenames`. I was able to use all of my original parameters except that one and get the expected output.
I have tested both cases (with only this video in the `file_list`), and it worked fine. I will try to retest on AWS.
Is there another way to get labels into the Pipeline for now? I was hoping I could pass a list of tuples to `filenames`, such as:

```python
video_files = [('/home/ubuntu/data/train_sample_videos/bmehkyanbj.mp4', 1)]
```

But it doesn't look like this is supported.
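Since `filenames` does not take `(path, label)` tuples, one workaround is to write the tuples into a text file and pass that file's path through `file_list`, the parameter already used earlier in this thread. A minimal sketch, assuming the whitespace-separated `path label` line format shown in the two-line `video_files.csv` posted in this thread; `write_file_list` is a hypothetical helper.

```python
from typing import Iterable, Tuple


def write_file_list(entries: Iterable[Tuple[str, int]], list_path: str) -> None:
    """Write (video_path, label) pairs as one 'path label' line each.

    The line format is an assumption based on the file lists in this thread.
    """
    with open(list_path, "w") as f:
        for video_path, label in entries:
            if " " in video_path:
                # A space in the path would be ambiguous for a whitespace-split parser.
                raise ValueError(f"path contains a space: {video_path!r}")
            f.write(f"{video_path} {int(label)}\n")
```

For example, `write_file_list(video_files, "video_files.csv")` with the tuple list above produces a one-line file that can then be passed as `file_list="video_files.csv"` instead of `filenames=...`.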
If you put each file into a separate directory (like this), labels will be assigned to each directory separately. This tutorial should explain it.
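The directory-per-label layout can be prepared with a small script. A minimal sketch, assuming the reader is then pointed at the parent directory (DALI's video reader exposes a `file_root` argument for this kind of layout; check the docs for your version); the helper name `link_into_label_dirs` and the zero-padded naming of the subdirectories are illustrative choices. How DALI maps directory names to numeric labels, for example whether it sorts them, should be confirmed in the DALI documentation rather than assumed.

```python
import os
from typing import Iterable, Tuple


def link_into_label_dirs(entries: Iterable[Tuple[str, int]], root: str) -> None:
    """Symlink each video into root/<label>/ so every label gets its own directory.

    Directory names are zero-padded so that a lexicographic sort matches
    numeric label order (an assumption about how the reader orders them).
    """
    for video_path, label in entries:
        label_dir = os.path.join(root, f"{int(label):04d}")
        os.makedirs(label_dir, exist_ok=True)
        target = os.path.join(label_dir, os.path.basename(video_path))
        if not os.path.lexists(target):
            # Symlinks avoid duplicating large video files on disk.
            os.symlink(os.path.abspath(video_path), target)
```

Symlinks keep the original files in place, so the same videos can also still be listed in a `file_list` while comparing the two approaches.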
I've managed to reproduce the problem with `filenames` as well. It looks like the decoder starts misbehaving when 15 or more files are passed to `ops.VideoReader()`. I've confirmed this is the same issue I hit with `file_list` in my first comment: 14 files or fewer work fine, and problems start with any more than that.
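Because the failure depends on how many files are listed, bisecting over list prefixes can narrow down where decoding first goes wrong with far fewer runs than adding files one at a time. A minimal sketch, assuming a hypothetical `decodes_ok(files)` callback you implement yourself (for example by building a pipeline over `files` and inspecting the first decoded frames), and assuming failures are monotonic: once a prefix fails, every longer prefix fails too.

```python
from typing import Callable, Optional, Sequence


def shortest_failing_prefix(
    files: Sequence[str], decodes_ok: Callable[[Sequence[str]], bool]
) -> Optional[int]:
    """Return the smallest n such that files[:n] fails to decode, or None.

    Binary search needs roughly log2(len(files)) decoder runs and relies on
    the monotonicity assumption described above.
    """
    if decodes_ok(files):
        return None
    lo, hi = 0, len(files)  # invariant: files[:lo] decodes, files[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if decodes_ok(files[:mid]):
            lo = mid
        else:
            hi = mid
    return hi
```

If it returns `n`, then `files[n - 1]` is the entry whose addition first breaks decoding and is the natural suspect to examine next.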
I was testing with only one file. Will check with 15. Thanks for the hint.
I tried to reproduce this on AWS as well, but without success:

```
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   47C    P0    54W / 149W |    338MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2238      C   /usr/bin/python2.7                           327MiB |
+-----------------------------------------------------------------------------+
```
Deep Learning AMI (Ubuntu 16.04) Version 26.0 (ami-0e30cdd8359d89531)
p2.xlarge
I filled `video_files.csv` with more than 15 entries of the same video. Maybe there is a video on your list that breaks the decoding of the others?
You were right, the 15th video is causing problems (good call), though I am unsure why. I can play it in a media player, and if I put only that video in `video_files.csv`, it works. The smallest case I could find where it breaks is this `video_files.csv`:

```
/home/ubuntu/data/train_sample_videos/bmehkyanbj.mp4 1
/home/ubuntu/data/train_sample_videos/dzwkmcwkwl.mp4 1
```

For some reason, the combination of the two files breaks it. Here is the "bad" video file. Hopefully you will now be able to reproduce it!
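Once a single suspect file is known, testing it against each other file in isolation can show which pairings trigger the failure, as in the two-line list above. A minimal sketch, assuming a hypothetical `decodes_ok(files)` callback you implement yourself (for example by building a pipeline over `files` and checking the first decoded frames); `failing_partners` is likewise a hypothetical helper.

```python
from typing import Callable, List, Sequence


def failing_partners(
    suspect: str,
    others: Sequence[str],
    decodes_ok: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Return the files that fail to decode when paired with `suspect`."""
    if not decodes_ok([suspect]):
        raise ValueError("suspect fails on its own, so pairing it tells nothing new")
    partners = []
    for other in others:
        # Order may matter for a stateful decoder, so try both orders.
        if not decodes_ok([other, suspect]) or not decodes_ok([suspect, other]):
            partners.append(other)
    return partners
```

If every partner fails, the problem is likely the suspect file combined with any second video rather than one specific pairing.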
Works for me locally. I will try on AWS again.
I think I have reproduced it. I will keep you posted.
It seems to work on a P3 instance (V100 GPU) but not on the K80. We are checking with the relevant team. In the meantime, you can try a P3 instance if possible.
Strange, unfortunately I'm limited to the p2.xlarge for now 😕
Hi @syb0rg, the fix should be available in the most recent 470.x driver family. Can you check on your side and confirm whether the problem is fixed?
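A quick way to tell whether a machine already runs a 470.x or newer driver is to compare the version string that `nvidia-smi` reports (`440.33.01` in the outputs earlier in this thread). A minimal sketch; the helper names are hypothetical, and the 470 threshold comes from the comment above rather than from NVIDIA release notes.

```python
import re
from typing import Tuple

MIN_FIXED_MAJOR = 470  # driver family said in this thread to contain the fix


def parse_driver_version(text: str) -> Tuple[int, ...]:
    """Extract the driver version as a tuple of ints.

    Accepts either a bare version ("440.33.01") or a line containing
    "Driver Version: 440.33.01", such as the nvidia-smi header.
    """
    match = re.search(r"Driver Version:\s*(\d+(?:\.\d+)*)", text)
    version = match.group(1) if match else text.strip()
    # "440.33.01" -> (440, 33, 1); int() drops the leading zero.
    return tuple(int(part) for part in version.split("."))


def has_fixed_driver(text: str) -> bool:
    """True if the reported driver major version is at least 470."""
    return parse_driver_version(text)[0] >= MIN_FIXED_MAJOR
```

Comparing only the major version matches how the fix was described (a driver family), not a specific point release.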
Running both my own pipeline and the Video Data Loading Tutorial, I am unable to load my videos as expected. I know these videos aren't corrupt, as I'm able to view them in a media player.
Here is a minimal example (assuming imported libraries):
This results in the following:
`video_files.csv` is defined as such:

I'm attaching an example of the first file for reference. This is on CUDA 10.1, Python 3.6.10, and DALI v0.18.0 (I also tried the nightly builds). Am I doing anything wrong? Please let me know if you need any more information!
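Before running the pipeline, it can help to sanity-check the list file itself, since a bad path or label would otherwise only surface inside the reader. A minimal sketch, assuming the whitespace-separated `path label` format shown in the two-line `video_files.csv` posted in this thread; `check_file_list` is a hypothetical helper, not a DALI function.

```python
import os
from typing import List, Tuple


def check_file_list(list_path: str) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Parse a 'path label' file list and collect problems instead of raising.

    Returns (entries, problems): entries holds the syntactically valid
    (path, label) pairs; problems holds one message per issue found.
    """
    entries: List[Tuple[str, int]] = []
    problems: List[str] = []
    with open(list_path) as f:
        for lineno, line in enumerate(f, start=1):
            parts = line.split()
            if not parts:
                continue  # ignore blank lines
            if len(parts) != 2:
                problems.append(f"line {lineno}: expected 'path label', got {line.strip()!r}")
                continue
            path, label = parts
            if not os.path.isfile(path):
                problems.append(f"line {lineno}: file not found: {path}")
            try:
                entries.append((path, int(label)))
            except ValueError:
                problems.append(f"line {lineno}: label is not an integer: {label!r}")
    return entries, problems
```

An empty `problems` list does not prove the videos decode, but it rules out the list file as the cause before looking at the decoder.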