Hi @anibali,
DALI treats sequences as video frames, which means that they should have a uniform size (which may not be applicable in your case), and all transformations are applied uniformly across the frames in the sequence inside one sample.
> However, this causes issues when I try to use DALI's random augmentation capabilities (each frame in the clip is transformed separately).
What you can do (depending on what random augmentation you use) is to try out the permute_batch operator. If you use a random generator to drive other operations, you need to use permute_batch to duplicate one value across all samples (so the randomness is applied the same way for each sample).
If you can provide a self-contained example we can run on our side we can provide more suggestions.
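A minimal sketch of that permute_batch idea (illustrative only; the batch layout, the sizes, and the uniform range below are assumptions rather than anything stated in this thread):

```python
# Minimal sketch: one random value per clip, broadcast to every frame of the clip.
import numpy as np
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def

clips_per_batch, frames_per_clip = 2, 4
batch_size = clips_per_batch * frames_per_clip  # samples laid out as [A1..A4, B1..B4]

@pipeline_def
def per_clip_rng_pipeline():
    angle = fn.random.uniform(range=[-10.0, 10.0])  # one random value per sample
    # Repeat one value per clip across that clip's frames, so every frame in a
    # clip ends up with the same random number.
    indices = list(np.repeat(np.arange(clips_per_batch), frames_per_clip))
    per_clip_angle = fn.permute_batch(angle, indices=indices)
    return angle, per_clip_angle

pipe = per_clip_rng_pipeline(batch_size=batch_size, num_threads=2, device_id=0)
pipe.build()
per_sample, per_clip = pipe.run()  # per_clip repeats one value per group of 4 samples
```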
Thanks for the prompt response! I really appreciate how communicative you are in this project (it definitely made browsing past issues much more fruitful).
> DALI treats sequences as video frames, which means that they should have a uniform size (which may not be applicable in your case)
This does indeed apply for my case, and seems like a reasonable assumption to me.
> and all transformations are applied uniformly across the frames in the sequence inside one sample.
This is a problem for one type of augmentation that I have in mind (simulated camera movement). It would also definitely be a problem for cases where the crop "tracks" the subject (not the case for my current project). I'm not sure if it's an unavoidable technical limitation, but disallowing per-frame transformation is a very big restriction that would have caused major issues for me in past projects.
> What you can do (depending on what random augmentation you use) is to try out the permute_batch operator. If you use a random generator to drive other operations, you need to use permute_batch to duplicate one value across all samples (so the randomness is applied the same way for each sample).
permute_batch sounds like it might work for my current setup (sequence as a batch of images), I'll give it a go.
Hi @anibali,
> I'm not sure if it's an unavoidable technical limitation, but disallowing per-frame transformation is a very big restriction that would have caused major issues for me in past projects.
It is a rather strong limitation. As I explained, we assume that the sequence is a sample that should be uniformly transformed across the frames. If you want to have a different transformation per frame, then I would treat the sequence as a batch of separate frames.
> permute_batch sounds like it might work for my current setup (sequence as a batch of images), I'll give it a go.
I'm looking forward to hearing more about your results.
I can confirm that permute_batch worked for replicating randomly generated numbers such that they are shared for images belonging to the same clip. To make things easier I wrote a little helper:
import numpy as np
import nvidia.dali as dali

class PerClipRng:
    """An NVIDIA DALI helper for generating per-clip random numbers.

    Assuming that "batches" in the pipeline have the following layout
    [A1, A2, ..., An, B1, B2, ..., Bn, C1, ...]
    it is guaranteed that the random numbers generated will be the same for each
    image in a clip (e.g. A1, A2, ..., An will all have the same value).
    """

    def __init__(self, clips_per_batch, images_per_clip):
        self.clips_per_batch = clips_per_batch
        self.images_per_clip = images_per_clip

    def _repeat_per_clip(self, batch):
        indices = list(np.repeat(np.arange(self.clips_per_batch), self.images_per_clip))
        batch_replicated = dali.fn.permute_batch(batch, indices=indices)
        return batch_replicated

    def coin_flip(self, probability=None):
        batch = dali.fn.random.coin_flip(probability=probability)
        return self._repeat_per_clip(batch)

    def normal(self, mean=None, stddev=None):
        batch = dali.fn.random.normal(mean=mean, stddev=stddev)
        return self._repeat_per_clip(batch)

    def uniform(self, range=None):
        batch = dali.fn.random.uniform(range=range)
        return self._repeat_per_clip(batch)
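For context, a hypothetical usage sketch of the helper above; the dummy source, the flip augmentation, and the sizes are illustrative assumptions, not the actual training pipeline:

```python
# Hypothetical usage of PerClipRng: every frame of a clip gets the same coin flip,
# so a clip is either flipped as a whole or left untouched.
import numpy as np
import nvidia.dali as dali

clips_per_batch, images_per_clip = 2, 4

def dummy_clips():
    # Stand-in source: yields batches laid out as [A1..A4, B1..B4] so that
    # PerClipRng's layout assumption holds.
    while True:
        n = clips_per_batch * images_per_clip
        frames = [np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8) for _ in range(n)]
        labels = [np.array([i // images_per_clip], dtype=np.int32) for i in range(n)]
        yield frames, labels

@dali.pipeline_def
def clip_pipeline():
    rng = PerClipRng(clips_per_batch, images_per_clip)
    images, labels = dali.fn.external_source(source=dummy_clips(), num_outputs=2,
                                             layout=["HWC"])
    do_flip = rng.coin_flip(probability=0.5)  # shared by all frames of a clip
    images = dali.fn.flip(images, horizontal=do_flip)
    return images, labels

pipe = clip_pipeline(batch_size=clips_per_batch * images_per_clip,
                     num_threads=2, device_id=0)
pipe.build()
images, labels = pipe.run()
```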
I still think that it would be nice if there was a way to have the external source produce video sequences as opposed to taking the batch-of-images approach, but I'm going to mark this issue as resolved. Thanks for your help.
Hi @anibali,
If your frames have a uniform size, you can do something like this as well:
import numpy as np
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def
import os
batch_size = 10
sequence_length = 4
test_data_root = os.environ['DALI_EXTRA_PATH']
jpeg_file = os.path.join(test_data_root, 'db', 'single', 'jpeg', '510', 'ship-1083562_640.jpg')
def get_data(sample_info):
    # Just an example which repeats the same frame, but you can put different frames there.
    # If the images are encoded you need to zero-pad all of them; an encoded JPEG has its
    # size in the header, so trailing zeros are harmless.
    out = [np.fromfile(jpeg_file, dtype=np.uint8) for _ in range(sequence_length)]
    # add a label
    out.append(np.array([1, 2, 3]))
    return out

@pipeline_def
def simple_pipeline():
    *jpegs, label = fn.external_source(source=get_data, num_outputs=sequence_length + 1, parallel=True, batch=False)
    images = fn.decoders.image(jpegs, device="mixed", hw_decoder_load=1)
    sequence = fn.stack(*images)
    sequence = fn.reshape(sequence, layout="DHWC")
    return sequence, label
pipe = simple_pipeline(batch_size=batch_size, num_threads=4, prefetch_queue_depth=2, device_id=0)
pipe.build()
pipe.run()
out = pipe.run()
print(np.array(out[0][0]).shape)
print(np.array(out[1][0]))
Ah, that's a really good example! I didn't realise that you could pass a list as input to fn.decoders.image.
Please keep in mind that passing a list to the operator creates as many instances of it as the list has elements. In the case of the mixed decoder, each instance allocates its own GPU memory, so with a bigger sequence length you can simply run out of it.
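One possible (untested) way around that memory concern, which is an assumption rather than a suggestion from this thread, is to decode on the CPU and copy the stacked sequence to the GPU in one go; it reuses get_data, batch_size and sequence_length from the example above, and whether the CPU decoding throughput is acceptable depends on the workload:

```python
# Sketch of a CPU-decoding variant of the pipeline above (an assumption, not a
# recommendation from the thread): CPU decoder instances do not each reserve GPU
# memory, at the cost of host-side decoding speed.
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def

@pipeline_def
def cpu_decode_pipeline():
    *jpegs, label = fn.external_source(source=get_data, num_outputs=sequence_length + 1,
                                       parallel=True, batch=False)
    images = fn.decoders.image(jpegs, device="cpu")  # decode on the host
    sequence = fn.stack(*images)                     # frames -> one sequence sample
    sequence = fn.reshape(sequence, layout="DHWC")
    return sequence.gpu(), label                     # copy the whole clip to the GPU once

pipe = cpu_decode_pipeline(batch_size=batch_size, num_threads=4,
                           prefetch_queue_depth=2, device_id=0)
pipe.build()
out = pipe.run()
```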
I have data loading requirements that do not fit into the "an example is a file on disk" type of structure that DALI seems to assume natively. Essentially I have image files (JPEGs) and crops within them that define "clips" (sequences of frames) which are cropped around particular people of interest in the frames (there can be multiple crops in the same images, leading to multiple examples). In case you were curious, the particular use case is multi-person tracking.
My first attempt at using DALI was to define my own ExternalInputIterator like the one shown in the tutorials (https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/general/data_loading/external_input.html), using the "batch dimension" as time (so each "batch" is actually a single clip). However, this causes issues when I try to use DALI's random augmentation capabilities (each frame in the clip is transformed separately). It also means that I can't easily batch multiple clips, since I am already (ab)using batching for another purpose. Here's a simplified version of what I currently have:
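(A rough sketch of that kind of iterator follows; the class layout, the clip_specs structure, and the crop handling are illustrative guesses rather than the actual code.)

```python
# Rough, hypothetical sketch of a one-clip-per-"batch" iterator: the batch
# dimension is really time, so each __next__ call returns all frames of one clip.
import numpy as np

class ExternalInputIterator:
    def __init__(self, clip_specs):
        # clip_specs: assumed list of (list_of_jpeg_paths, crop_box) pairs,
        # one entry per person-of-interest clip.
        self.clip_specs = clip_specs

    def __iter__(self):
        self.i = 0
        return self

    def __next__(self):
        if self.i >= len(self.clip_specs):
            raise StopIteration
        paths, crop_box = self.clip_specs[self.i]
        self.i += 1
        # One encoded JPEG per "sample"; the whole clip forms the batch.
        jpegs = [np.fromfile(p, dtype=np.uint8) for p in paths]
        crops = [np.array(crop_box, dtype=np.float32) for _ in paths]
        return jpegs, crops
```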
Is there a way of using External Source (external_source) for the case where each individual example is a sequence of images? I've looked through the documentation and the issues here, but couldn't find anything. I also thought long and hard about how to solve this based on what I've read, but the input side of DALI feels rather inflexible and is steeped in C++, so I can't easily take fn.readers.sequence and modify it (for example). The best solution I can think of at the moment is decoding the JPEGs on the CPU as part of the ExternalInputIterator, but I'd prefer to do this on the GPU as part of the pipeline. (EDIT: this doesn't actually work anyway since warp_affine doesn't support applying per-frame warps to video sequences, see https://github.com/NVIDIA/DALI/issues/2832.)
EDIT: I've also tried padding the encoded JPEG data so that I can create one big numpy array from all of the frames in the clip, but it seems that dali.fn.decoders.image does not recognise the multiple images and only decodes the first.