Closed gudovskiy closed 8 months ago
Hi, this is not possible inside the DALI pipeline definition, because:
self.labels = self.input(name="Reader")
operates on the symbolic graph definition, and self.labels
represents a whole batch of labels rather than a single sample.
If you want to access a particular sample from the output of the DALI pipeline, you can write something like this for a CPU output:
pipeline.run()[output_index].at(index)
or, if you are using a nightly build:
pipeline.run()[output_index][index]
or add as_cpu()
in the middle for GPU tensors.
@JanuszL thank you. Maybe it is possible somehow to return the sample path + filename in the label and get the id from this? For ImageNet, DALI returns the class based on the folder structure... maybe there is a way to get a "full path" label? Or, alternatively, convert ImageNet to LMDB and encode the id in the label?
@gudovskiy - what you are proposing is doable. We would be more than happy to accept any PR that would introduce such functionality.
What could be done is to extend the label data type that readers return to uint32 (PR is needed) and assign a unique label to each sample. Then the user can easily map the label returned from the reader to file_name and actual class/label.
Another idea is to add some method that would return SourceInfo
from the pipeline. So far we have been using it only for error messages when the decoder cannot properly handle a given file, but it could be exposed to the external world as well.
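To illustrate the first idea above (a unique label per sample that is mapped back to the file name and class on the user side), here is a minimal sketch in plain Python. It assumes an ImageNet-style layout of root/<class_name>/<file> and file names without spaces. The helper names build_sample_index and write_file_list are hypothetical and not part of DALI. The list file uses one "path label" pair per line, which is the format I believe the file reader's file_list argument expects; please check that against the docs for your DALI version.

```python
import os


def build_sample_index(root):
    """Return a list whose position i holds (relative_path, class_index).

    The list position serves as the unique sample id. Classes and files
    are sorted so the ids are reproducible between runs.
    """
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    samples = []
    for class_index, class_name in enumerate(classes):
        class_dir = os.path.join(root, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            rel_path = os.path.join(class_name, file_name)
            samples.append((rel_path, class_index))
    return samples


def write_file_list(samples, list_path):
    """Write one 'relative_path unique_id' pair per line."""
    with open(list_path, "w") as f:
        for unique_id, (rel_path, _class_index) in enumerate(samples):
            f.write(f"{rel_path} {unique_id}\n")
```

With this, the "label" coming out of the reader would be the unique id, and samples[unique_id] gives back the relative path and the real class index.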
@JanuszL thanks. Meanwhile, I was able to encode 10-bit labels and 21-bit indices into an int32 for ImageNet using the Caffe LMDB format.
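The original code for this was not shared in the thread, so the following is only a sketch of how such packing could look, not the author's implementation. It assumes the label sits in the upper bits and the sample index in the lower 21 bits; the actual bit layout used is unknown. Ten bits cover ImageNet's 1000 classes (fewer than 1024) and 21 bits cover its roughly 1.28 million training images (fewer than 2,097,152), so the packed value uses 31 bits and stays non-negative in a signed int32.

```python
LABEL_BITS = 10   # enough for 1000 classes (2**10 = 1024)
INDEX_BITS = 21   # enough for ~1.28M images (2**21 = 2097152)
INDEX_MASK = (1 << INDEX_BITS) - 1


def pack(label, index):
    """Combine a class label and a sample index into one non-negative int32."""
    if not 0 <= label < (1 << LABEL_BITS):
        raise ValueError("label does not fit in 10 bits")
    if not 0 <= index < (1 << INDEX_BITS):
        raise ValueError("index does not fit in 21 bits")
    # Assumed layout: label in the high bits, index in the low bits.
    return (label << INDEX_BITS) | index


def unpack(packed):
    """Recover (label, index) from a value produced by pack()."""
    return packed >> INDEX_BITS, packed & INDEX_MASK
```

The packed value would be stored as the label when writing the LMDB, and unpack() would be applied to the labels returned by the pipeline to get both the class and the sample id.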
@gudovskiy - I'm happy that you managed to get it working.
@gudovskiy could you please share your solution? I am stuck with the same problem.
@DonkeyShot21 sorry, don't have that code anymore
Hi,
You can use the source_info
property of each tensor to get the file name/unique identifier of the data sample.
Hi, in PyTorch it is possible to return the sample index from __getitem__(self, index), like: return (data, label, index). Is there any way to return such an index in DALI? At first glance, it doesn't seem to be possible, because the reader returns only self.jpegs, self.labels = self.input(name="Reader")?