NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Assert on "meta" failed: Operator Reader not found or does not expose valid metadata. #5029

Open ri0107pr opened 1 year ago

ri0107pr commented 1 year ago

Describe the question.

I cannot solve this problem. Could you please tell me how to solve this problem?

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types
from nvidia.dali.plugin.pytorch import DALIGenericIterator

class DALIPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, external_data, rsz_w, rsz_h):
        super(DALIPipeline, self).__init__(batch_size, num_threads, device_id)

        self.external_data = external_data
        self.iterator = iter(external_data)
        # Two external sources, fed manually in iter_setup(): encoded images and labels
        self.input = ops.ExternalSource()
        self.input_label = ops.ExternalSource()
        self.decode = ops.decoders.Image(device = "mixed", output_type = types.RGB)
        #self.decode = ops.ImageDecoderRandomCrop(device = "mixed", output_type = types.DALIImageType.RGB)
        self.resize = ops.Resize(device="gpu", resize_x=rsz_w, resize_y=rsz_h, interp_type=types.INTERP_TRIANGULAR)
        self.transpose = ops.Transpose(device='gpu', perm = [2, 0, 1])
        self.cast = ops.Cast(device='gpu', dtype=types.DALIDataType.FLOAT)

    def define_graph(self):
        self.jpegs = self.input(name='Reader')
        self.label = self.input_label()
        img = self.decode(self.jpegs)
        img = self.resize(img)
        img = self.cast(img)
        img = self.transpose(img)
        return (img.gpu(), self.label.gpu())

    def iter_setup(self):
        # Restart the underlying data loader once it is exhausted
        try:
            p = next(self.iterator)
        except StopIteration:
            self.iterator = iter(self.external_data)
            p = next(self.iterator)
        img, label = p
        self.feed_input(self.jpegs, img)
        self.feed_input(self.label, label)

pipe = DALIPipeline(batch_size, num_threads=2, device_id=0, external_data=dataloader, rsz_h=rsz_h, rsz_w=rsz_w)
pipe.build()

dali_iter = DALIGenericIterator(pipelines=pipe, 
                                output_map=['input', 'cls'], 
                                reader_name='Reader', 
                                auto_reset=True, 
                                dynamic_shape=False)

error code:
Traceback (most recent call last):
  File "./train_dali.py", line 253, in <module>
    train_loader, train_k_transform = DALIDataLoader(**data["dataloader"], edge=opt.edge, train=True)
  File "/home/workspace/EdgeNet/utils/dali_dataloader.py", line 139, in DALIDataLoader
    dali_iter = DALIGenericIterator(pipelines=pipe,
  File "/opt/conda/lib/python3.8/site-packages/nvidia/dali/plugin/pytorch.py", line 181, in __init__
    _DaliBaseIterator.__init__(self,
  File "/opt/conda/lib/python3.8/site-packages/nvidia/dali/plugin/base_iterator.py", line 201, in __init__
    self._extract_from_reader_and_validate()
  File "/opt/conda/lib/python3.8/site-packages/nvidia/dali/plugin/base_iterator.py", line 214, in _extract_from_reader_and_validate
    readers_meta = [p.reader_meta(self._reader_name) for p in self._pipes]
  File "/opt/conda/lib/python3.8/site-packages/nvidia/dali/plugin/base_iterator.py", line 214, in <listcomp>
    readers_meta = [p.reader_meta(self._reader_name) for p in self._pipes]
  File "/opt/conda/lib/python3.8/site-packages/nvidia/dali/pipeline.py", line 500, in reader_meta
    return self._pipe.reader_meta(name)
RuntimeError: [/opt/dali/dali/python/backend_impl.cc:2004] Assert on "meta" failed: Operator Reader not found or does not expose valid metadata.
Stacktrace (38 entries):
[frame 0]: /opt/conda/lib/python3.8/site-packages/nvidia/dali/backend_impl.cpython-38-x86_64-linux-gnu.so(+0x6aff6) [0x7f4c69b1fff6]
[frame 1]: /opt/conda/lib/python3.8/site-packages/nvidia/dali/backend_impl.cpython-38-x86_64-linux-gnu.so(+0x44f4b) [0x7f4c69af9f4b]
[frame 2]: /opt/conda/lib/python3.8/site-packages/nvidia/dali/backend_impl.cpython-38-x86_64-linux-gnu.so(+0xb617d) [0x7f4c69b6b17d]
[frame 3]: python3(+0x13c7ae) [0x563e8e00c7ae]
[frame 4]: python3(_PyObject_MakeTpCall+0x3bf) [0x563e8e00125f]
[frame 5]: python3(+0x166d50) [0x563e8e036d50]
[frame 6]: python3(_PyEval_EvalFrameDefault+0x4f81) [0x563e8e0aa9d1]
[frame 7]: python3(_PyEval_EvalCodeWithName+0x260) [0x563e8e09c1f0]
[frame 8]: python3(_PyFunction_Vectorcall+0x534) [0x563e8e09d754]
[frame 9]: python3(_PyEval_EvalFrameDefault+0x4bf) [0x563e8e0a5f0f]
[frame 10]: python3(_PyEval_EvalCodeWithName+0x888) [0x563e8e09c818]
[frame 11]: python3(_PyFunction_Vectorcall+0x594) [0x563e8e09d7b4]
[frame 12]: python3(_PyEval_EvalFrameDefault+0x71a) [0x563e8e0a616a]
[frame 13]: python3(_PyEval_EvalCodeWithName+0xd5f) [0x563e8e09ccef]
[frame 14]: python3(_PyFunction_Vectorcall+0x594) [0x563e8e09d7b4]
[frame 15]: python3(_PyEval_EvalFrameDefault+0x4bf) [0x563e8e0a5f0f]
[frame 16]: python3(_PyEval_EvalCodeWithName+0x260) [0x563e8e09c1f0]
[frame 17]: python3(_PyFunction_Vectorcall+0x594) [0x563e8e09d7b4]
[frame 18]: python3(_PyEval_EvalFrameDefault+0x1517) [0x563e8e0a6f67]
[frame 19]: python3(_PyEval_EvalCodeWithName+0x260) [0x563e8e09c1f0]
[frame 20]: python3(_PyFunction_Vectorcall+0x594) [0x563e8e09d7b4]
[frame 21]: python3(+0x1b945a) [0x563e8e08945a]
[frame 22]: python3(_PyObject_MakeTpCall+0x228) [0x563e8e0010c8]
[frame 23]: python3(_PyEval_EvalFrameDefault+0x540a) [0x563e8e0aae5a]
[frame 24]: python3(_PyEval_EvalCodeWithName+0x260) [0x563e8e09c1f0]
[frame 25]: python3(_PyFunction_Vectorcall+0x594) [0x563e8e09d7b4]
[frame 26]: python3(PyObject_Call+0x319) [0x563e8e007819]
[frame 27]: python3(_PyEval_EvalFrameDefault+0x1dd3) [0x563e8e0a7823]
[frame 28]: python3(_PyEval_EvalCodeWithName+0x260) [0x563e8e09c1f0]
[frame 29]: python3(PyEval_EvalCode+0x23) [0x563e8e09daa3]
[frame 30]: python3(+0x241382) [0x563e8e111382]
[frame 31]: python3(+0x252202) [0x563e8e122202]
[frame 32]: python3(+0x2553ab) [0x563e8e1253ab]
[frame 33]: python3(PyRun_SimpleFileExFlags+0x1bf) [0x563e8e12558f]
[frame 34]: python3(Py_RunMain+0x3a9) [0x563e8e125a69]
[frame 35]: python3(Py_BytesMain+0x39) [0x563e8e125c69]
[frame 36]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f4d58816bf7]
[frame 37]: python3(+0x1f7427) [0x563e8e0c7427]


JanuszL commented 1 year ago

Hi @ri0107pr,

Thank you for bringing this up. The external source operator has no notion of the number of samples it holds, because the callback provided to it is free to run until it raises a StopIteration exception. In that case, please remove reader_name='Reader' from the iterator. With the default size of -1, the iterator should run until the external source runs out of data.

klecki commented 1 year ago

Hi @ri0107pr, the External Source operator is not a "reader", hence the integration with DALIGenericIterator via the reader_name parameter doesn't work.

If you use external source and want to make it work with the DALIGenericIterator, I advise using the source parameter of external source instead of the feed_input method. https://docs.nvidia.com/deeplearning/dali/user-guide/docs/operations/nvidia.dali.fn.external_source.html

The source can be a callback or an iterable; the batch parameter of external source specifies whether you are producing one sample at a time or one batch at a time.

Raising StopIteration from such a callback indicates the end of an epoch to the DALIGenericIterator. Of course, you would also need to remove the reader_name parameter, as @JanuszL mentioned.
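
A minimal sketch of this setup follows. It is an illustration under stated assumptions, not a tested drop-in fix. EpochSource and build_iterator are hypothetical names that do not appear in the original code. The sketch assumes each sample is an (encoded JPEG, label) pair of NumPy arrays, and that passing cycle="raise" to external_source makes DALI restart an iterable source after it raises StopIteration.

```python
class EpochSource:
    """Hypothetical iterable that yields one (images, labels) batch per step.

    `samples` is a sequence of (encoded_image, label) pairs. For DALI, both
    elements are assumed to be NumPy arrays (a 1-D uint8 buffer holding the
    encoded JPEG, and the label).
    """

    def __init__(self, samples, batch_size):
        self.samples = list(samples)
        self.batch_size = batch_size
        self.pos = 0

    def __iter__(self):
        # Called again at the start of every epoch, so rewind here.
        self.pos = 0
        return self

    def __next__(self):
        if self.pos >= len(self.samples):
            # Signals the end of the epoch to the consumer.
            raise StopIteration
        chunk = self.samples[self.pos:self.pos + self.batch_size]
        self.pos += self.batch_size
        images = [img for img, _ in chunk]
        labels = [lbl for _, lbl in chunk]
        return images, labels


def build_iterator(source, batch_size, rsz_w, rsz_h):
    """Hypothetical helper: builds the pipeline and wraps it in an iterator."""
    # Imported lazily so EpochSource can be exercised without DALI or a GPU.
    from nvidia.dali import pipeline_def, fn, types
    from nvidia.dali.plugin.pytorch import DALIGenericIterator

    @pipeline_def(batch_size=batch_size, num_threads=2, device_id=0)
    def pipe():
        # batch=True: the source returns whole batches, one list per output.
        jpegs, labels = fn.external_source(
            source=source, num_outputs=2, batch=True, cycle="raise")
        img = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
        img = fn.resize(img, resize_x=rsz_w, resize_y=rsz_h,
                        interp_type=types.INTERP_TRIANGULAR)
        img = fn.cast(img, dtype=types.FLOAT)
        img = fn.transpose(img, perm=[2, 0, 1])
        return img, labels.gpu()

    # No reader_name here: the external source is not a reader.
    return DALIGenericIterator(pipe(), output_map=["input", "cls"],
                               auto_reset=True)
```

Here the source itself decides when an epoch ends, so the iterator needs neither reader_name nor an explicit size. In this sketch the last batch of an epoch may be smaller than batch_size.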

ri0107pr commented 1 year ago

@JanuszL , @klecki Thank you very much.

I will reread the "external_source" documentation. I may ask about this error again; if I do, I would appreciate your help.