Open matthew-frank opened 2 years ago
Thanks for reporting the issue. Let me check this and get back to you. We will work something out to make this more understandable.
I can't agree more with you. You really speak to my heart. Several months have passed and, sadly, the documentation is still the same as described here.
To be honest, I have a million doubts when I read the documentation. I have no idea what `size` means. Does it stand for batch size? And what is the epoch size in the documentation? What should I fill in for `reader_name`? What does `reader_name = "Readers"` mean? Where does the "readers" come from? I feel so helpless...
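For anyone with the same question about `size`: as far as the iterator documentation describes it, `size` counts samples per epoch (per shard), not samples per batch. The sketch below is a simplified pure-Python model of how an epoch of `size` samples splits into batches. The function name `batches_per_epoch` is made up for illustration, the policy strings only mirror the names of `LastBatchPolicy.FILL`, `PARTIAL`, and `DROP`, and sharding is ignored, so treat it as an approximation rather than DALI's actual code.

```python
def batches_per_epoch(size, batch_size, policy="FILL"):
    """Return (number_of_batches, length_of_last_batch) for one epoch.

    size       -- number of samples in the epoch (NOT the batch size)
    batch_size -- number of samples per batch
    policy     -- "FILL", "PARTIAL" or "DROP" (hypothetical string stand-ins
                  for the LastBatchPolicy values)
    """
    if size < 0 or batch_size <= 0:
        raise ValueError("size must be >= 0 and batch_size must be > 0")

    full_batches, leftover = divmod(size, batch_size)

    # The epoch divides evenly: every policy behaves the same.
    if leftover == 0:
        return full_batches, (batch_size if full_batches else 0)

    if policy == "DROP":
        # The incomplete last batch is discarded.
        return full_batches, (batch_size if full_batches else 0)
    if policy == "PARTIAL":
        # The last batch is returned with only the leftover samples.
        return full_batches + 1, leftover
    if policy == "FILL":
        # The last batch is padded up to a full batch.
        return full_batches + 1, batch_size
    raise ValueError(f"unknown policy: {policy!r}")


# Example: an epoch of 10 samples read in batches of 4.
for name in ("FILL", "PARTIAL", "DROP"):
    print(name, batches_per_epoch(10, 4, name))
```

With 10 samples and a batch size of 4, the epoch size stays 10 in every case; only the number and length of the batches change with the policy.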
I also agree. Also, while the "Getting started tutorial" helped a lot, I believe it would be much easier for new users to use DALI if an explanation/example of using `reader_name` was additionally given. (As it is now, it feels like to understand how `reader_name` works, I need to piece together information from bits of documentation, which makes understanding very difficult.) Adding an explanation of `reader_name` on https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/general/data_loading/numpy_reader.html might be helpful too!
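A minimal sketch of the kind of example being asked for, under some assumptions: the string passed as `reader_name` is whatever was given as `name=` to the reader operator inside the pipeline. The function name `build_reader_name_iterator`, the `file_root` directory, the reader name `"Reader"`, and the output names `"data"` and `"label"` are all arbitrary choices for illustration, and running it needs a DALI installation with a GPU. The imports sit inside the function only so that the snippet can be loaded on a machine without DALI.

```python
def build_reader_name_iterator(file_root, batch_size=32):
    """Build a PyTorch DALI iterator whose epoch size is taken from a named reader.

    file_root is assumed to be a directory of images laid out in one
    subdirectory per class, as fn.readers.file expects.
    """
    # Imported lazily so this sketch can be loaded without DALI installed.
    from nvidia.dali import pipeline_def, fn
    from nvidia.dali.plugin.pytorch import DALIGenericIterator, LastBatchPolicy

    @pipeline_def(batch_size=batch_size, num_threads=2, device_id=0)
    def image_pipeline():
        # name="Reader" is the label that reader_name refers to later.
        jpegs, labels = fn.readers.file(file_root=file_root, name="Reader")
        images = fn.decoders.image(jpegs, device="mixed")
        images = fn.resize(images, resize_x=224, resize_y=224)
        return images, labels

    pipe = image_pipeline()

    # reader_name must match the name given to the reader above; the iterator
    # then queries that reader for the epoch size, so size is not passed.
    return DALIGenericIterator(
        [pipe],
        ["data", "label"],
        reader_name="Reader",
        last_batch_policy=LastBatchPolicy.PARTIAL,
    )
```

After the pipeline is built, `pipe.reader_meta("Reader")` should show the metadata the iterator reads for that reader.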
The `reader_name` argument of `nvidia.dali.plugin.*.DALI*Iterator()` has been difficult for us to understand. I'd like to propose a rewording of the documentation, but want to check that what I'm proposing is actually correct, and ask some questions.
The current documentation is:
`last_batch_policy` to `LastBatchPolicy.FILL` and it seems to work as described in the examples (i.e., it works like "FILL", rather than being overridden to "PARTIAL".)
`reader_name` is mutually exclusive with using the `last_batch_padded` argument, and instead, if the `reader_name` argument is set, the `last_batch_padded` argument is set to the value of the `pad_last_batch` option of the reader?
`reader_name` argument is the "name of the reader which will be queried" but doesn't explain what object will be queried to find the reader. Is it the list of pipelines given by the `pipelines` argument? What are the semantics if the reader is not found in any of the pipelines from the `pipelines` argument? What are the semantics if the reader is found in more than one of the pipelines from the `pipelines` argument?
`fn.readers.mxnet` constructor apparently takes an argument `name`, but this argument isn't documented in the documentation for `fn.readers.mxnet()` (nor could I find this argument documented in any of the other readers), and the semantics of naming readers in Pipelines doesn't seem to be documented. The only place I can find the reader name mentioned in the documentation is in the documentation for the `reader_meta` argument to Pipeline.
`size` argument to the `DALI*Iterator()` is correctly documented. It says that if the `size` is set to the default (-1), "The options last_batch_policy and last_batch_padded don’t work in such case." but then it also says setting `size=-1` is mutually exclusive with the `reader_name` argument.
For reference: the Sharding documentation page has been somewhat helpful in understanding the actual semantics of this argument, but in that case when the `reader_name` argument is described, it should also be more explicit about how it is deriving `size` from `pipeline.reader_meta(reader_name)['epoch_size_padded']` (or however it is actually derived) and `last_batch_padded` from `pipeline.reader_meta(reader_name)['pad_last_batch']`.
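To make the reading proposed above concrete, here is a small pure-Python sketch of it. It is only a guess at the semantics, not DALI's verified behavior: the function `resolve_iterator_args` is hypothetical, the reader metadata is a plain dict standing in for the result of `pipeline.reader_meta(reader_name)`, and the numbers in the example are made up.

```python
def resolve_iterator_args(size=-1, last_batch_padded=False, reader_meta=None):
    """Model the proposed reading of how size and last_batch_padded are chosen.

    reader_meta -- a dict standing in for pipeline.reader_meta(reader_name),
                   or None when no reader_name is given (assumption).
    Returns (size, last_batch_padded) as the iterator would use them
    under this guessed interpretation.
    """
    if reader_meta is None:
        # No reader_name: the caller's explicit values are used unchanged.
        return size, last_batch_padded

    if size != -1:
        # Guessed rule: an explicit size and reader_name cannot be combined.
        raise ValueError("size and reader_name are mutually exclusive")

    # Guessed rule: both values come from the reader, and any
    # last_batch_padded passed by the caller is ignored.
    return reader_meta["epoch_size_padded"], reader_meta["pad_last_batch"]


# Made-up metadata for a reader holding 1000 samples.
example_meta = {
    "epoch_size": 1000,
    "epoch_size_padded": 1002,
    "pad_last_batch": True,
}
print(resolve_iterator_args(reader_meta=example_meta))
print(resolve_iterator_args(size=500))
```

If the maintainers confirm or correct these rules, the documentation could state them in roughly this form.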