KhronosGroup / NNEF-Docs

NNEF public repository
Apache License 2.0

Training data in tensor file format? #10

Closed zoeoz closed 5 years ago

zoeoz commented 6 years ago

As we integrate NNEF support into our system, one of the thoughts that has occurred to us is that using the tensor file format to store various kinds of training data may be very convenient.

We understand that, from an NNEF perspective, training data is introduced into the graph through the external nodes, which appears to be why it is not part of the NNEF tensor file container or otherwise covered in the specification. There is good rationale for this: the possible sources of training data are vast and generally beyond the scope of NNEF, which is to exchange trained and/or untrained neural network information (i.e., graph structure and learned parameters).

Having said that, the tensor file format is a convenient choice that appears to be suitable in many cases. The only potential issue we run into is the 32-bit size restriction in the tensor file header fields, since a single tensor of training data can easily exceed the 4 GB limit.
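To make the size concern concrete, here is a minimal sketch of the arithmetic. It assumes only that the byte count of a tensor must fit in an unsigned 32-bit field; the sample shape (224 x 224 x 3, 4 bytes per element) and the function names are hypothetical, chosen purely for illustration.

```python
# Sketch: how many fixed-size samples fit in one tensor whose total byte
# count must be representable in an unsigned 32-bit field.
# The sample shape and element size below are illustrative assumptions.

U32_MAX = 2**32 - 1  # largest value an unsigned 32-bit field can hold


def tensor_bytes(shape, bytes_per_element):
    """Return the uncompressed size in bytes of a dense tensor."""
    count = 1
    for extent in shape:
        count *= extent
    return count * bytes_per_element


def max_samples_under_u32(sample_shape, bytes_per_element):
    """Return how many samples of the given shape stay within U32_MAX bytes."""
    return U32_MAX // tensor_bytes(sample_shape, bytes_per_element)


sample_shape = (224, 224, 3)  # hypothetical image sample
print(tensor_bytes(sample_shape, 4))           # 602112 bytes per sample
print(max_samples_under_u32(sample_shape, 4))  # 7133 samples
```

Under these assumptions, a single tensor could hold only a few thousand such samples before its byte count overflows a 32-bit field.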

We don't suggest introducing training data into the specification beyond the current "official" mechanisms. However, providing 64-bit fields in the tensor file header would allow vendors to use the tensor file format to store libraries of training data. This could in some situations facilitate standardized exchange between systems, or avoid "ad hoc" formats for exchanging training data even within a single system.

For clarification, in a previous post I had mentioned 64-bit fields and agreed their use would be unlikely for a single tensor. At the time I was considering tensors of learning parameters, and had not yet thought of their potential use to store training data.

gyenesvi commented 6 years ago

The tensor file format is not designed for large amounts of data, and the reason is not the 32-bit fields (although, of course, that is a limitation too). First, the tensor file is not compressed, so storing lots of data (like images) in arrays where every item is stored explicitly would make the database very large compared to a set of PNG and JPEG images. Second, the data would be laid out in a flat array, which could make it difficult to access for training, where batches of data must be read and shuffling of the data is also required. I believe database formats like LMDB may be a better fit for storing training data.
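As a rough illustration of the access pattern described above, the following sketch computes where the samples of each shuffled mini-batch would sit in a flat, uncompressed array. It assumes fixed-size samples stored back to back after a fixed-size header; `HEADER_BYTES` is a placeholder value, not taken from the specification, and all function names are hypothetical.

```python
# Sketch: byte offsets touched when reading shuffled mini-batches from a
# flat array of fixed-size samples. HEADER_BYTES is a placeholder
# assumption; consult the specification for the actual header layout.
import random

HEADER_BYTES = 128  # assumed, for illustration only


def sample_offsets(indices, sample_bytes, header_bytes=HEADER_BYTES):
    """Return the starting byte offset of each requested sample."""
    return [header_bytes + i * sample_bytes for i in indices]


def shuffled_batches(num_samples, batch_size, seed=0):
    """Yield lists of sample indices covering the data in shuffled order."""
    order = list(range(num_samples))
    random.Random(seed).shuffle(order)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]


for batch in shuffled_batches(num_samples=8, batch_size=4):
    # Each sample in the batch needs its own seek to a scattered offset.
    print(sample_offsets(batch, sample_bytes=602112))
```

Every sample in a shuffled batch lands at an unrelated offset, so each one requires a separate seek into a potentially very large file.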

zoeoz commented 5 years ago

This is all true. I did not think of that. Very good rationale. Thanks.