Closed jinevening closed 4 years ago
Format of representative dataset
We use an hdf5 file (.h5) to store the representative dataset. The hierarchy of the dataset file is shown below.
Group "/"
 ㄴGroup "value"
   ㄴGroup <record_idx>
     ㄴDataset <input_idx>
`record_idx` (rid): index of the record (the dataset file contains multiple records)
`input_idx` (iid): index of the input (a DNN model can have multiple inputs)
Example: 10 records, DNN model requires two inputs, each input has shape (3, 4)
Group "/"
 ㄴGroup "value"
   ㄴGroup "0"
     ㄴDataset "0" Shape (3,4)
     ㄴDataset "1" Shape (3,4)
   ㄴGroup "1"
     ㄴDataset "0" Shape (3,4)
     ㄴDataset "1" Shape (3,4)
   ...
   ㄴGroup "9"
     ㄴDataset "0" Shape (3,4)
     ㄴDataset "1" Shape (3,4)
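To make the naming scheme concrete, here is a minimal sketch that only enumerates the dataset paths such a file would contain. It does not touch hdf5 at all; the function name `dataset_paths` is made up for illustration, and zero-based indices are assumed because the example starts at Group "0".

```python
def dataset_paths(num_records, num_inputs):
    """Return paths of the form /value/<record_idx>/<input_idx>.

    Assumes both indices are zero-based, as in the example hierarchy.
    """
    return [
        f"/value/{rid}/{iid}"
        for rid in range(num_records)
        for iid in range(num_inputs)
    ]


# 10 records, each with two model inputs
paths = dataset_paths(num_records=10, num_inputs=2)
print(len(paths))   # number of datasets in the file
print(paths[0])     # first dataset path
print(paths[-1])    # last dataset path
```

With zero-based indices, 10 records occupy groups "0" through "9".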
Once determined, the file format is very difficult to change. Please leave a comment if you have questions or suggestions.
I suggest storing raw floats in binary files (not using hdf5), one tensor per file, the same way it is done in luci-value-test. There is no need to store shapes.
For me, using hdf5 or any other "smart" format looks like just an additional layer between the user and the compiler, which does not add to UX and complicates overall design without any benefits.
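For reference, the raw-binary alternative suggested here could look roughly like the sketch below: one tensor per file, float32 values written back to back, no header and no shape. The helper names, the file name `input0.bin`, and the little-endian byte order are assumptions made for illustration, not something taken from luci-value-test.

```python
import os
import struct
import tempfile


def write_raw_f32(path, values):
    """Write a flat list of floats as little-endian float32, with no header."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))


def read_raw_f32(path):
    """Read the file back as a flat list; the shape must be known by the caller."""
    with open(path, "rb") as f:
        data = f.read()
    count = len(data) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", data))


# A (3, 4) tensor flattened to 12 values
tensor = [float(i) for i in range(12)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input0.bin")  # hypothetical file name
    write_raw_f32(path, tensor)
    size = os.path.getsize(path)
    restored = read_raw_f32(path)

print(size, restored == tensor)
```

The file carries only the values, so the reader has to get the shape, element type, and byte order from somewhere else.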
> complicates overall design without any benefits.
IMHO, using a "smart" format has some benefits when dealing with the diversity of input files.
For example, hdf5 can express the endianness of the file datatype (big endian for IEEE_F32BE, little endian for IEEE_F32LE). I'm not sure we could handle an input file written with a different endianness if we used the raw data format.
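The endianness concern can be illustrated with a small sketch using only Python's `struct` module. It packs the value 1.0 as a float32 in both byte orders and then decodes the little-endian bytes as if they were big-endian, which is what would happen to a raw file read on the wrong assumption.

```python
import struct

value = 1.0

# The same float32 value serialized in both byte orders
le_bytes = struct.pack("<f", value)  # little endian
be_bytes = struct.pack(">f", value)  # big endian

# Decoding with the correct byte order recovers the value
correct = struct.unpack("<f", le_bytes)[0]

# Decoding little-endian bytes as big-endian silently yields a different number
misread = struct.unpack(">f", le_bytes)[0]

print(le_bytes.hex(), be_bytes.hex())
print(correct, misread)
```

No error is raised in the misread case, so a raw file gives the reader no way to detect the problem.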
Also, users can mistakenly give an input that has the wrong shape but the same size as the one expected by the model. For instance, the model requires a (1,3,4)-shaped input, but the user accidentally gives a (1,4,3)-shaped input. This case would be very difficult to debug if we used the raw data format.
The same can happen when the user gives an input with the wrong type but the same size (int32 vs float32).
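Both failure modes can be shown in a few lines. The sketch below assumes 4-byte elements; the helper `nbytes` is a made-up name. It checks that the two shapes occupy the same number of bytes and shows what the float32 value 1.0 becomes when its bytes are reinterpreted as int32.

```python
import struct
from math import prod


def nbytes(shape, itemsize=4):
    """Size in bytes of a dense tensor with the given shape and element size."""
    return prod(shape) * itemsize


expected_shape = (1, 3, 4)
given_shape = (1, 4, 3)

# Both shapes hold 12 elements, so a size check cannot tell them apart
same_size = nbytes(expected_shape) == nbytes(given_shape)

# float32 and int32 are both 4 bytes wide, so the bytes of 1.0 can be
# reinterpreted as an int32 without any error
raw = struct.pack("<f", 1.0)
as_int32 = struct.unpack("<i", raw)[0]

print(same_size, nbytes(expected_shape))
print(as_int32)
```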
`luci_value_test` is a test conducted in a controlled environment, so it would be ok to ignore such miscellaneous things. But `record-minmax` will receive inputs from users, so it needs to be capable of handling various input files.
@jinevening What do you mean by diversity of input files? The user won't pass jpg or png into a model. The model expects floats or integers, not pictures. There are many ways to convert an image to floats. For one, you can just divide each channel by 255. But usually some normalization is performed. For instance, keras suggests using the `per_image_standardization` layer, which does the following: `result = (x - mean) / adjusted_stddev`, where `adjusted_stddev = max(stddev, 1.0/sqrt(N))`.
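As a worked version of the formula above, here is a minimal pure-Python sketch. It operates on a flat list of pixel values, takes N to be the number of values, and uses the population standard deviation; those choices are assumptions of this sketch, and the function is not the library implementation.

```python
import math


def per_image_standardization(pixels):
    """Apply result = (x - mean) / adjusted_stddev to a flat list of values.

    adjusted_stddev = max(stddev, 1.0 / sqrt(N)) guards against division
    by zero on uniform images.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n  # population variance (assumed)
    stddev = math.sqrt(variance)
    adjusted_stddev = max(stddev, 1.0 / math.sqrt(n))
    return [(p - mean) / adjusted_stddev for p in pixels]


# A uniform image has stddev 0; the lower bound keeps the division well defined
flat = per_image_standardization([10.0, 10.0, 10.0, 10.0])

# A non-uniform image is shifted to zero mean and scaled by its stddev
scaled = per_image_standardization([0.0, 2.0, 4.0, 6.0])

print(flat)
print(scaled)
```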
In general, how to convert an image to floats/integers is model-dependent; the formula above is just one of the possible ways. We do not know what the developer of the model had in mind when creating the model.
Let's leave image classification models aside. What about NLP models? They also do not accept text. Instead, they consume integers, and only the developer of the model knows how to convert text to integers for a concrete model.
Same for audio processing. You are not going to convert .wav/.mp3/.ogg and a whole lot of other formats into hdf5, are you? 😄 And... how, anyway?
My point is:
The user would already have the data correctly converted to the format the model expects, and the only thing they would need to do is pass the data to the `record-minmax` tool. We only need to provide a convenient interface.
I agree that the user may mistakenly pass wrong data, which will be accepted by `record-minmax` without error. I don't mind using hdf5 to pass already prepared data to the `record-minmax` tool. I emphasize that it is the user's responsibility to prepare the data (e.g. convert image to floats, or text to integers). This prepared data can then be converted to hdf5 format. But please do not ask the user to deal with the `hdf5` interface; let's provide a more convenient one.
We can still provide some helper utilities to do popular conversions, like the Keras one mentioned above.
> I don't mind using hdf5 to pass already prepared data to record-minmax tool. I emphasize that it is the user responsibility to prepare the data (e.g. convert image to floats, or text to integers).
I entirely agree that users are responsible for pre-processing their data. We are not going to provide tools to pre-process jpg, png, or audio files (we can provide some popular ones as you mentioned, but basically users should do that). The above hdf5 format is just for saving "already prepared" image data.
The basic functionality of record-minmax (except tests) has been implemented, so I ran a test with the ResNet50 model.
Test process
1. Convert ResNet50.tflite to ResNet50.circle (using tflite2circle)
2. Generate 1,000 random input data (using gen_h5_inputs.py)
3. Run record-minmax with the generated data
time build/release/compiler/record-minmax/record-minmax ResNet50.circle ResNet50.input.h5 ResNet50.minmax_recorded.circle
Recording 0'th data
Recording 100'th data
Recording 200'th data
Recording 300'th data
Recording 400'th data
Recording 500'th data
Recording 600'th data
Recording 700'th data
Recording 800'th data
Recording 900'th data
real 3m59.761s
user 3m57.615s
sys 0m0.400s
ResNet50.circle: 102170664 bytes
ResNet50.minmax_recorded.circle (embedded with min/max values): 102171700 bytes (an increase of 1036 bytes)
4. Dump the circle model embedded with min/max
...
Operands: T(subgraph index : tensor index) TYPE (shape) (shape_signature) B(buffer index) OperandName
T(0:0) FLOAT32 (1, 224, 224, 3) B(1) input_1
  Quantization: min(3.40922e-06) max(0.999995)
T(0:1) INT32 (4, 2) B(2) resnet_v1_50/Pad/Pad/paddings
T(0:2) FLOAT32 (1, 230, 230, 3) B(3) resnet_v1_50/Pad/Pad
  Quantization: min(0) max(0.999995)
T(0:3) FLOAT32 (64, 7, 7, 3) B(4) resnet_v1_50/conv1/BatchNorm/mul_1
T(0:4) FLOAT32 (64) B(5) activation/Relu;resnet_v1_50/conv1/BatchNorm/add_1;resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/BatchNorm/mul_1;resnet_v1_50/conv1/BatchNorm/mul_1;resnet_v1_50/conv1/BatchNorm/sub
T(0:5) FLOAT32 (1, 112, 112, 64) B(6) activation/Relu;resnet_v1_50/conv1/BatchNorm/add_1;resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/BatchNorm/mul_1;resnet_v1_50/conv1/BatchNorm/mul_1;resnet_v1_50/conv1/BatchNorm/sub1
  Quantization: min(0) max(6.32373)
T(0:6) FLOAT32 (1, 56, 56, 64) B(7) max_pooling2d/MaxPool
  Quantization: min(0) max(6.32373)
...
We can see the min/max values are embedded in quantization parameters (only for activation tensors).
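As a rough illustration of what "recording min/max" over a dataset means, the sketch below keeps one running (min, max) pair per activation tensor across all records. The function name, the tensor names, and the values are invented for the example, and it assumes a plain global min/max; how `record-minmax` actually aggregates the per-record values is not described in this thread and may differ.

```python
def record_minmax(records):
    """Track the global min and max of each tensor over all records.

    records: iterable of dicts mapping a tensor name to a flat list of
    activation values observed for one input record (hypothetical layout).
    """
    stats = {}
    for record in records:
        for name, values in record.items():
            lo, hi = min(values), max(values)
            if name in stats:
                old_lo, old_hi = stats[name]
                stats[name] = (min(old_lo, lo), max(old_hi, hi))
            else:
                stats[name] = (lo, hi)
    return stats


# Two made-up records with activations for two tensors
records = [
    {"input_1": [0.1, 0.9], "relu": [0.0, 2.5]},
    {"input_1": [0.3, 0.7], "relu": [0.0, 6.0]},
]
stats = record_minmax(records)
print(stats)
```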
**Summary**
`record-minmax` took ~4 minutes to profile 1,000 input records with ResNet50. I visually confirmed that the recorded values are embedded in the circle model.
**Update 2020/06/26**: I've profiled with InceptionV3 and MobileNetV2. They took 6m40s and 1m, respectively.
The next step is to verify the values of the embedded min/max. #2088
I am closing this issue because the tool was successfully merged and the test issue is being discussed in #2088.
This issue tracks the status of `record-minmax` #1537, a tool to embed min/max values of activation tensors into the circle model, for post-training quantization #696. What `record-minmax` does is described below (copied from #696).
This issue tracks the following items.
1. Pre-processing of input data
2. Driver to run luci-interpreter and record min/max values (record-minmax)
Any suggestions and comments are welcome.