Closed: PureHing closed this issue 2 years ago.
@Auroragan @yehangyang example.py code:

```python
import onnx
import pickle5

from calibrator import Calibrator
from optimizer import optimize_fp_model

if __name__ == "__main__":
    # Optimize the onnx model
    model_path = 'best_tflite.onnx'  # "mnist_model_example.onnx"
    optimized_model_path = optimize_fp_model(model_path)

    # Calibration
    with open('test.pickle', 'rb') as f:  # mnist_test_data.pickle
        test_images = pickle5.load(f)
        # test_images = test_images / 255.0

    # Prepare the calibration dataset
    calib_dataset = test_images  # [0:5000:50]
    print("calib_dataset shape:", calib_dataset.shape)
    pickle_file_path = 'calib.pickle'

    model_proto = onnx.load(optimized_model_path)
    print('Generating the quantization table:')
    calib = Calibrator('int16', 'per-tensor', 'minmax')
    calib.set_providers(['CPUExecutionProvider'])
    calib.generate_quantization_table(model_proto, calib_dataset, pickle_file_path)
    calib.export_coefficient_to_cpp(model_proto, pickle_file_path, 'esp32s3', '.', 'test_best', True)
```
Hi @PureHing, what is the activation function for this layer?
Not sure what you would consider more friendly; both depthwise_conv2d and conv2d are optimized on S3, so you can choose based on your application.
@Auroragan According to the error prompt: `IndexError: list index (1) out of range`

```
tools/quantization_tool/examples$ python example.py
calib_dataset shape: (36, 240, 320, 3)
Generating the quantization table:
Traceback (most recent call last):
  File "example.py", line 48, in <module>
    calib.export_coefficient_to_cpp(model_proto, pickle_file_path, 'esp32s3', '.', 'test_best', True)
  File "calibrator.py", line 589, in calibrator.Calibrator.export_coefficient_to_cpp
IndexError: list index (1) out of range
```

I don't know which list is out of range. All activations are leakyrelu.
Yeah, OK. It's a bug in exporting the leakyrelu activation; we will fix it soon.
@PureHing Yes, that's right. Sorry, the current version does not support the Reshape and Transpose layers; we will add them in the next version.
@TiramisuJ Maybe I can cut off the input and output of the reshape and transpose layers, since they only do dimensionality conversion. But how can I quickly realize the image channel conversion on esp32s3, e.g. 320x320x1 to 1x320x320?
Sure. The image channel conversion from 320x320x1 to 1x320x320 doesn't change the data arrangement, so you can just change the shape of the Tensor from [320, 320, 1] to [1, 320, 320].
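As a sanity check (a minimal numpy sketch, not the on-device API), reshaping from [320, 320, 1] to [1, 320, 320] really is just a relabeling of the same buffer:

```python
import numpy as np

# A 320x320x1 int16 image with arbitrary contents.
img_hwc = np.random.randint(-128, 128, (320, 320, 1), dtype=np.int16)

# Reshape to [1, 320, 320]: no copy, no reordering of elements.
img_chw = img_hwc.reshape(1, 320, 320)

# Same underlying buffer, same flat element order.
assert np.shares_memory(img_hwc, img_chw)
assert np.array_equal(img_hwc.ravel(), img_chw.ravel())
```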
Please wait a moment; we will fix some bugs and push a new version tonight. Please use the new version to develop your project.
@PureHing The new version has been pushed. You can try it.
@TiramisuJ @Auroragan The newest version doesn't change anything for my model. All activations in my model are relu now, so if I want to evaluate this model on esp32s3, it seems the reshape and transpose layers can only be handled the way I described here?
Shouldn't there be some differences in the values in the exported cpp?
Yes, you can cut off the reshape and transpose layers. For the input you don't need to change anything, because in our library the input dimensions for conv should be [height, width, channel]. So if you have the array of image data, for example:

```cpp
__attribute__((aligned(16))) int16_t test_image[320 * 320] = {...};

Tensor<int16_t> input;
input.set_element((int16_t *)test_image).set_exponent(*).set_shape({320, 320, 1}).set_auto_free(false);
```

For the output, the dimensions are also in the same order as above.
@Auroragan Thanks much. Got it!
You don’t need to specify dst_height because it’s scaling the width and height with an equal ratio
@Auroragan OK.
1. From this demo, the quantization method of the face detect model is int8 quantization, but the input element is uint8. How is the conversion done in your demo? My model is calibrated using Calibrator('int8', 'per-tensor', 'entropy'):

```cpp
Tensor<int8_t> input;
input.set_element((int8_t *)IMAGE_ELEMENT).set_exponent(-7).set_shape({320, 320, 1}).set_auto_free(false);
```

2. For the FP32 model, the input image preprocessing is img = (img - mean) / std. For the int8 model, how should the image be preprocessed?
Our face detect model uses 16-bit quantization, so there is no need to convert the input type.
If you want to use 8-bit quantization, you can normalize your input image first, as you did for your original model, and then multiply by 2^-input_exponent to get the int8 input tensor. You can also simplify this calculation based on the normalization method you used.
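A minimal numpy sketch of that recipe (the mean/std and input_exponent values below are placeholders, not your model's actual calibration results): normalize as for the FP32 model, scale by 2^-input_exponent, then round and clamp to int8.

```python
import numpy as np

def to_int8_input(img_u8, mean, std, input_exponent):
    """Quantize a uint8 image into an int8 input tensor.

    input_exponent comes from the generated quantization table
    (e.g. -7, so values are scaled by 2**7 = 128).
    """
    normalized = (img_u8.astype(np.float32) - mean) / std
    scaled = normalized * 2.0 ** (-input_exponent)
    return np.clip(np.round(scaled), -128, 127).astype(np.int8)

# Example with placeholder preprocessing parameters.
img = np.random.randint(0, 256, (320, 320, 1), dtype=np.uint8)
int8_input = to_int8_input(img, mean=127.5, std=127.5, input_exponent=-7)
```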
@Auroragan
ok.
Can you estimate the total time cost on esp32s3 for a model whose input is 320x320x1, with 5 downsampling stages and only convolution operators?
forward: 11205758 us
With my model, the max channel is only 256.
Sorry, I cannot estimate the exact time.
> forward: 11205758 us

Is this the inference time of running your model on S3?
@Auroragan Yes, on S3. If I use 16-bit quantization, there is not enough space on my S3 (2 MB PSRAM). What is the maximum recommended channel count if I want to achieve the same speed as your face detect model?
That doesn't look correct; your model is not heavy, and channel = 256 is not a problem. For our 8-bit face recognition model, the inference time is only 280 ms.
What is your configuration for cache size and cache line size?
```
#
# Cache config
#
# CONFIG_ESP32S3_INSTRUCTION_CACHE_16KB is not set
CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_SIZE=0x8000
# CONFIG_ESP32S3_INSTRUCTION_CACHE_4WAYS is not set
CONFIG_ESP32S3_INSTRUCTION_CACHE_8WAYS=y
CONFIG_ESP32S3_ICACHE_ASSOCIATED_WAYS=8
CONFIG_ESP32S3_INSTRUCTION_CACHE_LINE_32B=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_LINE_SIZE=32
CONFIG_ESP32S3_INSTRUCTION_CACHE_WRAP=y
# CONFIG_ESP32S3_DATA_CACHE_16KB is not set
# CONFIG_ESP32S3_DATA_CACHE_32KB is not set
CONFIG_ESP32S3_DATA_CACHE_64KB=y
CONFIG_ESP32S3_DATA_CACHE_SIZE=0x10000
# CONFIG_ESP32S3_DATA_CACHE_4WAYS is not set
CONFIG_ESP32S3_DATA_CACHE_8WAYS=y
CONFIG_ESP32S3_DCACHE_ASSOCIATED_WAYS=8
CONFIG_ESP32S3_DATA_CACHE_LINE_32B=y
CONFIG_ESP32S3_DATA_CACHE_LINE_SIZE=32
CONFIG_ESP32S3_DATA_CACHE_WRAP=y
# end of Cache config
```
DATA_CACHE_LINE didn't have a 64B option in my idf.py menuconfig.
@Auroragan I trained a new model with input 160x160x1; it costs 3000 ms on my esp32s3. In this model, layer 7 (model/conv2d_7/Conv2D: bx64x20x20 -> bx128x20x20) costs about 1150 ms.
> From this demo, the quantization method of the face detect model is int8 quantization. But the input element is uint8; how is the conversion done in your demo?
@PureHing The raw image input to the face detection model will first be resized to a smaller image according to resize_scale and converted to int16_t at the same time. You can take this API for reference.
@PureHing

> DATA_CACHE_LINE didn't have a 64B option in my idf.py menuconfig.
64B is not supported in this version of ESP-IDF. I think it will be supported soon.
I have some doubts about this:

> Raw image input into the face detection model will first be resized to a smaller image according to resize_scale and converted to int16_t in the meantime.
@yehangyang If so, what's your human face detect model's input size? Why is my model so slow?
> If so, what's your human face detect model's input size? Why is my model so slow?
Model input shape = raw image shape * resize_scale. In the example here, resize_scale is 0.2. You can run this example and compare with its latency.
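Worked out for the raw image shape mentioned earlier in this thread (240x320, a plausible example rather than the demo's actual camera resolution), that relation gives:

```python
# model input shape = raw image shape * resize_scale
resize_scale = 0.2
raw_h, raw_w = 240, 320

model_h = round(raw_h * resize_scale)  # 48
model_w = round(raw_w * resize_scale)  # 64
print(model_h, model_w)
```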
@yehangyang There is no problem with the example.
> model input shape = raw image shape * resize_scale.
Is it because your input is relatively small?
Free element:

`void call(Tensor &input)`

Is it right?
Is the memory of the concat-related layer released by l2_concat after it is executed?
@PureHing , yes it is.
Then I think the large latency is because your model has heavy computation. Convolution requires more computation than depthwise convolution under a similar configuration anyway.
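To make "heavy computation" concrete, here is back-of-the-envelope MAC counting (layer shapes taken from the conv2d_7 example in this thread; these are the standard formulas, not measurements from the library):

```python
# MAC counts for a 3x3 layer mapping 64 -> 128 channels on a 20x20 feature map.
h, w, c_in, c_out, k = 20, 20, 64, 128, 3

# Standard convolution: every output channel sees every input channel.
conv_macs = h * w * c_out * c_in * k * k   # 29,491,200 MACs

# Depthwise separable: per-channel 3x3, then 1x1 pointwise channel mixing.
dw_macs = h * w * c_in * k * k             # 230,400 MACs
pw_macs = h * w * c_in * c_out             # 3,276,800 MACs

# Standard conv needs roughly 8.4x the MACs of the separable version here.
print(conv_macs, dw_macs + pw_macs)
```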
> In this model: Layer 7 (model/conv2d_7/Conv2D: bx64x20x20 -> bx128x20x20) costs about 1150 ms.

Is it a 3*3 convolution? Let me test it on a work day.
yeah
> In this model: Layer 7 (model/conv2d_7/Conv2D: bx64x20x20 -> bx128x20x20) costs about 1150 ms.
>
> Is it a 3*3 convolution? Let me test it on a work day.
@yehangyang Hi, sorry to bother you. Have you tested it?
@PureHing Sorry, I'm on leave today. I'll test it tomorrow.
@PureHing I've tested the model you provided before; the inference time is similar to your result. I guess you need to redesign the model structure if you want a faster model.
@Auroragan Thanks
So that means using dw conv2d as much as possible? Is the input (320x320x1) too large?
I guess the tensors from the first few layers are around 400-800 kB, which is pretty large.
@Auroragan Hi,
I calculated that the first few layers of the model only take about 300-400 ms.
It consumes more than 2000 ms from the conv2d_7 layer onward.
They are not in conflict. It's like this: first, the tensors of the first few layers are relatively large; I don't know if it meets your requirements, but I think 300-400 ms is still not fast for running a few layers. Secondly, that layer takes a long time because the number of MACs is large, so maybe you can use a 1*1 conv instead.
Basically, if you want it to be faster, I think both need to be optimized.
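For a rough sense of the suggested 1*1 swap (standard MAC formulas on the conv2d_7 shape, not profiled numbers): a 1*1 conv does 9x fewer multiply-accumulates than the 3*3 at that layer.

```python
# conv2d_7: 20x20 feature map, 64 -> 128 channels.
h, w, c_in, c_out = 20, 20, 64, 128

macs_3x3 = h * w * c_in * c_out * 3 * 3   # 29,491,200 MACs
macs_1x1 = h * w * c_in * c_out           # 3,276,800 MACs

print(macs_3x3 // macs_1x1)               # 9x reduction
```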
ok, thanks much
@Auroragan Hi,
1. My onnx model input size is bx240x320x3, and the calib_dataset shape is (36, 240, 320, 3).
Log:

```
tools/quantization_tool/examples$ python example.py
calib_dataset shape: (36, 240, 320, 3)
Generating the quantization table:
Traceback (most recent call last):
  File "example.py", line 48, in <module>
    calib.export_coefficient_to_cpp(model_proto, pickle_file_path, 'esp32s3', '.', 'test_best', True)
  File "calibrator.py", line 589, in calibrator.Calibrator.export_coefficient_to_cpp
IndexError: list index (1) out of range
```

What's the problem?
2. Which is more friendly to esp, dw or conv2d?