Closed: PureHing closed this issue 2 years ago.
@Auroragan @yehangyang example.py code:

```python
import onnx
import pickle5

from calibrator import Calibrator
from optimizer import optimize_fp_model

if __name__ == "__main__":
    # Optimize the onnx model
    model_path = 'best_tflite.onnx'  # "mnist_model_example.onnx"
    optimized_model_path = optimize_fp_model(model_path)

    # Calibration
    with open('test.pickle', 'rb') as f:  # mnist_test_data.pickle
        test_images = pickle5.load(f)
        # test_images = test_images / 255.0

    # Prepare the calibration dataset
    calib_dataset = test_images  # [0:5000:50]
    print("calib_dataset shape:", calib_dataset.shape)
    pickle_file_path = 'calib.pickle'

    model_proto = onnx.load(optimized_model_path)
    print('Generating the quantization table:')
    calib = Calibrator('int16', 'per-tensor', 'minmax')
    calib.set_providers(['CPUExecutionProvider'])
    calib.generate_quantization_table(model_proto, calib_dataset, pickle_file_path)
    calib.export_coefficient_to_cpp(model_proto, pickle_file_path, 'esp32s3', '.', 'test_best', True)
```
Hi @PureHing, what is the activation function for this layer?
Not sure what you would consider more friendly; both depthwise_conv2d and conv2d are optimized on S3, so you can choose based on your application.
@Auroragan According to the error prompt: `IndexError: list index (1) out of range`

```
tools/quantization_tool/examples$ python example.py
calib_dataset shape: (36, 240, 320, 3)
Generating the quantization table:
Traceback (most recent call last):
  File "example.py", line 48, in <module>
    calib.export_coefficient_to_cpp(model_proto, pickle_file_path, 'esp32s3', '.', 'test_best', True)
  File "calibrator.py", line 589, in calibrator.Calibrator.export_coefficient_to_cpp
IndexError: list index (1) out of range
```

I don't know which list is out of range. All activations are leakyrelu.
Yeah, OK. It's a bug in exporting the leakyrelu activation; we will fix it soon.
@PureHing Yes, that's right. Sorry, the current version does not support the Reshape and Transpose layers; we will add them in the next version.
@TiramisuJ Maybe I can cut off the input and output of the reshape and transpose layers, since they only do dimensionality conversion. But how can I quickly realize the image channel conversion on esp32s3, e.g. 320x320x1 to 1x320x320?
Sure. The image channel conversion from 320x320x1 to 1x320x320 doesn't change the data arrangement, so you can just change the shape of the Tensor from [320, 320, 1] to [1, 320, 320].
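As a sanity check (a minimal numpy sketch, not the on-device API), reshaping from [320, 320, 1] to [1, 320, 320] really is just a relabeling of the same buffer:

```python
import numpy as np

# A 320x320x1 int16 image with arbitrary contents.
img_hwc = np.random.randint(-128, 128, (320, 320, 1), dtype=np.int16)

# Reshape to [1, 320, 320]: no copy, no reordering of elements.
img_chw = img_hwc.reshape(1, 320, 320)

# Same underlying buffer, same flat element order.
assert np.shares_memory(img_hwc, img_chw)
assert np.array_equal(img_hwc.ravel(), img_chw.ravel())
```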
Please wait a moment; we will fix some bugs and push a new version tonight. Please use the new version to develop your project.
@PureHing The new version has been pushed. You can try it.
@TiramisuJ @Auroragan The newest version doesn't change anything for my model. All activations in my model are relu now, so if I want to evaluate this model on esp32s3, it seems the reshape and transpose layers can only be handled the way I described here?
Shouldn't there be some differences in the values in the exported cpp?
Yes, you can cut off the reshape and transpose layers. For the input you don't need to change anything, because in our library the input dimensions for conv should be [height, width, channel]. So if you have the array of image data, for example:

```cpp
__attribute__((aligned(16))) int16_t test_image[320 * 320] = {...};

Tensor<int16_t> input;
input.set_element((int16_t *)test_image).set_exponent(*).set_shape({320, 320, 1}).set_auto_free(false);
```

For the output, the dimensions are also in the same order as above.
@Auroragan Thanks much. Got it!
You don’t need to specify dst_height because it’s scaling the width and height with an equal ratio
@Auroragan OK.
1. From this demo, the quantization method of the face detect model is int8 quantization, but the input element is uint8. How is the conversion done in your demo? My model is calibrated using Calibrator('int8', 'per-tensor', 'entropy'):

```cpp
Tensor<int8_t> input;
input.set_element((int8_t *)IMAGE_ELEMENT).set_exponent(-7).set_shape({320, 320, 1}).set_auto_free(false);
```

2. For the FP32 model, the input image preprocessing is img = (img - mean) / std. For the int8 model, how should the image be preprocessed?
Our face detect model uses 16-bit quantization, so there is no need to convert the input type.
If you want to use 8-bit quantization, you can normalize your input image first, as you did for your original model, and then multiply by 2^-input_exponent to get the int8 input tensor. You can also simplify this calculation based on the normalization method you used.
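A minimal numpy sketch of that recipe (the mean/std and input_exponent values below are placeholders, not your model's actual calibration results): normalize as for the FP32 model, scale by 2^-input_exponent, then round and clamp to int8.

```python
import numpy as np

def to_int8_input(img_u8, mean, std, input_exponent):
    """Quantize a uint8 image into an int8 input tensor.

    input_exponent comes from the generated quantization table
    (e.g. -7, so values are scaled by 2**7 = 128).
    """
    normalized = (img_u8.astype(np.float32) - mean) / std
    scaled = normalized * 2.0 ** (-input_exponent)
    return np.clip(np.round(scaled), -128, 127).astype(np.int8)

# Example with placeholder preprocessing parameters.
img = np.random.randint(0, 256, (320, 320, 1), dtype=np.uint8)
int8_input = to_int8_input(img, mean=127.5, std=127.5, input_exponent=-7)
```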
@Auroragan
ok.
Can you estimate the total time cost on esp32s3 for a model whose input is 320x320x1, with 5 downsampling stages and only convolution operators?
forward: 11205758 us
With my model, the max channel is only 256.
Sorry, I cannot estimate the exact time.
> forward: 11205758 us

Is this the inference time of running your model on S3?
@Auroragan Yes, on S3. If I use 16-bit quantization, there is not enough space on my S3 (2 MB PSRAM). What is the maximum recommended channel count if I want to achieve the same speed as your face detect model?
That doesn't look correct; your model is not heavy, and channel = 256 is not a problem. For our 8-bit face recognition model, the inference time is only 280 ms.
What is your configuration for cache size and cache line size?
```
#
# Cache config
#
# CONFIG_ESP32S3_INSTRUCTION_CACHE_16KB is not set
CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_SIZE=0x8000
# CONFIG_ESP32S3_INSTRUCTION_CACHE_4WAYS is not set
CONFIG_ESP32S3_INSTRUCTION_CACHE_8WAYS=y
CONFIG_ESP32S3_ICACHE_ASSOCIATED_WAYS=8
CONFIG_ESP32S3_INSTRUCTION_CACHE_LINE_32B=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_LINE_SIZE=32
CONFIG_ESP32S3_INSTRUCTION_CACHE_WRAP=y
# CONFIG_ESP32S3_DATA_CACHE_16KB is not set
# CONFIG_ESP32S3_DATA_CACHE_32KB is not set
CONFIG_ESP32S3_DATA_CACHE_64KB=y
CONFIG_ESP32S3_DATA_CACHE_SIZE=0x10000
# CONFIG_ESP32S3_DATA_CACHE_4WAYS is not set
CONFIG_ESP32S3_DATA_CACHE_8WAYS=y
CONFIG_ESP32S3_DCACHE_ASSOCIATED_WAYS=8
CONFIG_ESP32S3_DATA_CACHE_LINE_32B=y
CONFIG_ESP32S3_DATA_CACHE_LINE_SIZE=32
CONFIG_ESP32S3_DATA_CACHE_WRAP=y
# end of Cache config
```
DATA_CACHE_LINE didn't have a 64B option in my idf.py menuconfig.
@Auroragan I trained a new model with input 160x160x1; it costs 3000 ms on my esp32s3. In this model, layer 7 (model/conv2d_7/Conv2D: bx64x20x20 -> bx128x20x20) costs about 1150 ms.
> From this demo, the quantization method of the face detect model is int8 quantization. But the input element is uint8; how is the conversion done in your demo?
@PureHing The raw image input to the face detection model will first be resized to a smaller image according to resize_scale and converted to int16_t at the same time. You can take this API for reference.
@PureHing

> DATA_CACHE_LINE didn't have a 64B option in my idf.py menuconfig.
64B is not supported in this version of ESP-IDF. I think it will be supported soon.
I have some doubts about this:

> Raw image input into the face detection model will first be resized to a smaller image according to resize_scale and converted to int16_t in the meantime.
@yehangyang If so, what's your human face detect model's input size? Why is my model so slow?
> If so, what's your human face detect model's input size? Why is my model so slow?
Model input shape = raw image shape * resize_scale. In the example here, resize_scale is 0.2. You can run this example and compare with its latency.
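Worked out for the raw image shape mentioned earlier in this thread (240x320, a plausible example rather than the demo's actual camera resolution), that relation gives:

```python
# model input shape = raw image shape * resize_scale
resize_scale = 0.2
raw_h, raw_w = 240, 320

model_h = round(raw_h * resize_scale)  # 48
model_w = round(raw_w * resize_scale)  # 64
print(model_h, model_w)
```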
@yehangyang There is no problem with the example.
> model input shape = raw image shape * resize_scale.
Is it because your input is relatively small?
Free element:

`void call(Tensor &input)`

Is it right?
Is the memory of the concat-related layer released by l2_concat after it is executed?
@PureHing , yes it is.
Then I think the large latency is because your model has heavy computation. Convolution requires more computation than depthwise convolution under a similar configuration anyway.
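To make "heavy computation" concrete, here is back-of-the-envelope MAC counting (layer shapes taken from the conv2d_7 example in this thread; these are the standard formulas, not measurements from the library):

```python
# MAC counts for a 3x3 layer mapping 64 -> 128 channels on a 20x20 feature map.
h, w, c_in, c_out, k = 20, 20, 64, 128, 3

# Standard convolution: every output channel sees every input channel.
conv_macs = h * w * c_out * c_in * k * k   # 29,491,200 MACs

# Depthwise separable: per-channel 3x3, then 1x1 pointwise channel mixing.
dw_macs = h * w * c_in * k * k             # 230,400 MACs
pw_macs = h * w * c_in * c_out             # 3,276,800 MACs

# Standard conv needs roughly 8.4x the MACs of the separable version here.
print(conv_macs, dw_macs + pw_macs)
```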
> In this model: Layer 7 (model/conv2d_7/Conv2D: bx64x20x20 -> bx128x20x20) costs about 1150 ms.

Is it a 3*3 convolution? Let me test it on a work day.
yeah
> In this model: Layer 7 (model/conv2d_7/Conv2D: bx64x20x20 -> bx128x20x20) costs about 1150 ms.
>
> Is it a 3*3 convolution? Let me test it on a work day.
@yehangyang Hi, sorry to bother you. Have you tested it?
@PureHing Sorry, I'm on leave today. I'll test it tomorrow.
@PureHing I've tested the model you provided before; the inference time is similar to your result. I guess you need to redesign the model structure if you want a faster model.
@Auroragan Thanks
So that means using dw conv2d as much as possible? Is the input (320x320x1) too large?
I guess the tensors from the first few layers are around 400-800 kB, which is pretty large.
@Auroragan Hi,
I calculated that the first few layers of the model only take about 300-400 ms.
It consumes more than 2000 ms from the conv2d_7 layer onward.
They are not in conflict. It's like this: first, the tensors of the first few layers are relatively large; I don't know if it meets your requirements, but I think 300-400 ms is still not fast for running a few layers. Secondly, that layer takes a long time because the number of MACs is large, so maybe you can use a 1*1 conv instead.
Basically, if you want it to be faster, I think both need to be optimized.
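For a rough sense of the suggested 1*1 swap (standard MAC formulas on the conv2d_7 shape, not profiled numbers): a 1*1 conv does 9x fewer multiply-accumulates than the 3*3 at that layer.

```python
# conv2d_7: 20x20 feature map, 64 -> 128 channels.
h, w, c_in, c_out = 20, 20, 64, 128

macs_3x3 = h * w * c_in * c_out * 3 * 3   # 29,491,200 MACs
macs_1x1 = h * w * c_in * c_out           # 3,276,800 MACs

print(macs_3x3 // macs_1x1)               # 9x reduction
```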
ok, thanks much
@Auroragan Hi,
1. My onnx model input size is bx240x320x3, and the calib_dataset shape is (36, 240, 320, 3).
Log:

```
tools/quantization_tool/examples$ python example.py
calib_dataset shape: (36, 240, 320, 3)
Generating the quantization table:
Traceback (most recent call last):
  File "example.py", line 48, in <module>
    calib.export_coefficient_to_cpp(model_proto, pickle_file_path, 'esp32s3', '.', 'test_best', True)
  File "calibrator.py", line 589, in calibrator.Calibrator.export_coefficient_to_cpp
IndexError: list index (1) out of range
```

What's the problem?
2. Which is more friendly to esp, dw or conv2d?