espressif / esp-tflite-micro

TensorFlow Lite Micro for Espressif Chipsets
Apache License 2.0

Which ESP-IDF version is used to test the micro_speech example here? #52

Closed misterb0407 closed 1 year ago

misterb0407 commented 1 year ago

Is there a specific ESP-IDF version we must use to try out this micro_speech example? I tried v5.0.1; it builds, but it has poor accuracy in differentiating the spoken words 'yes' and 'no'. I am using an ESP32 ESP-EYE board.

vikramdattu commented 1 year ago

@misterb0407 the model used is from the official tflite-micro, and the ESP32 ESP-EYE board uses the on-board ADC for the mic, which is not of high quality. Can you check whether the results are better when you move closer to the mic and speak? It should not actually depend on the IDF version. Personally, I have been using release/v4.4 for my testing.

misterb0407 commented 1 year ago

Thanks @vikramdattu for the response. I did try speaking closer to the mic, but it still gave poor results. I don't think the issue is the mic quality though, because when this ESP-EYE shipped FOB, it could detect the wake-up phrase 'hi Lexin' well. I would like to try IDF version 'release/v4.4'. Do you know an effective way to downgrade my version from v5.0.1 to 'release/v4.4'?

misterb0407 commented 1 year ago

Hi @vikramdattu, to downgrade to release/v4.4, can I do the following in the ESP-IDF repo?

  1. $ git checkout release/v4.4
  2. $ ./install.sh esp32
  3. $ source ./export.sh

vikramdattu commented 1 year ago

@misterb0407 you're right. You can downgrade the IDF to release/v4.4 that way.

I tested the existing example on the ESP-EYE I have and it seems to be working as expected. Below are the logs:

Heard no (203) @11500ms
Heard no (207) @17500ms
Heard no (209) @20100ms
Heard yes (201) @22200ms
Heard yes (206) @24600ms
Heard yes (218) @26200ms
Heard no (204) @28500ms
Heard no (205) @30700ms
Heard yes (210) @34700ms
Heard yes (210) @38000ms
Heard yes (202) @41700ms
Heard yes (204) @45400ms
Heard yes (211) @49200ms
Heard no (207) @50800ms
Heard unknown (207) @52600ms

I can see that 'yes' gets detected when pronounced in certain ways (as /yəs/) and not in others (/yEs/). This may be because of the data the model was trained on.
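To compare runs (for example across IDF versions or pronunciations), the serial output can be tallied per keyword. The following is only a minimal sketch: the line format is taken from the logs above, the number in parentheses is assumed to be the detection score, and the names `LINE_RE`, `summarize` and `SAMPLE` are hypothetical helpers, not part of the example project.

```python
import re
from collections import defaultdict

# Matches lines such as "Heard yes (201) @22200ms".
# Assumption: the value in parentheses is the detection score.
LINE_RE = re.compile(r"Heard (\w+) \((\d+)\) @(\d+)ms")


def summarize(log_text):
    """Return {label: (count, mean_score)} over all 'Heard ...' lines."""
    scores = defaultdict(list)
    for match in LINE_RE.finditer(log_text):
        label, score, _timestamp_ms = match.groups()
        scores[label].append(int(score))
    return {
        label: (len(values), sum(values) / len(values))
        for label, values in scores.items()
    }


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
Heard no (203) @11500ms
Heard yes (201) @22200ms
Heard yes (206) @24600ms
Heard unknown (207) @52600ms
"""

if __name__ == "__main__":
    for label, (count, mean) in summarize(SAMPLE).items():
        print(f"{label}: {count} detections, mean score {mean:.1f}")
```

Feeding it the captured monitor output from two sessions gives a rough per-keyword count and average score to compare, which is easier than eyeballing the raw lines.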

misterb0407 commented 1 year ago

Thanks @vikramdattu, I tried release/v4.4 and it still shows the same result. What I suspect in my case is that the I2S is not initialized properly. Did you need to do any extra coding to make it work on your ESP-EYE board? Could you please share your i2s_init() function from the file audio_provider.cc?

vikramdattu commented 1 year ago

Hello @misterb0407, there is absolutely no change in the code. I am running the example out of the box. I just set the target to esp32 (using idf.py set-target esp32), build the example, and flash it to the ESP-EYE board. The board I have is ESP-EYE v2.1. If the board you're using is different, or if you have local changes, please check whether any fixes are needed.

misterb0407 commented 1 year ago

Thanks @vikramdattu. OK, I tried saying 'yes' repeatedly and it detects 'yes' better. I think you are right that, to a certain extent, this might be related to the quality of the model. I am OK to close this issue for now. Thank you for your support.