espressif / esp-sr

Need help understanding the library (AIS-1310) #78

Closed jayavanth closed 1 year ago

jayavanth commented 1 year ago

I'm trying to do VAD with esp-sr and I'm having trouble with performance since I'm also running other applications simultaneously (CSI gathering and transmitting). When I run VAD just by itself I have no problems but when I do both, VAD seems less accurate. It detects voice only when the source is very close to the mic.

I have an ESP32-S3 board with two I2S microphones. I don't mind doing VAD on mono if it means I get better performance. This is my current mic configuration:

    afe_config.memory_alloc_mode = AFE_MEMORY_ALLOC_MORE_PSRAM;
    afe_config.wakenet_init = true;
    afe_config.wakenet_model_name = wn_name;
    afe_config.se_init = true;
    afe_config.voice_communication_init = false;
    afe_config.vad_init = true;
    afe_config.vad_mode = VAD_MODE_4;

    afe_config.aec_init = false;
    afe_config.pcm_config.total_ch_num = 2;
    afe_config.pcm_config.mic_num = 1;
    afe_config.pcm_config.ref_num = 1;

  1. What is REF data? I notice that when I enable that channel, I get slightly better VAD
  2. Do different WakeNet models have different VAD performance? I believe WakeNet is not needed for VAD but it's needed currently because of a bug. Wondering if switching to a different WakeNet model will free up some resources for CSI
  3. I noticed that esp_get_feed_data(false, i2s_buff, audio_chunksize * sizeof(int16_t) * feed_channel); takes about a second to return. Is there a way I can reduce that time? The chunk size I get is 160

feizi commented 1 year ago

  1. What is REF data? I notice that when I enable that channel, I get slightly better VAD

In the AFE module, REF (reference) data is the audio you are playing out over I2S. Actually, it is only needed by the AEC algorithm. AEC may reduce some of the noise, which makes your VAD look better, but I still don't recommend turning it on because it may cause unknown problems.
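To make the role of the reference channel concrete, here is a minimal sketch of how one mic chunk and one reference chunk could be packed into a single feed buffer. It assumes the buffer is channel-interleaved with the mic sample first and the reference sample second, which matches the `total_ch_num = 2`, `mic_num = 1`, `ref_num = 1` settings above but is not confirmed in this thread. `interleave_mic_ref` is a hypothetical helper, not an esp-sr function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of esp-sr): pack one mic chunk and one
 * reference chunk into an interleaved buffer laid out as M R M R ...
 * The mic-then-ref ordering is an assumption for illustration. */
void interleave_mic_ref(const int16_t *mic, const int16_t *ref,
                        size_t samples, int16_t *out)
{
    assert(mic != NULL && out != NULL);
    for (size_t i = 0; i < samples; i++) {
        out[2 * i] = mic[i];
        /* When nothing is being played back there is no reference
         * signal, so the reference slot is filled with silence. */
        out[2 * i + 1] = (ref != NULL) ? ref[i] : 0;
    }
}
```

Passing `NULL` for `ref` yields a silent reference channel, which is roughly the situation when no audio is being played and AEC is disabled.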

  2. Do different WakeNet models have different VAD performance? I believe WakeNet is not needed for VAD but it's needed currently because of a bug. Wondering if switching to a different WakeNet model will free up some resources for CSI

Yes, WakeNet is not needed for VAD. But there is a bug: when you disable WakeNet, VAD does not work correctly. I will fix it today so you can disable WakeNet.

  3. I noticed that esp_get_feed_data(false, i2s_buff, audio_chunksize * sizeof(int16_t) * feed_channel); takes about a second to return. Is there a way I can reduce that time? The chunk size I get is 160

I guess the hardware driver initialization takes some time.
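As a rough sanity check on the one-second figure, the arithmetic below estimates how much audio a single chunk covers. It assumes a 16 kHz sample rate and that the chunk size of 160 counts samples per channel; neither is stated in this thread. Both function names are hypothetical. Under those assumptions one chunk is 10 ms of audio, so a blocking read should take about that long once the driver is running, and a one-second return on the first call is more consistent with one-time setup than with steady-state cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes requested per read:
 * samples per channel * bytes per sample * number of channels. */
size_t feed_buffer_bytes(size_t chunk_samples, size_t channels)
{
    return chunk_samples * sizeof(int16_t) * channels;
}

/* Duration of audio covered by one chunk, in microseconds. */
uint32_t chunk_duration_us(uint32_t chunk_samples, uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (uint32_t)(((uint64_t)chunk_samples * 1000000u) / sample_rate_hz);
}
```

For a chunk of 160 samples on 2 channels, `feed_buffer_bytes(160, 2)` gives 640 bytes and `chunk_duration_us(160, 16000)` gives 10000 microseconds. Timing the second and later calls separately from the first would show whether the delay is only an initialization cost.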

jayavanth commented 1 year ago

Thanks @feizi