Echo problem of speech voice at the peer (AUD-4686)

EdwardRyu commented 1 year ago

Environment

Audio development kit: ESP32-LyraT
Audio kit version: v4.3
Module or chip used: ESP32-WROVER-E
IDF version: v4.4.4
ADF version: v2.5

ES8311 is in use instead of ES8388.
#define ALGORITHM_STREAM_DEFAULT_MASK (ALGORITHM_STREAM_USE_AEC | ALGORITHM_STREAM_USE_AGC | ALGORITHM_STREAM_USE_NS)
algo_config.rec_linear_factor = 1;
algo_config.ref_linear_factor = 3;
algo_config.input_type = ALGORITHM_STREAM_INPUT_TYPE2;
volume is set to 95.

Problem Description

Echo is heard from the peer when ALGORITHM_STREAM_INPUT_TYPE1 is used.
If I apply ALGORITHM_STREAM_INPUT_TYPE2 for AEC, the peer user cannot hear my voice.

Test Steps

Make a VoIP call to the peer.
The user at the peer keeps talking.
The user at the peer hears an echo of what he says.

Request

Please provide guidelines for removing the echo.

Thanks, Edward

EdwardRyu commented 1 year ago

Could you please provide a general guide to reducing or removing echo problems from a software perspective?

Thanks a lot. Edward

ahhfzhang commented 1 year ago

Hi Edward

For the ES8311 AEC feature, you can refer to the ESP32_S3_KORVO2L_V1_BOARD configuration.

Best Regards.

EdwardRyu commented 1 year ago

Hi Michael,

Thanks for the update.

Unfortunately, I had already tried the changes enabled by the ESP32_S3_KORVO2L_V1_BOARD feature along with ALGORITHM_STREAM_INPUT_TYPE2, but I didn't see any improvement.

Here is everything I tried:

Applied the changes of the ESP32_S3_KORVO2L_V1_BOARD feature in the files below:

av_stream_hal.h, audio_player_mp3.c, audio_player_int_tone.c, av_stream_hal_camera.c

set RECORD_HARDWARE_AEC to true
set 'algo_config.task_core' to 1.
set 'algo_config.swap_ch' to true.
modified 'algo_config.rec_linear_factor' & 'algo_config.ref_linear_factor'
kept 'algo_config.sample_rate' at the 8K default.

However, none of these changes brought any improvement.

**The problem with ALGORITHM_STREAM_INPUT_TYPE1 is that the end user hears an echo of what he says.**

**The problem with ALGORITHM_STREAM_INPUT_TYPE2 is that the end user can hardly hear my voice, which is picked up by the mic connected to the ES8311. My voice cuts out, and the peer user hears only mechanical sounds.**

When looking at the log for the ALGORITHM_STREAM_INPUT_TYPE2 issue, I could see the warning logs below from the library.

W (95031) AFE_VC: ERROR! rb_out slow!!!

W (95034) AFE_VC: ERROR! rb_out slow!!!

W (95042) AFE_VC: ERROR! rb_out slow!!!

W (95045) AFE_VC: ERROR! rb_out slow!!!

W (95048) AFE_VC: ERROR! rb_out slow!!!

W (95050) AFE_VC: ERROR! rb_out slow!!!

I am attaching three logs for your information. I would appreciate your careful attention to this, as it is a blocker for our project.

Echo_Issue_after_applying_ALGORITHM_STREAM_INPUT_TYPE2_HW_AEC.txt
Echo_Issue_after_applying_ALGORITHM_STREAM_INPUT_TYPE2.txt
Echo_Issue_with_ALGORITHM_STREAM_INPUT_TYPE1.txt

Thanks a lot. Edward

ahhfzhang commented 1 year ago

Hi Edward,

ESP32_S3_KORVO2L_V1_BOARD Codec design:

Set RECORD_HARDWARE_AEC to true, because this codec supports routing the internal reference signal to an ADC channel.
With the ES8311 there is no need to set 'algo_config.swap_ch' to true. If it is set, the DAC (ref) signal is fed to the algorithm as the ADC (mic) signal, and AEC has no effect. The mini development board needs the swap because its mic signal is on the right channel.
You can set the debug options here to dump the algorithm stream input or output data: https://github.com/espressif/esp-adf/blob/master/examples/protocols/components/av_stream/av_stream.h#L39
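
As a rough illustration of what a channel swap does to interleaved 16-bit stereo data, the sketch below exchanges the left and right samples of each frame. It is not the library's implementation: `swap_stereo_channels` is a hypothetical helper, and the interleaved left/right layout is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: exchange left and right samples in place.
 * buf holds `frames` stereo frames laid out as L0 R0 L1 R1 ... (assumed). */
void swap_stereo_channels(int16_t *buf, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        int16_t left = buf[2 * i];
        buf[2 * i] = buf[2 * i + 1];
        buf[2 * i + 1] = left;
    }
}
```

If the mic is already on the channel the algorithm expects, applying a swap like this would hand the reference signal to the algorithm as if it were the mic signal, which matches the failure described above.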

AEC is not easy and needs system-level debugging. It depends on your hardware design: for example, whether the mic and speaker gains are appropriate, whether there is distortion or saturation, and so on.

Best Regards.

EdwardRyu commented 1 year ago

Hi Michael,

Thank you for your continued support.

I experimented with AEC using several changes, as shown in the screenshot. There appears to be no real improvement in the echo. TYPE 1 seems slightly better with builds No. 2 and 3, but not by much.

Test result :

The TYPE 2 algorithm doesn't seem to work at all. No matter what I changed, it didn't work. The peer couldn't hear the voice either; only electrical sounds come through whenever the user talks into the microphone. When I traced it, I could see that audio_element_multi_input() in algorithm_data_process_for_type2() always returns zero when reading reference data, whether or not the HW reference is used. You can find this in the attached screenshots and logs.

Image 1 :

Image 2 :

Log : Type2_HWREF_ADCL_DACR_Reg0x44_0x50.log

I have some questions. I may be missing something, but I would appreciate your advice and expertise.

1) Which one is recommended for VoIP echo problem of ESP32? TYPE 1 or TYPE 2?

2) How is reference signal fed to AEC algorithm for TYPE 2, if RECORD_HARDWARE_AEC is defined? Or what if RECORD_HARDWARE_AEC is not defined?

3) Why does audio_element_multi_input() return zero all the time? This means no reference data is read or fed to the AEC module.

Thanks, Edward

EdwardRyu commented 1 year ago

Hi Michael,

Going back to the initial problem and question.

Type 1 AEC is not as good as expected in our application. I would like to apply Type 2, but it doesn't work. I have enabled the following:

CONFIG_ESP32_S3_KORVO2L_V1_BOARD
RECORD_HARDWARE_AEC set to true.

Is this the right way to enable the TYPE 2 algorithm? Why does audio_element_multi_input() return zero all the time? This means no reference data is read or fed to the AEC module.

Thanks, Edward

ahhfzhang commented 1 year ago

Hi Edward,

Type 1 is recommended. Based on your build No. 1, RECORD_HARDWARE_AEC (HW Reference) should be set to true, the I2S channel format must be I2S_CHANNEL_FMT_RIGHT_LEFT, and AGC should be off. Then you can try again.
With RECORD_HARDWARE_AEC defined, type 1 is used. If RECORD_HARDWARE_AEC is not defined you can still use type 1, but software must copy the playback data to a ringbuf, combine it with the mic data in audio_read_cb, and send it to the algorithm stream, as in the av_stream implementation.
Type 2 needs set_multi_input_ringbuf to be called for the algorithm stream. However, type 2 is currently not recommended on ESP32, because the algorithm cannot be processed in time.
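
The software path for type 1 without a hardware reference (copy playback data, combine it with mic data, feed the result to the algorithm stream) can be sketched as a plain interleaving step. This is only an illustration: `interleave_mic_ref` is a hypothetical helper, not an ESP-ADF function, and placing the mic on the left channel and the reference on the right is an assumption that should be checked against the board and the `swap_ch` setting.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build one interleaved stereo buffer from two mono
 * buffers of equal length. Assumed layout: mic on the left (even indices),
 * playback reference on the right (odd indices).
 * `out` must have room for 2 * frames samples. */
void interleave_mic_ref(const int16_t *mic, const int16_t *ref,
                        int16_t *out, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        out[2 * i] = mic[i];
        out[2 * i + 1] = ref[i];
    }
}
```

For AEC to work, the two mono buffers need to cover the same time window, so the playback copy and the mic read should be kept in step.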

Best Regards.

EdwardRyu commented 1 year ago

Hi Michael,

Thank you for your support. I got an improvement with HW reference data and AGC off, but it is not satisfactory yet.

Truth be told, the requirements of our HW application are quite tricky and strict. We need a louder speaker volume and a sensitive microphone as well.

Here are our HW settings.

HW PA Gain is set to 7dB
REG0x44 is set to 0x50, which means HW reference data is in use for SW AEC.
REG0x14 (ADC PGA Gain) is set to 0x1A (30dB)
REG0x17 (ADC Volume) is set to 0xBF (0dB)
REG0x32 (DAC Volume) is set to 0xBA (-2.5dB)
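
For reference, the register values above can be converted to dB with a small sketch. The step sizes are assumptions inferred from the values quoted in this thread (0xBF = 0dB and 0xBA = -2.5dB imply 0.5dB per step; 0x1A = 30dB is read as a low-nibble gain code of 10 at 3dB per step) and should be checked against the ES8311 datasheet. The function names are hypothetical.

```c
#include <stdint.h>

/* Volume registers (REG0x17 ADC volume, REG0x32 DAC volume).
 * Assumption: 0xBF is 0 dB and each step is 0.5 dB. */
double es8311_volume_reg_to_db(uint8_t reg)
{
    return ((int)reg - 0xBF) * 0.5;
}

/* REG0x14 ADC PGA gain.
 * Assumption: the low nibble is the gain code, 3 dB per step. */
int es8311_pga_reg_to_db(uint8_t reg)
{
    return (reg & 0x0F) * 3;
}
```

With these assumptions, 0xBA decodes to -2.5dB, 0xBF to 0dB, and 0x1A to 30dB, matching the list above.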

It would be appreciated if you could provide your insight and expertise suggesting any alternative approach.

Please understand that this issue is very critical and crucial to our project.

Thanks, Edward

ahhfzhang commented 1 year ago

Hi Edward,

The ADC part is the same as our default value.
But if your HW PA Gain is 7dB, then the maximum DAC Volume can only be -7dB.
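
This rule amounts to keeping DAC volume plus PA gain at or below 0dB. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and the 0dB ceiling is inferred from the 7dB / -7dB pairing rather than taken from a datasheet.

```c
#include <stdbool.h>

/* Combined playback gain in dB: DAC digital volume plus PA gain. */
double playback_gain_db(double dac_volume_db, double pa_gain_db)
{
    return dac_volume_db + pa_gain_db;
}

/* Highest DAC volume that keeps the combined gain at or below 0 dB. */
double max_dac_volume_db(double pa_gain_db)
{
    return -pa_gain_db;
}

/* True when the combined gain stays within the assumed 0 dB budget. */
bool playback_gain_within_budget(double dac_volume_db, double pa_gain_db)
{
    return playback_gain_db(dac_volume_db, pa_gain_db) <= 0.0;
}
```

With the settings reported in this thread (DAC volume -2.5dB, PA gain 7dB), the combined gain is 4.5dB, which is over this budget.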

If you think the speaker is not so loud, you can refer to the following operations:

Use a 5V power supply for the PA.
The speaker enclosure needs a cavity; choose a speaker with a slightly higher power rating.

Also, have you saved the algorithm stream input data? You can check the waveform to see if there is a problem with too much gain.
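
One way to check a dump for excessive gain without inspecting the waveform by eye is to measure the peak level and count full-scale samples per channel. The sketch below assumes interleaved 16-bit stereo PCM. `analyze_stereo_dump` and `channel_stats_t` are hypothetical names, and counting samples at the int16 limits is only a rough indicator of saturation.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-channel statistics for a 16-bit PCM dump. */
typedef struct {
    int peak;        /* largest absolute sample value seen */
    size_t clipped;  /* samples sitting exactly at the int16 limits */
} channel_stats_t;

/* pcm holds `frames` interleaved stereo frames (assumed layout: ch0 ch1 ...).
 * stats[0] receives channel 0 results, stats[1] channel 1 results. */
void analyze_stereo_dump(const int16_t *pcm, size_t frames,
                         channel_stats_t stats[2])
{
    for (int ch = 0; ch < 2; ch++) {
        stats[ch].peak = 0;
        stats[ch].clipped = 0;
    }
    for (size_t i = 0; i < frames; i++) {
        for (int ch = 0; ch < 2; ch++) {
            int sample = pcm[2 * i + ch];
            int magnitude = sample < 0 ? -sample : sample;
            if (magnitude > stats[ch].peak) {
                stats[ch].peak = magnitude;
            }
            if (sample == INT16_MAX || sample == INT16_MIN) {
                stats[ch].clipped++;
            }
        }
    }
}
```

A channel whose peak sits near 32767 with many clipped samples is likely saturating, which would point to too much gain on that path.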

SCH_ESP32-S3-KORVO-2L_V1.0_20220815.pdf

Best Regards.

EdwardRyu commented 1 year ago

Thank you Michael for your answer.

We are currently using PA power supply 5V.

I got an audio dump after enabling only DEBUG_AEC_INPUT, not DEBUG_AEC_OUTPUT.

Here is the dump file.

aec.zip

The first part is ring tone, and then the last part is the voice of "testing testing one, two, three".

I wonder whether the dump was generated correctly, because when I enabled it and tested, the voice was cutting out, unlike when testing without this feature enabled. Dumping data may require more CPU resources, am I right?

Also, when replaying this wav file, it sounds faster than it really was. We might have lost some raw data during the dump procedure. Don't you think so?
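
A rough way to quantify the loss is to compare the dump size with what continuous capture would have produced. The sketch below assumes raw 16-bit PCM. The helper names are hypothetical, and the 8 kHz stereo figures in the note after it are assumptions based on the default sample rate mentioned earlier in this thread.

```c
#include <stdint.h>

/* Bytes per second of raw 16-bit PCM at the given rate and channel count. */
uint32_t pcm_bytes_per_second(uint32_t sample_rate_hz, uint32_t channels)
{
    return sample_rate_hz * channels * 2u;
}

/* Fraction of data missing from a dump (0.0 = nothing lost), given the
 * real duration of the recording. Returns 0.0 if the dump is not short. */
double dump_loss_fraction(uint64_t actual_bytes, uint32_t sample_rate_hz,
                          uint32_t channels, double seconds)
{
    double expected = pcm_bytes_per_second(sample_rate_hz, channels) * seconds;
    if (expected <= 0.0) {
        return 0.0;
    }
    double lost = 1.0 - (double)actual_bytes / expected;
    return lost < 0.0 ? 0.0 : lost;
}
```

At 8 kHz, two channels, and 16 bits, continuous capture needs 32000 bytes per second. As a made-up example, a 10-second test that yields only 240000 bytes would have lost a quarter of its data, and the file would play back correspondingly faster.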

Anyway, as I am not familiar with audio issues, I would appreciate it if you could find any clue in this dump.

Please let me know how to find where the bigger gain is given or where the distortion is happening.

What does the upper channel in the diagram mean? What about the lower channel?

Thanks, Edward

ahhfzhang commented 1 year ago

Hi Edward,

Yes, the dump file seems to have lost a lot of raw data. Could it be that the writing speed of the SD card is too slow? But the gain doesn't look distorted based on this data.

The upper channel is mic(ADC) data, lower channel is ref(DAC) data.

Maybe you can try using this mp3 as the ref data.

Best Regards.

EdwardRyu commented 1 year ago

Hi Michael,

Our HW doesn't support an SD card; instead, all files are saved to flash memory. I tried the algorithm example, but the result is the same.

We are missing a lot of raw data during playback and dumping. I have learned that this is a limitation of the bus architecture: we store the dump data in flash memory, and even the mp3 file is read from the FAT file system, which may be eating up MIPS as well.

Do you have any further ideas?

How similar are the raw data of ch1 and ch2 that I provided the other day? Is the noise or distortion level on ch1 acceptable?

BTW, I have raised a simple question about audio path in another ticket. It would be appreciated if you also take care of this.

Thanks, Edward

EdwardRyu commented 1 year ago

Hi Michael,

I got a dump from I2S level directly and then generated wav files as attached.

BOARD_PA_GAIN is defined as 6 in board_def.h
HW PA Gain is set to 14dB
REG0x44 is set to 0x50, which means HW reference data is in use for SW AEC.
REG0x14 (ADC PGA Gain) is set to 0x1A (30dB)
REG0x17 (ADC Volume) is set to 0xBF (0dB)
REG0x32 (DAC Volume) is set to 0xBA (-2.5dB)

audio_dump_i2s.zip

Looking forward to your feedback and analysis.

Thanks, Edward

EdwardRyu commented 1 year ago

Hi Michael,

I am attaching a new dump of the mic, the reference signal, and the voice heard by the end user.

We cannot dump audio data from the output of the AEC algorithm because a lot of the dumped audio data goes missing. Instead, I saved it from the end user's Android smartphone, where you can hear the echo clearly. I would also like to correct the HW PA Gain value: all tests, including the ones in the previous email, were run with HW PA Gain set to 14dB.

record_and_reference.wav: Record data (Mic) and Reference data
recorded_from_end_user.m4a: Saved audio from the end user's Android smartphone

BOARD_PA_GAIN is defined as 6 in board_def.h
HW PA Gain is set to 14dB
REG0x44 is set to 0x50, which means HW reference data is in use for SW AEC.
REG0x14 (ADC PGA Gain) is set to 0x1A (30dB)
REG0x17 (ADC Volume) is set to 0xBF (0dB)
REG0x32 (DAC Volume) is set to 0xBA (-2.5dB)

audio_dump_i2s_new.zip

Thanks, Edward

ahhfzhang commented 1 year ago

Hi Edward,

I think your DAC+PA gain is too high. If HW PA Gain is 14dB, then BOARD_PA_GAIN should be 14, and REG0x32 (DAC Volume) should be set to -14dB or lower.

Another way to test: if you set REG0x32 (DAC Volume) to 0xBA (-2.5dB), then HW PA Gain can be reduced to 2.5dB.

Here is our test audio for the AEC feature: 8311_aec_test.zip

Best Regards.

EdwardRyu commented 1 year ago

Hi Michael,

I got an audio dump using algorithm_example.c and the test.mp3 you provided. The audio is cutting out as mentioned before, and the I2S dump (which I didn't attach) also shows that the mp3 sound is already corrupted.

Anyway I hope you can find meaningful data from the dump data.

This time I have used following values.

BOARD_PA_GAIN set to 14
REG0x32 set to 0xaa (-10.5 dB, volume 100)
Note: the audio volume is still too low for our application requirement.

AudioDump.zip

BTW, I would like to verify that the codec and AEC work correctly in the same test environment as yours. We would like to follow your recommendation.

Please let me know the following:

Development Board for testing
ADF branch & versions.
example codes.
SW configuration.

Looking forward to seeing your feedback soon.

Thanks, Edward

ahhfzhang commented 1 year ago

Hi Edward,

The audio sounds broken and seems to be beyond the range of the speaker. Have you tried replacing the speaker with a higher-power one?

My test environment :

Development Board: ESP32-LyraT-Mini / ESP32-S3-Korvo-2L
ADF branch & versions: master
Example codes: algorithm examples
SW configuration: ESP32-LyraT-Mini uses the default config; ESP32-S3-Korvo-2L needs to sync the 8311 changes mentioned above.

Best Regards.

EdwardRyu commented 1 year ago

Hi Michael,

I am a bit confused with SW configuration. Could you help on this?

I am supposed to use the 'ESP32-LyraT-Mini V1.2' board, but I couldn't find a SW configuration for it. Instead, I can see 'ESP32-Lyrat-Mini V1.1' in menuconfig, while the default configuration is 'ESP32-Lyrat V4.3'.

Please let me know which configuration should be used for ESP32-Lyrat-Mini V1.2 board.

Another question is about SD card.

Which class of micro SD card should be used? What memory size of SD card should be selected? Do you have any recommendations for SD card?

Thanks,

ahhfzhang commented 1 year ago

Hi Edward,

Yes, for the 'ESP32-LyraT-Mini V1.2' board you should choose 'ESP32-Lyrat-Mini V1.1' in the configuration.

For the SD card, you can use a Class 10 SDHC micro SD card, such as a SanDisk, Samsung, or Kingston 4-32G SDHC micro SD card. Use FAT32 as the file system.

Best Regards.

EdwardRyu commented 1 year ago

Hi Michael,

I got a LyraT_mini and tested algorithm_example.c as follows.

I connected a 4-ohm, 3-watt speaker as shown in the attached screenshot. The distance between the microphone and the speaker is about 20 cm.

On the SW side, I only changed the volume to 95 (DAC volume -19dB) and 100 (DAC volume -16.5dB) and repeated the test several times. The sound volume still does not meet our requirement.

Anyway, I collected many aec_out.wav and aec_in.wav files from different iterations. Please note that we cannot collect both aec_out.wav and aec_in.wav at the same time with this example code. Please let me know if I am wrong about this.

As you can see, most of the time the AEC output still includes echoes in the latter part of the aec_out.wav file.

Could you have a look into the dump files and then get back to me?

Audio dump files have been attached here.

LyraT_mini_AEC_test.zip

We want to improve or remove this echo problem here.

Thanks, Edward

ahhfzhang commented 1 year ago

Hi Edward,

Yes, we cannot collect both aec_out.wav and aec_in.wav at the same time with this example code. But I think the echo cancellation on the ESP32 works as expected, and of course the ESP32-S3 will be even better.

Since part of the echo is not eliminated and the sound volume is not satisfactory, I think you should seal the speaker like this:

That would be better for AEC, and the volume will also be louder.

Best Regards.

EdwardRyu commented 11 months ago

Hi Michael,

We tested even with mechanical sealing, as shown in the attached snapshots (20230928_100530, 20230928_100541), but the problem is still there.

I also received your test result through a different channel, but the result was not satisfactory.

So I would like to ask Espressif to improve the AEC algorithm so that it works even at louder speaker volumes.

I believe this enhancement makes your product shine.

Thanks, Edward

espressif / esp-adf