Voice volume is low after converting single channel to two channels using rsp_filter (AUD-4967)

mike-2020 commented 8 months ago

The voice is generated by the TTS stream. Its output should be 1 channel at a 16000 Hz sample rate.

My I2S interface needs to be configured as I2S_CHANNEL_FMT_RIGHT_LEFT, so I use rsp_filter to do the conversion:

    rsp_filter_cfg_t rsp_cfg = DEFAULT_RESAMPLE_FILTER_CONFIG();
    rsp_cfg.src_rate = 16000;
    rsp_cfg.src_ch = 1;
    rsp_cfg.dest_rate = 16000;
    rsp_cfg.dest_ch = 2;
    rsp_cfg.type = ESP_RESAMPLE_TYPE_BYPASS;
    filter = rsp_filter_init(&rsp_cfg);

    audio_pipeline_register(pipeline, tts_stream_reader, "tts");
    audio_pipeline_register(pipeline, filter, "filter");
    audio_pipeline_register(pipeline, i2s_stream_writer, "i2s");

Before adding the filter, the voice volume is as expected. After adding the filter (without changing any other parameters), the output volume is much lower than before.
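For reference, a plain 1-to-2 channel conversion should not change the level by itself, since each mono sample is only copied into both slots of a stereo frame. Below is a minimal sketch in portable C, assuming 16-bit PCM. `mono_to_stereo_s16` is a hypothetical helper written for illustration; it is not part of ESP-ADF and does not claim to show how rsp_filter works internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ESP-ADF API): expand 16-bit mono PCM to
 * interleaved stereo by copying each sample, unchanged, into both slots
 * of the output frame. `stereo` must have room for 2 * frames samples. */
static void mono_to_stereo_s16(const int16_t *mono, int16_t *stereo, size_t frames)
{
    assert(mono != NULL && stereo != NULL);
    for (size_t i = 0; i < frames; i++) {
        stereo[2 * i]     = mono[i];  /* first slot of the frame */
        stereo[2 * i + 1] = mono[i];  /* second slot of the frame */
    }
}
```

If the data leaving the filter matches this (same amplitude in both slots), the drop in volume is more likely to come from a later stage than from the conversion itself.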

Hardware is ESP32S3 + ES8388 + NS4150B.

Can you help me investigate this issue?

mike-2020 commented 8 months ago

@jason-mao Any comments on this issue?

jason-mao commented 8 months ago

@mike-2020 Yes, I did some testing on my side but did not find any helpful information. Could you provide some pictures or voice files?

mike-2020 commented 8 months ago

@jason-mao

Please find the boot logs below.

I tried to save the voice to a file using the pipeline tts_stream-->filter-->fatfs, but it does not work. The voice file is generated, but it does not contain any speech. Do you have an example of saving TTS output to a file on fatfs?

    I (0) cpu_start: App cpu up.
    I (643) esp_psram: SPI SRAM memory test OK
    I (652) cpu_start: Pro cpu start user code
    I (657) cpu_start: cpu freq: 240000000 Hz
    I (657) cpu_start: Application information:
    I (660) cpu_start: Project name: SmartVehicle-ESP32S3
    I (666) cpu_start: App version: 1.0.O
    I (671) cpu_start: Compile time: Oct 28 2023 20:09:12
    I (677) cpu_start: ELF file SHA256: 9c748e4f7a3b0457...
    I (683) cpu_start: ESP-IDF: v5.1.1-dirty
    I (688) cpu_start: Min chip rev: v0.0
    I (693) cpu_start: Max chip rev: v0.99
    I (698) cpu_start: Chip rev: v0.2
    I (703) heap_init: Initializing. RAM available for dynamic allocation:
    I (710) heap_init: At 3FC9F890 len 00049E80 (295 KiB): DRAM
    I (716) heap_init: At 3FCE9710 len 00005724 (21 KiB): STACK/DRAM
    I (723) heap_init: At 600FE010 len 00001FD8 (7 KiB): RTCRAM
    I (729) esp_psram: Adding pool of 7808K of PSRAM memory to heap allocator
    I (737) spi_flash: detected chip: winbond
    I (741) spi_flash: flash io: qio
    W (745) i2s(legacy): legacy i2s driver is deprecated, please migrate to use driver/i2s_std.h, driver/i2s_pdm.h or driver/i2s_tdm.h
    I (758) sleep: Configure to isolate all GPIO pins in sleep state
    I (764) sleep: Enable automatic switching of GPIO sleep configuration
    I (772) app_start: Starting scheduler on CPU0
    I (777) app_start: Starting scheduler on CPU1
    I (777) main_task: Started on CPU0
    I (787) esp_psram: Reserving pool of 32K of internal memory for DMA/internal allocations
    I (797) main_task: Calling app_main()
    I (807) AUDIO_MGR: [1.0] Init Peripheral Set
    I (807) AUDIO_MGR: [2.0] Start codec chip
    I (817) ES8388_DRIVER: ----- ES8388 settings -----
    I (817) gpio: GPIO[17]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
    I (827) ES8388_DRIVER: init,out:00, in:03
    I (837) AUDIO_HAL: Codec mode is 3, Ctrl:1
    W (837) ES8388_DRIVER: es8388_start default is mode:3
    I (837) TTS_MAIN: [3.0] Create audio pipeline for playback
    I (847) TTS_MAIN: [3.1] Create tts stream to read data from chinese strings
    init voice set:template ESP Chinese TTS v1.7 (Sep 22 2022 14:35:13, 1)
    I (857) TTS_MAIN: [3.2] Create i2s stream to write data to codec chip
    I (867) TTS_MAIN: [3.4] Register all elements to audio pipeline
    I (877) TTS_MAIN: [3.5] Link it together [strings]-->tts_stream-->filter-->i2s_stream-->[codec_chip]
    I (887) AUDIO_PIPELINE: link el->rb, el:0x3c062778, tag:tts, rb:0x3c0631dc
    I (887) AUDIO_PIPELINE: link el->rb, el:0x3c062bf0, tag:filter, rb:0x3c065240
    I (897) TTS_MAIN: [3.6] Set up uri (tts as tts_stream, and directly output is i2s)
    I (907) TTS_MAIN: [4.0] Set up event listener
    I (907) TTS_MAIN: [4.1] Listening event from all elements of pipeline
    I (917) AUDIO_THREAD: The tts task allocate stack on internal memory
    I (927) AUDIO_ELEMENT: [tts-0x3c062778] Element task created
    I (927) AUDIO_THREAD: The filter task allocate stack on external memory
    I (937) AUDIO_ELEMENT: [filter-0x3c062bf0] Element task created
    I (947) AUDIO_THREAD: The i2s task allocate stack on internal memory
    I (957) AUDIO_ELEMENT: [i2s-0x3c062a48] Element task created
    I (957) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8213036 Bytes, Inter:277271 Bytes, Dram:277271 Bytes

    I (967) AUDIO_ELEMENT: [tts] AEL_MSG_CMD_RESUME,state:1
    I (977) tts_parser: unicode:0x6b22 -> huan1
    I (977) tts_parser: unicode:0x8fce -> ying2
    I (987) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1
    I (997) RSP_FILTER: sample rate of source data : 16000, channel of source data : 1, sample rate of destination data : 16000, channel of destination data : 2
    I (1007) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
    I (1017) I2S_STREAM: AUDIO_STREAM_WRITER
    I (1017) AUDIO_PIPELINE: Pipeline started
    I (1027) TTS_MAIN: [6.0] Listen for all pipeline events
    I (1027) tts_parser: unicode:0x4f7f -> shi3
    I (1037) tts_parser: unicode:0x7528 -> yong4
    I (1037) tts_parser: unicode:0x4e50 -> le4
    I (1047) tts_parser: unicode:0x946b -> xin1
    I (1047) tts_parser: unicode:0x8bed -> yu3
    I (1057) tts_parser: unicode:0x97f3 -> yin1
    I (1057) tts_parser: unicode:0x5f00 -> kai1
    I (1067) tts_parser: unicode:0x6e90 -> yuan2
    I (1067) tts_parser: unicode:0x6846 -> kuang4
    I (1077) tts_parser: unicode:0x67b6 -> jia4
    W (1077) TTS_STREAM: 欢迎使用乐鑫语音开源框架
    I (1187) APP_MAIN: ESP_WIFI_MODE_STA
    W (3787) TTS_STREAM: No more data,ret:0
    I (3797) AUDIO_ELEMENT: IN-[tts] AEL_IO_DONE,0
    I (4077) AUDIO_ELEMENT: IN-[filter] AEL_IO_DONE,-2
    I (4147) AUDIO_ELEMENT: IN-[i2s] AEL_IO_DONE,-2
    I (4257) TTS_MAIN: [ * ] Start a new tts stream play: 成功完成语音合成模块的初始化。.
    I (4257) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:8212948 Bytes, Inter:277271 Bytes, Dram:277271 Bytes

    I (4267) AUDIO_ELEMENT: [tts] AEL_MSG_CMD_RESUME,state:1
    I (4277) tts_parser: unicode:0x6210 -> cheng2
    I (4287) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1
    I (4287) RSP_FILTER: sample rate of source data : 16000, channel of source data : 1, sample rate of destination data : 16000, channel of destination data : 2
    I (4297) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
    I (4307) I2S_STREAM: AUDIO_STREAM_WRITER
    I (4317) AUDIO_PIPELINE: Pipeline started
    I (4317) tts_parser: unicode:0x529f -> gong1
    I (4327) tts_parser: unicode:0x5b8c -> wan2
    I (4327) tts_parser: unicode:0x6210 -> cheng2
    I (4337) tts_parser: unicode:0x8bed -> yu3
    I (4337) tts_parser: unicode:0x97f3 -> yin1
    I (4347) tts_parser: unicode:0x5408 -> he2
    I (4347) tts_parser: unicode:0x6210 -> cheng2
    I (4357) tts_parser: unicode:0x6a21 -> mo2
    I (4357) tts_parser: unicode:0x5757 -> kuai4
    I (4367) tts_parser: unicode:0x7684 -> de
    I (4367) tts_parser: unicode:0x521d -> chu1
    I (4377) tts_parser: unicode:0x59cb -> shi3
    I (4377) tts_parser: unicode:0x5316 -> hua4
    W (4377) TTS_STREAM: 成功完成语音合成模块的初始化。
    W (8047) TTS_STREAM: No more data,ret:0
    I (8047) AUDIO_ELEMENT: IN-[tts] AEL_IO_DONE,0
    I (8327) AUDIO_ELEMENT: IN-[filter] AEL_IO_DONE,-2
    I (8387) AUDIO_ELEMENT: IN-[i2s] AEL_IO_DONE,-2

jason-mao commented 8 months ago

@mike-2020 You can refer to examples/recorder/pipeline_wav_amr_sdcard/main/pipeline_wav_amr_sdcard.c to save the filter output to a file.
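As a side note, if the filter output is dumped as raw PCM rather than passed through an encoder, a PC player generally needs a WAV header in front of the samples to interpret them. The following is a rough sketch of the canonical 44-byte PCM header in portable C. `wav_header_fill`, `put_le16` and `put_le32` are hypothetical helpers, not ESP-ADF functions, and the parameters in the usage note (16000 Hz, 2 channels, 16 bits) are taken from the configuration discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* WAV stores all multi-byte fields in little-endian order. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Hypothetical helper: fill a canonical 44-byte header for uncompressed PCM.
 * data_bytes is the size of the sample data that follows the header. */
static void wav_header_fill(uint8_t hdr[44], uint32_t rate, uint16_t channels,
                            uint16_t bits, uint32_t data_bytes)
{
    assert(hdr != NULL);
    uint16_t block_align = (uint16_t)(channels * (bits / 8)); /* bytes per frame */

    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, 36 + data_bytes);       /* RIFF chunk size */
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16);                   /* fmt chunk size for PCM */
    put_le16(hdr + 20, 1);                    /* audio format 1 = PCM */
    put_le16(hdr + 22, channels);
    put_le32(hdr + 24, rate);
    put_le32(hdr + 28, rate * block_align);   /* byte rate */
    put_le16(hdr + 32, block_align);
    put_le16(hdr + 34, bits);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_bytes);
}
```

For the stream in this issue the call would be along the lines of `wav_header_fill(hdr, 16000, 2, 16, data_bytes)`, with the header written to the file before the PCM data.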

mike-2020 commented 8 months ago

REC.zip

@jason-mao Here is the recorded file. I did not find any issue when playing it on a PC, but the volume is very low when playing it on my device.

Here are my I2S settings when playing it on my device (the ES8388 driver is used):
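One way to put numbers on "sounds fine on PC" is to measure the peak level of each channel in the recorded data and compare it with the original mono TTS output. The sketch below is plain C for 16-bit interleaved stereo. `stereo_peak_s16` and `stereo_peak_t` are hypothetical names introduced here for illustration, and the sketch assumes the samples have already been loaded into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Peak absolute sample value per channel. int32_t is used so that
 * the magnitude of -32768 can be represented. */
typedef struct {
    int32_t ch0;  /* first sample of each frame */
    int32_t ch1;  /* second sample of each frame */
} stereo_peak_t;

/* Hypothetical helper: scan `frames` interleaved 16-bit stereo frames
 * and return the largest absolute value seen on each channel. */
static stereo_peak_t stereo_peak_s16(const int16_t *pcm, size_t frames)
{
    assert(pcm != NULL);
    stereo_peak_t peak = {0, 0};
    for (size_t i = 0; i < frames; i++) {
        int32_t a = pcm[2 * i];
        int32_t b = pcm[2 * i + 1];
        if (a < 0) a = -a;
        if (b < 0) b = -b;
        if (a > peak.ch0) peak.ch0 = a;
        if (b > peak.ch1) peak.ch1 = b;
    }
    return peak;
}
```

If both channels show the same peak as the mono source, the recorded data itself is at full level, which would point the investigation toward the playback path on the device rather than the filter output.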

    i2s_stream_cfg_t i2s_cfg = I2S_STREAM_CFG_DEFAULT();
    i2s_cfg.type = AUDIO_STREAM_WRITER;
    i2s_cfg.i2s_config.sample_rate = 16000;
    i2s_cfg.i2s_config.channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT;
    i2s_cfg.i2s_config.bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT;
    i2s_stream_writer = i2s_stream_init(&i2s_cfg);

mike-2020 commented 4 days ago

Any comments on this issue?

espressif / esp-adf

Voice volume is low after converting single channel to two channels using rsp_filter (AUD-4967) #1097