espressif / esp-adf

Espressif Audio Development Framework

Can ASR save the audio of recognized keywords to an SD card or server while recognizing them? (AUD-2616) #542

Closed: hulife closed this issue 2 years ago

hulife commented 3 years ago

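Hello! I have a few questions. Can ASR save the audio of the recognized keywords to an SD card or a server while it is recognizing them? If not, can I first save the audio data to the server/SD card and then run keyword recognition on it? Or could I create two tasks, one that records continuously and one that runs recognition, and after a keyword is recognized stop the recording and save the recorded data? (When I tried this, however, I ran into the following problem.)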

    I (8899) CHECK rectime: [3.4] Register all elements to audio pipeline
    I (8899) CHECK rectime: [3.5] Link it together [codec_chip]-->i2s_stream-->wav_encoder-->fatfs_stream-->[sdcard]
    I (8919) CHECK rectime: [3.6] Set up uri (file as fatfs_stream, wav as wav encoder)
    I (8919) CHECK rectime: [ 4 ] Set up event listener
    W (8929) I2S: I2S driver already installed
    #######################################################################3
    I (8939) CHECK rectime: [4.1] Listening event from pipeline
    I (8939) CHECK rectime: [4.2] Listening event from peripherals
    I (8949) CHECK rectime: [ 5 ] Start audio_pipeline
    I (8959) CHECK rectime: [ 6 ] Listen for all pipeline events, record for 10 Seconds
    Guru Meditation Error: Core 1 panic'ed (StoreProhibited). Exception was unhandled.
    Core 1 register dump:
    PC      : 0x4017d98a  PS      : 0x00060f30  A0      : 0x8017e383  A1      : 0x3ffd4e90
    0x4017d98a: csf_framesig at /home/wangwangwang/workspace/AI/LVCSR/esp_sr_lib_master/components/c_speech_features/c_speech_features/c_speech_features.c:1107

    A2      : 0x3fffde90  A3      : 0x000002d0  A4      : 0x000001e0  A5      : 0x00000000
    A6      : 0x00000000  A7      : 0x00000000  A8      : 0x00000000  A9      : 0x00000000
    A10     : 0x00000000  A11     : 0x3fffe9d8  A12     : 0x00000000  A13     : 0x3fffde90
    A14     : 0x00000780  A15     : 0x00000000  SAR     : 0x00000000  EXCCAUSE: 0x0000001d
    EXCVADDR: 0x00000000  LBEG    : 0x4017d941  LEND    : 0x4017d947  LCOUNT  : 0x00000000
    0x4017d941: csf_framesig at /home/wangwangwang/workspace/AI/LVCSR/esp_sr_lib_master/components/c_speech_features/c_speech_features/c_speech_features.c:1107
    0x4017d947: csf_framesig at /home/wangwangwang/workspace/AI/LVCSR/esp_sr_lib_master/components/c_speech_features/c_speech_features/c_speech_features.c:1107

    ELF file SHA256: 65c5babc766d96261e630e80fe2945ac10978aaa2871da291b7a6c25921f7bba

    Backtrace: 0x4017d98a:0x3ffd4e90 0x4017e380:0x3ffd4ed0 0x4017e635:0x3ffd4f40 0x40140553:0x3ffd4f90 0x4013fbf9:0x3ffd4ff0 0x400d40ad:0x3ffd5010 0x4008a49a:0x3ffd5120
    0x4017d98a: csf_framesig at /home/wangwangwang/workspace/AI/LVCSR/esp_sr_lib_master/components/c_speech_features/c_speech_features/c_speech_features.c:1107
    0x4017e380: csf_fbank_HTK_step at /home/wangwangwang/workspace/AI/LVCSR/esp_sr_lib_master/components/c_speech_features/c_speech_features/c_speech_features.c:555
    0x4017e635: csf_logfbank_HTK_step at /home/wangwangwang/workspace/AI/LVCSR/esp_sr_lib_master/components/c_speech_features/c_speech_features/c_speech_features.c:696
    0x40140553: run_logfbank at /home/sunxiangyu/esp/esp_sr_lib/components/wakenet/mfcc_runner_straight.c:72
    0x4013fbf9: model_detect at /home/sunxiangyu/esp/esp_sr_lib/components/wakenet/wakenet6_quantized.c:415
    0x400d40ad: asr_main at G:\esp-adf\pipeline_raw_http111\build/../main/record_raw_http.c:1635 (discriminator 9)
    0x4008a49a: vPortTaskWrapper at G:/esp-idf/components/freertos/port.c:403

The recording task does not run correctly.

[Thanks!]

HengYongChao commented 3 years ago

@hulife Actually, you can save the data in `buffer` once the speech has been recognized; see the marked positions below.

    while (1) {
        raw_stream_read(raw_read, (char *)buffer, size * sizeof(short));
        if (enable_wn) {
            if (wakenet->detect(model_wn_data, (int16_t *)buffer) ==  WAKE_UP) {
                esp_audio_sync_play(player, "file://sdcard/dingdong.mp3", 0);
                ESP_LOGI(TAG, "wake up");
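                /* add save buffer to sdcard code */
                fwrite(buffer, sizeof(char), size * sizeof(short), fp);  /* <-- added; fp is a FILE * opened earlier */
                /* The line above saves only the current chunk (size samples), read as the wake word ('Hi Lexin' usually) was detected */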
                enable_wn = false;
            }
        } else {
            mn_count++;
            int commit_id = multinet->detect(model_mn_data, buffer);
            if (asr_multinet_control(commit_id) == ESP_OK ) {
                esp_audio_sync_play(player, "file://sdcard/haode.mp3", 0);
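                /* add save buffer to sdcard code */
                fwrite(buffer, sizeof(char), size * sizeof(short), fp);  /* <-- added; fp is a FILE * opened earlier */
                /* The line above saves only the current chunk, read as the command word (e.g. 'da kai dian deng', "turn on the light") was detected */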
                enable_wn = true;
                mn_count = 0;
            }
            if (mn_count == mn_num) {
                ESP_LOGI(TAG, "stop multinet");
                enable_wn = true;
                mn_count = 0;
            }
        }
    }
hulife commented 3 years ago

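Hello! I have successfully uploaded this data to the server. If I want to play this audio, do I need to encode the data into WAV, MP3, or some other format?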

HengYongChao commented 3 years ago

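This data should be raw audio read directly from I2S; you can of course also write it straight back to I2S to play it.

In fact, when an encoded file is played back, it is first decoded into raw data, and that raw data is what gets sent to I2S for output.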

hulife commented 3 years ago

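What I want to do now is convert this raw data to WAV format and then upload it to the server over TCP. I looked at the ELEMENT_REC_WAV_AMR_SDCARD example and saw that it converts the raw stream to WAV through a ringbuffer. Is there a dedicated API for converting the raw data in this case?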

HengYongChao commented 3 years ago

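It depends on what exactly you want to do:

  1. Convert it to a WAV file locally, then upload it (see pipeline_wav_sdcard).
  2. Upload the raw data to the server, then convert it to a WAV file on the server side (see pipeline_raw_http).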
hulife commented 3 years ago

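1. The pipeline_wav_sdcard example works through API functions, and there is no way for me to pass my raw `buffer` into it. I also looked at ELEMENT_REC_WAV_AMR_SDCARD, which uses a ringbuffer, and I don't know how to adapt it.
2. I tried uploading the raw audio directly to the server over TCP, but after encoding it with Python on the server and playing it back, the data was only 1 KB and there was no sound. (I wanted to save it as WAV on the SD card to check, but approach 1 has the limitation above.) I also noticed on the board that no matter whether I say "turn on the air conditioner", "turn off initial mode", or "Hi Lexin", `buffer` is always 960. Shouldn't the buffer size change with the audio data?
3. What I want to do now is encode the raw data as WAV, but I don't know which API to call!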

HengYongChao commented 3 years ago

OK, it seems the core problem is that you don't know how to save raw data as WAV. The example I mentioned above (pipeline_wav_sdcard) does exactly that. ADF processes data as streams; in this example the data flows like this: `[codec_chip]-->i2s_stream-->wav_encoder-->fatfs_stream-->[sdcard]`. The raw -> WAV conversion happens in the wav encoder step; the detailed code is here: https://github.com/espressif/esp-adf-libs/blob/19ec8b191ee5a8d69011567312df2b20853a54f9/esp_codec/wav_encoder.c In fact, a WAV file is simply a WAV header followed by the raw data.
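Because a WAV file is just a header followed by the raw PCM, a raw `buffer` can be turned into WAV without going through an ADF element. The following is a minimal sketch in plain C, assuming 16-bit little-endian PCM; `wav_header_fill` and `pcm_write_wav` are hypothetical helpers, not part of ADF or of wav_encoder.c. The sample rate, channel count and bit depth passed in must match what the raw stream actually delivers (assumed here to be 16 kHz, mono, 16-bit, as used by the ESP-SR models; check your own pipeline configuration).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Store 16/32-bit values little-endian, byte by byte, so the header is
 * correct regardless of the host's endianness. */
static void put_le16(uint8_t *p, uint16_t v) { p[0] = v & 0xff; p[1] = v >> 8; }
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff; p[1] = (v >> 8) & 0xff; p[2] = (v >> 16) & 0xff; p[3] = v >> 24;
}

/* Fill a 44-byte canonical PCM WAV header describing data_bytes of PCM. */
static void wav_header_fill(uint8_t hdr[44], uint32_t sample_rate,
                            uint16_t channels, uint16_t bits, uint32_t data_bytes)
{
    uint16_t block_align = channels * (bits / 8);   /* bytes per sample frame */
    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, 36 + data_bytes);             /* RIFF chunk size */
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16);                         /* fmt chunk size for PCM */
    put_le16(hdr + 20, 1);                          /* audio format 1 = PCM */
    put_le16(hdr + 22, channels);
    put_le32(hdr + 24, sample_rate);
    put_le32(hdr + 28, sample_rate * block_align);  /* byte rate */
    put_le16(hdr + 32, block_align);
    put_le16(hdr + 34, bits);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_bytes);                 /* size of the raw PCM */
}

/* Write header + raw PCM to an already-open file. Returns 0 on success. */
static int pcm_write_wav(FILE *fp, const void *pcm, uint32_t pcm_bytes,
                         uint32_t sample_rate, uint16_t channels, uint16_t bits)
{
    uint8_t hdr[44];
    wav_header_fill(hdr, sample_rate, channels, bits, pcm_bytes);
    if (fwrite(hdr, 1, sizeof(hdr), fp) != sizeof(hdr)) return -1;
    if (fwrite(pcm, 1, pcm_bytes, fp) != pcm_bytes) return -1;
    return 0;
}
```

On the board, `fp` could be a file opened with `fopen("/sdcard/hi.wav", "wb")`; for the TCP route, the same 44 header bytes could instead be sent ahead of the raw data, or added on the server side (option 2 above).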

hulife commented 3 years ago

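Hello, I managed to convert the file to WAV by adding a header, but the same problem remains: the audio data is only 960 bytes, i.e. twice audio_mn_chunksize or audio_wn_chunksize. Such a short sample contains no audible information at all. If I make the sample buffer larger, there is indeed sound, but then speech recognition no longer works.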
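The fixed 960-byte size comes from the chunk-based loop rather than from what is spoken: each `raw_stream_read()` call fetches one detector chunk of `size` 16-bit samples, and enlarging `buffer` presumably breaks recognition because the detector consumes a fixed-size chunk per call. Assuming 16 kHz, mono, 16-bit audio, the arithmetic below shows how short one chunk is, and that the 48 KB buffer hulife uses later corresponds to about 1.5 s. `pcm_duration_ms` is a hypothetical helper for illustration.

```c
#include <stdint.h>

/* Duration of a raw PCM buffer in milliseconds:
 * seconds = bytes / (sample_rate * channels * bytes_per_sample). */
static uint32_t pcm_duration_ms(uint32_t bytes, uint32_t sample_rate,
                                uint16_t channels, uint16_t bits_per_sample)
{
    uint64_t bytes_per_second =
        (uint64_t)sample_rate * channels * (bits_per_sample / 8);
    return (uint32_t)((uint64_t)bytes * 1000 / bytes_per_second);
}
```

With these assumptions, `pcm_duration_ms(960, 16000, 1, 16)` is 30 ms (one 480-sample chunk), far too short to hear a word, while `pcm_duration_ms(48 * 1024, 16000, 1, 16)` is 1536 ms.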

hulife commented 3 years ago

```c
raw_stream_read(raw_read, (char *)buffer, size * sizeof(short));
if (enable_wn) {
    if (wakenet->detect(model_wn_data, (int16_t *)buffer) == WAKE_UP) {
        // esp_audio_sync_play(player, "file://sdcard/111.mp3", 0);
        ESP_LOGI(TAG, "wake up");
        /* Record 48 KB (6 reads of 8 KB) of the audio that follows the wake word */
        char *buff0 = (char *)malloc(48 * 1024);
        if (NULL == buff0) {
            ESP_LOGE(TAG, "Memory allocation failed!");
            return;
        }
        memset(buff0, 0, 48 * 1024);

        ESP_LOGI(TAG, "have key");
        for (size_t i = 0; i < 6; i++) {
            raw_stream_read(raw_read, (char *)buff0 + i * 8 * 1024, 8 * 1024);
        }

        FILE *f1 = fopen("/sdcard/hi.pcm", "wb");
        if (f1 == NULL) {
            printf("Failed to open file!\n");
        } else {
            /* Only write and close when fopen() succeeded */
            fwrite(buff0, sizeof(char), 48 * 1024, f1);
            fclose(f1);
        }
        free(buff0);
        buff0 = NULL;
        printf("pcm_okk!\n");
        pcm_to_wav(pcm, wav);  /* user's own helper, not shown in the issue */
        tcp_send();            /* user's own helper, not shown in the issue */
        printf("tcp_okk!\n");

        enable_wn = false;
    }
} else {
    mn_count++;
    int commit_id = multinet->detect(model_mn_data, buffer);

    if (asr_multinet_control(commit_id) == ESP_OK) {
        // esp_audio_sync_play(player, "file://sdcard/222.mp3", 0);
        printf("audio_mn_chunksize=%d\n", audio_mn_chunksize);
        led_on();
        /* ... */
```

hulife commented 3 years ago

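This code records 1.5 s after wake-up, uploads it, and then runs recognition. What I would ideally like is to run recognition after wake-up while also uploading the audio of the wake word itself.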
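By the time `wakenet->detect()` returns `WAKE_UP`, the wake word has already been spoken, so anything recorded after detection (like the 48 KB loop above) captures what follows it, not the wake word. One way to keep the wake word itself, sketched here as an assumption rather than an ADF or ESP-SR feature, is to push every chunk into a circular history buffer before handing it to the detector, and to copy out that history when detection fires. `audio_history_t` and the `history_*` functions are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "pre-roll" history: always holds the most recent
 * `capacity` samples that were fed to the wake-word detector. */
typedef struct {
    int16_t *data;
    size_t capacity;  /* in samples; must be > 0 */
    size_t head;      /* index where the next sample will be written */
    size_t filled;    /* number of valid samples, <= capacity */
} audio_history_t;

static int history_init(audio_history_t *h, size_t capacity)
{
    h->data = calloc(capacity, sizeof(int16_t));
    h->capacity = capacity;
    h->head = 0;
    h->filled = 0;
    return h->data ? 0 : -1;
}

static void history_free(audio_history_t *h)
{
    free(h->data);
    h->data = NULL;
}

/* Append one chunk; once full, the oldest samples are overwritten. */
static void history_push(audio_history_t *h, const int16_t *chunk, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h->data[h->head] = chunk[i];
        h->head = (h->head + 1) % h->capacity;
    }
    h->filled = (h->filled + n > h->capacity) ? h->capacity : h->filled + n;
}

/* Copy the stored samples, oldest first, into `out` (room for `capacity`
 * samples). Returns the number of samples copied. */
static size_t history_snapshot(const audio_history_t *h, int16_t *out)
{
    size_t start = (h->head + h->capacity - h->filled) % h->capacity;
    size_t first = h->capacity - start;   /* samples before the wrap point */
    if (first > h->filled) first = h->filled;
    memcpy(out, h->data + start, first * sizeof(int16_t));
    memcpy(out + first, h->data, (h->filled - first) * sizeof(int16_t));
    return h->filled;
}
```

In a loop like the one in the first reply, `history_push(&hist, (int16_t *)buffer, size)` would be called right after `raw_stream_read()`, and on `WAKE_UP` the snapshot would be passed to the WAV/TCP code. A capacity of 24000 samples covers 1.5 s at an assumed 16 kHz (48,000 bytes); whether that is long enough for the wake word, and whether that much RAM is free, has to be checked on the actual board.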

jason-mao commented 2 years ago

Closing this inactive issue. If needed, please reopen it or create a new issue.