MycroftAI / mimic-recording-studio

Mimic Recording Studio is a Docker-based application you can install to record voice samples, which can then be used to train a TTS voice with Mimic2.
Apache License 2.0

There was an error saving that audio. #110

Open MistakingManx opened 3 months ago

MistakingManx commented 3 months ago

Describe the bug Occasionally, a prompt includes an unusual character such as \u200b (a zero-width space). This results in the error "There was an error in saving that audio" in the UI, and the following on the backend:

'charmap' codec can't encode character '\u200b' in position 189: character maps to <undefined>
127.0.0.1 - - [14/Apr/2024 00:51:59] "POST /api/audio/?uuid=1dff5d48-e436-3e71-2763-8912af8c4434&prompt=As%20Secretary%20of%20the%20Interior,%20I%20am%20responsible%20for%20the%20education%20of%20forty%20eight%20thousand%20native%20children%20in%20the%20Bureau%20of%20Indian%20Education%20school%20system​. HTTP/1.1" 200 -
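
Because \u200b is invisible, it is hard to spot in a prompt by eye. The following is a minimal sketch for locating such characters in a prompt string. `findNonAscii` is a hypothetical helper, not part of Mimic Recording Studio.

```javascript
// Hypothetical helper: list every character outside the ASCII range,
// with its position in the string and its code point.
function findNonAscii(text) {
  const hits = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 127) {
      const hex = code.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index: i, codePoint: "U+" + hex });
    }
  }
  return hits;
}

// Example: the tail of the prompt from the log, with the zero-width space
// written as an explicit escape.
console.log(findNonAscii("school system\u200b."));
```

Running something like this over the prompt corpus ahead of time would show which prompts are affected before any recording is attempted.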

To Reproduce Steps to reproduce the behavior:

  1. Launch the development server.
  2. Proceed with recording.
  3. After many recordings, the error eventually occurs.

Expected behavior A clear and concise description of what you expected to happen.

Log files

127.0.0.1 - - [14/Apr/2024 00:51:47] "GET /api/user/?uuid=1dff5d48-e436-3e71-2763-8912af8c4434 HTTP/1.1" 200 -
127.0.0.1 - - [14/Apr/2024 00:51:48] "GET /api/prompt/?uuid=1dff5d48-e436-3e71-2763-8912af8c4434 HTTP/1.1" 200 -
127.0.0.1 - - [14/Apr/2024 00:51:59] "OPTIONS /api/audio/?uuid=1dff5d48-e436-3e71-2763-8912af8c4434&get_len=True HTTP/1.1" 200 -
127.0.0.1 - - [14/Apr/2024 00:51:59] "OPTIONS /api/audio/?uuid=1dff5d48-e436-3e71-2763-8912af8c4434&prompt=As%20Secretary%20of%20the%20Interior,%20I%20am%20responsible%20for%20the%20education%20of%20forty%20eight%20thousand%20native%20children%20in%20the%20Bureau%20of%20Indian%20Education%20school%20system​. HTTP/1.1" 200 -
ffmpeg version 7.0-essentials_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 13.2.0 (Rev5, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc --enable-vaapi --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil      59.  8.100 / 59.  8.100
  libavcodec     61.  3.100 / 61.  3.100
  libavformat    61.  1.100 / 61.  1.100
  libavdevice    61.  1.100 / 61.  1.100
  libavfilter    10.  1.100 / 10.  1.100
  libswscale      8.  1.100 /  8.  1.100
  libswresample   5.  1.100 /  5.  1.100
  libpostproc    58.  1.100 / 58.  1.100
Input #0, matroska,webm, from 'D:\AI\Testing\TTS\mimic-recording-studio\backend\app\../tmp/12961296786636334026.webm':
  Metadata:
    encoder         : Opera
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
Stream mapping:
  Stream #0:0 -> #0:0 (opus (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'D:\AI\Testing\TTS\mimic-recording-studio\backend\app\../tmp/12961296786636334026.wav':
  Metadata:
    ISFT            : Lavf61.1.100
  Stream #0:0(eng): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s (default)
      Metadata:
        encoder         : Lavc61.3.100 pcm_s16le
[out#0/wav @ 00000142a79f8100] video:0KiB audio:1840KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.004140%
size=    1840KiB time=00:00:10.67 bitrate=1411.3kbits/s speed= 366x
127.0.0.1 - - [14/Apr/2024 00:51:59] "POST /api/audio/?uuid=1dff5d48-e436-3e71-2763-8912af8c4434&get_len=True HTTP/1.1" 200 -
ffmpeg version 7.0-essentials_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 13.2.0 (Rev5, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc --enable-vaapi --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil      59.  8.100 / 59.  8.100
  libavcodec     61.  3.100 / 61.  3.100
  libavformat    61.  1.100 / 61.  1.100
  libavdevice    61.  1.100 / 61.  1.100
  libavfilter    10.  1.100 / 10.  1.100
  libswscale      8.  1.100 /  8.  1.100
  libswresample   5.  1.100 /  5.  1.100
  libpostproc    58.  1.100 / 58.  1.100
Input #0, matroska,webm, from 'D:\AI\Testing\TTS\mimic-recording-studio\backend\app\../audio_files/1dff5d48-e436-3e71-2763-8912af8c4434\51d6dec844a27af7aa33742bc05706e6.webm':
  Metadata:
    encoder         : Opera
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
Stream mapping:
  Stream #0:0 -> #0:0 (opus (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'D:\AI\Testing\TTS\mimic-recording-studio\backend\app\../audio_files/1dff5d48-e436-3e71-2763-8912af8c4434\51d6dec844a27af7aa33742bc05706e6.wav':
  Metadata:
    ISFT            : Lavf61.1.100
  Stream #0:0(eng): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s (default)
      Metadata:
        encoder         : Lavc61.3.100 pcm_s16le
[out#0/wav @ 000002a1950f8340] video:0KiB audio:1840KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: 0.004140%
size=    1840KiB time=00:00:10.67 bitrate=1411.3kbits/s speed= 370x
'charmap' codec can't encode character '\u200b' in position 189: character maps to <undefined>
127.0.0.1 - - [14/Apr/2024 00:51:59] "POST /api/audio/?uuid=1dff5d48-e436-3e71-2763-8912af8c4434&prompt=As%20Secretary%20of%20the%20Interior,%20I%20am%20responsible%20for%20the%20education%20of%20forty%20eight%20thousand%20native%20children%20in%20the%20Bureau%20of%20Indian%20Education%20school%20system​. HTTP/1.1" 200 -

Environment (please complete the following information):

Additional context This is running on a clean development build, as it only says it's not "optimized", and doesn't say "partially non-functional".

MistakingManx commented 3 months ago

I worked around this issue by editing frontend/src/App/api/index.js and changing postAudio to look like this:

export const postAudio = (audio, prompt, uuid) => {
    // Workaround: drop every non-ASCII character (code unit > 127) from the
    // prompt, including the zero-width space \u200b that triggers the error.
    const cleanString = (input) => {
        let output = "";
        for (let i = 0; i < input.length; i++) {
            if (input.charCodeAt(i) <= 127) {
                output += input.charAt(i);
            }
        }
        return output;
    };
    return fetch(apiRoot + `api/audio/?uuid=${uuid}&prompt=${cleanString(prompt)}`, {
        method: "POST",
        body: audio,
        headers: {
            "Content-Type": "audio/wav"
        }
    });
};

However, I worry that this may invalidate my dataset since it's modifying the prompt, which I assume is used for saving.
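
A narrower variant may reduce that risk. It removes only invisible zero-width characters and leaves visible non-ASCII text, such as accented letters, untouched. The sketch below assumes the invisible characters are the only problem. `stripInvisible` and `buildAudioUrl` are hypothetical names, and the character list is an assumption, not a complete one. The 'charmap' message comes from Python's codec machinery on the backend, so any visible character that the backend's text encoding cannot represent could still fail there.

```javascript
// Assumed set of invisible characters: zero-width space, non-joiner and
// joiner (U+200B..U+200D), word joiner (U+2060), and BOM (U+FEFF).
const INVISIBLE_CHARS = /[\u200B-\u200D\u2060\uFEFF]/g;

// Hypothetical helper: remove invisible characters, keep everything else.
function stripInvisible(prompt) {
  return prompt.replace(INVISIBLE_CHARS, "");
}

// Hypothetical helper: build the request URL with the cleaned prompt,
// percent-encoding both query values.
function buildAudioUrl(apiRoot, uuid, prompt) {
  const cleaned = stripInvisible(prompt);
  return `${apiRoot}api/audio/?uuid=${encodeURIComponent(uuid)}` +
    `&prompt=${encodeURIComponent(cleaned)}`;
}
```

Since a zero-width space is neither displayed nor spoken, removing it should not change what the recording says. Whether the saved prompt must match the source corpus byte for byte is not established in this thread.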