Closed GregoryBetsey closed 3 years ago
@GregoryBetsey It looks like something went wrong when trying to transcribe your audio to build the dataset. Could you first check that you used the latest executable (version 0.3)? The second error should have been fixed in that release.
If you did use that, or the error still occurs, could you upload your audio/text to Google Drive or email it to me at benandrew89@gmail.com so I can run some analysis?
@BenAAndrew Thanks for responding. I will send you a download link to your email address. I did not use the "automatic" audiobook method shown in your Youtube video, rather I transcribed the text manually.
Update: I tried the latest release and got this error: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory.
Server initialized for threading.
Server initialized for threading.
pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
['C:\Users\GREGOR~1\AppData\Local\Temp\_MEI104602\base_library.zip', 'C:\Users\GREGOR~1\AppData\Local\Temp\_MEI104602', 'synthesis/waveglow/', 'C:\Users\Gregory Betsey']
torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False
before setting the backend to "soundfile". Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
INFO:matplotlib.font_manager:Generating new fontManager, this may take some time...
[nltk_data] Downloading package wordnet to C:\Users\GREGOR~1\AppData\L
[nltk_data] ocal\Temp_MEI104602\nltk_data...
[nltk_data] Package wordnet is already up-to-date!
INSTALLING FFMPEG
VERIFYING FFMPEG INSTALL
WARNING:werkzeug:WebSocket transport not available. Install eventlet or gevent and gevent-websocket for improved performance.
torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False
before setting the backend to "soundfile". Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
error logging recieved invalid response
emitting event "error" to all [/voice]
INFO:socketio.server:emitting event "error" to all [/voice]
CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["error",{"type":"RuntimeError","text":"[enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory","stacktrace":"Traceback (most recent call last):\n File \"application\utils.py\", line 63, in background_task\n File \"application\utils.py\", line 39, in create_dataset\n File \"dataset\clip_generator.py\", line 60, in clip_generator\n File \"dataset\forced_alignment\align.py\", line 69, in process_segments\n File \"dataset\transcribe.py\", line 16, in transcribe\n File \"torch\hub.py\", line 370, in load\n File \"torch\hub.py\", line 399, in _load_local\n File \"C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\hubconf.py\", line 24, in silero_stt\n model, decoder = init_jit_model(model_url=models.stt_models.get(language).latest.jit,\n File \"C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\utils.py\", line 135, in init_jit_model\n model = torch.jit.load(model_path, map_location=device)\n File \"torch\jit\_serialization.py\", line 161, in load\nRuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory\n"}]
INFO:werkzeug:127.0.0.1 - - [01/Apr/2021 20:28:34] "GET /socket.io/?EIO=4&transport=polling&t=NYGQG4q&sid=CxB55ktHT5jOvFCmAAAA HTTP/1.1" 200 -
INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Sending packet MESSAGE data 2/voice,["error",{"type":"RuntimeError","text":"[enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory","stacktrace":"Traceback (most recent call last):\n File \"application\utils.py\", line 63, in background_task\n File \"application\utils.py\", line 39, in create_dataset\n File \"dataset\clip_generator.py\", line 60, in clip_generator\n File \"dataset\forced_alignment\align.py\", line 69, in process_segments\n File \"dataset\transcribe.py\", line 16, in transcribe\n File \"torch\hub.py\", line 370, in load\n File \"torch\hub.py\", line 399, in _load_local\n File \"C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\hubconf.py\", line 24, in silero_stt\n model, decoder = init_jit_model(model_url=models.stt_models.get(language).latest.jit,\n File \"C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\utils.py\", line 135, in init_jit_model\n model = torch.jit.load(model_path, map_location=device)\n File \"torch\jit\_serialization.py\", line 161, in load\nRuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory\n"}]
[enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory
CxB55ktHT5jOvFCmAAAA: Sending packet PING data None
INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Sending packet PING data None
CxB55ktHT5jOvFCmAAAA: Client is gone, closing socket
INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Client is gone, closing socket
CxB55ktHT5jOvFCmAAAA: Client is gone, closing socket
INFO:engineio.server:CxB55ktHT5jOvFCmAAAA: Client is gone, closing socket
@GregoryBetsey If you look at the folder which contains your .exe, is there a file called latest_silero_models.yml?
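A note on the "failed finding central directory" message: TorchScript model files are zip archives, so this error means the file passed to torch.jit.load could not be read as a complete zip. One possible cause (an assumption, not confirmed in this thread) is an interrupted or partial model download. A minimal sketch for checking a file with the standard library follows; the function name and any path you pass to it are hypothetical.

```python
import zipfile
from pathlib import Path


def looks_like_complete_archive(model_path):
    """Return True if the file exists and has a readable zip end-of-archive record.

    A truncated download usually lacks that record, which is the same
    condition the "failed finding central directory" error describes.
    """
    path = Path(model_path)
    return path.is_file() and zipfile.is_zipfile(path)
```

If this returns False for the cached model file, deleting that file so it is downloaded again would be the obvious thing to try.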
Yes, there is. I ran it through Edge this time and got further than before, but got a new error: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous.
I'll investigate this and get back to you.
@GregoryBetsey In the latest build (0.4.1) I've added some extra validation to the transcription process which may fix the bug. Could you give it a go?
Thanks. I tried the latest build today and got stuck on the "generating segments..." section. I will attach the log file. Error 4.4.2021.txt
P.S. I am using the same files I sent to you via google drive.
@GregoryBetsey Thank you for the error log. The issue seems to be with the torchaudio library not being able to change the audio sample rate. I will investigate now
@GregoryBetsey I've removed the code throwing the bug and replaced it with a different library. If you get a minute could you try release 0.5.1?
I gave it a go and got a different error this time: The expanded size of the tensor (12800) must match the existing size (0) at non-singleton dimension 0. Target sizes: [12800]. Tensor sizes: [0] Error.txt
@GregoryBetsey did this error occur with the data source you sent to me?
@GregoryBetsey I haven't been able to replicate the issue but I have identified what may have caused it and tried to fix in 0.5.3.
Yes, I am using the same files I sent you earlier. I will try your latest release and test the results.
Update: I tried the latest release. I got a different error: data\datasets\JamesEarlJones\wavs\1470_2520.wav wav file is empty Text.txt
@GregoryBetsey very interesting, seems like it can't open that file. Could you find that file and make sure it is playable. If it is could you email it to me?
I am using the same audio and text transcript that I sent to you using google drive. The audio file is fine. If you need the link again, I can send it to you.
@GregoryBetsey I've produced the dataset and that clip (1470_2520.wav) is playable and can be transcribed. Just to double-check did you try playing the original audio or the 1470_2520.wav clip?
@GregoryBetsey, I've been able to reproduce this error once. It seems to be that FFmpeg (very rarely) corrupts the audio when trimming. Handling of this will be added in an upcoming release
Thanks for the update. I haven't got past the error.
Hi @GregoryBetsey, thank you for your patience. This should be handled in 0.6. Please let me know how you get on
Thanks for working on this. I don't know if this is progress, but it actually started generating segments this time, except I got a message saying the audio can't be transcribed. [Again, I am using the files from my Google Drive].
@GregoryBetsey That's interesting. It looks like there's an issue with FFmpeg cutting the clips. Could you do the following:
The issue must be to do with FFmpeg, so if those files exist then it is not working correctly
Okay, the app is generating the audio files. I installed ffmpeg to C:\ and it is working. I deleted FFmpeg in the app folder but I still get errors.
Hi @GregoryBetsey, could you try running the following command:
ffmpeg -ss 00:00:00.000 -t 10.0 -i filename test.wav
where filename is the name of your original audio file (i.e. audio.mp3).
Please make sure you run this command from the same directory as where the original audio file is.
Then check if it produces a new audio file called test.wav that is playable and 10 seconds long.
It isn't working. Unless I set it up wrong
Hi @GregoryBetsey, please replace the word filename with your audio name rather than test.wav. It should look like this:
ffmpeg -ss 00:00:00.000 -t 10.0 -i Matthew.mp3 test.wav
Okay, it worked now.
Does that audio play back OK? If so, could you run this and check that Matthew-converted.wav is playable:
ffmpeg -i Matthew.mp3 -b:a 32k -ac 1 -map a -ar 22050 Matthew-converted.wav
Yes, it works.
Ok, so now could you run the following and check the result:
ffmpeg -ss 00:00:00.000 -t 10.0 -i Matthew-converted.wav test.wav
Yes, it works.
Hmm, this is interesting. You see the app just runs the conversion command and then the trim command which is exactly what you've done here. Have you tried running the app again since reinstalling ffmpeg?
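For reference, the two steps described above (convert, then trim) can be driven from Python with the same flags as the commands earlier in this thread. This is only a sketch: the function names are made up for illustration and this is not the app's actual code. It assumes ffmpeg is on the PATH.

```python
import subprocess


def convert_command(source, target):
    """Build the conversion command: 32k bitrate, mono, audio stream only, 22050 Hz."""
    return ["ffmpeg", "-i", source, "-b:a", "32k", "-ac", "1",
            "-map", "a", "-ar", "22050", target]


def trim_command(source, target, start, duration):
    """Build the trim command: seek to `start` (HH:MM:SS.mmm), keep `duration` seconds."""
    return ["ffmpeg", "-ss", start, "-t", str(duration), "-i", source, target]


def run(command):
    """Run a command and capture its output so failures can be inspected afterwards."""
    return subprocess.run(command, capture_output=True, text=True)
```

For example, run(trim_command("Matthew-converted.wav", "test.wav", "00:00:00.000", 10.0)) runs the same trim as the command above. Passing the arguments as a list also avoids quoting problems with paths that contain spaces.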
Yes, I did that here. The app generates clips. It says it cannot transcribe at the end and then it deletes all the generated wavs. Error Log.txt
@GregoryBetsey Whilst it is running, could you copy one of the generated wav files? It should be saved to data\datasets\dataset_name\wavs, where dataset_name is the name of the dataset. Then could you check if that is playable?
The wavs file can be opened, but since the generated length is 00:00:00 there isn't any audio sound. [see attachment] Example.zip
Ok, so FFmpeg isn't working when cutting the audio, as all of these clips should be at least 1 second long. I don't understand why the command would work outside of the app but not in it, as both should be using the same FFmpeg and command. I will try to resolve this this week.
I'm also experiencing this issue exactly as described in #27 (nothing but "Could not transcribe data\datasets..." messages and zero-length wave files despite having a tested working ffmpeg install) when attempting to build either my own or the provided demo datasets.
Something I noticed that does seem off is that regardless of whether my source audio file is an mp3 or a wave, the application logfile always says that it is converting from an mp3. eg:
Coverting data\datasets\TestVoice\audio.mp3...
Loading script from data\datasets\TestVoice\text.txt...
Searching text for matching fragments...
Changing sample rate...
Fetching segments...
Matching segments...
Generating segments...
Could not transcribe data\datasets\TestVoice\wavs\1650_2730.wav
Could not transcribe data\datasets\TestVoice\wavs\5850_7680.wav
Could not transcribe data\datasets\TestVoice\wavs\7680_9330.wav
Could not transcribe data\datasets\TestVoice\wavs\9450_10530.wav
Could not transcribe data\datasets\TestVoice\wavs\10560_12720.wav
The audio file being converted above was a wave file named "this_is_a_wave_file.wav". Having said that, the "audio-converted.wav" and "audio-converted-16000.wav" files generated in the dataset's working directory are playable and seemingly in the right format according to VLC Player:
Stream 0 ("audio-converted.wav")
Codec: PCM S16 LE (s16l)
Type: Audio
Channels: Mono
Sample rate: 22050 Hz
Bits per sample: 16

Stream 0 ("audio-converted-16000.wav")
Codec: PCM S16 LE (s16l)
Type: Audio
Channels: Mono
Sample rate: 16000 Hz
Bits per sample: 16
It's just the separated out segments that are inoperable (nothing but 78 bytes of metadata in each one.)
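A quick way to confirm which clips are header-only without opening each one in a player is to read the frame count with Python's built-in wave module. This is a rough sketch for uncompressed PCM WAV files like the ones described above; the function names are made up for illustration.

```python
import wave


def wav_duration_seconds(path):
    """Return the clip length in seconds, computed from the frame count and sample rate."""
    with wave.open(str(path), "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def is_empty_clip(path):
    """True when the file has a valid WAV header but contains no audio frames."""
    return wav_duration_seconds(path) == 0
```

Running is_empty_clip over every file in the dataset's wavs folder would show whether all clips are affected or only some of them.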
@RayDAnt3D thank you for this info. This seems to be an issue for several people so it is my number one priority. I'm hoping to have it fixed by Sunday 🤞
@GregoryBetsey @RayDAnt3D I'm struggling to figure out what's causing this issue & I can't get it to replicate locally. The issue must be to do with either the FFmpeg install or one of the commands.
To test this I've produced the following: https://drive.google.com/drive/folders/17zT6fg7V_gu_kMVZs2ERPmfGyFRuDhWg?usp=sharing
In there you'll find a test audio file and a script. Could you try downloading both & running the script? Then check that it produces an audio file called test-final.wav that is playable & 3 seconds long.
Thank you for your patience
Hey I downloaded it and ran the script. I can confirm it produced a 3 second playable clip called "test-clip.wav"
Also downloaded/ran the test script and audio clip and got the following tested working audio files generated:
test-clean.wav test-clean-16000.wav test-clip.wav
No "test-final.wav" though.
@RayDAnt3D @arthur465 Sorry, I meant test-clip.wav. So it sounds like the FFmpeg commands are working for all of you. I'm going to try and create a release today which has improved error logging on the clip building process so we can find out where it is failing in the app.
@GregoryBetsey @arthur465 @RayDAnt3D I've created a new release here: https://github.com/BenAAndrew/Voice-Cloning-App/releases/tag/v0.6.2. It won't fix the issue but it might help tell us what the error is. It will now check the output of the FFmpeg commands and will also show it running in the console. Could you give it a go and let me know what happens
Ok here's the error I get
INFO:voice:Progress - 391/416
INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 11:44:13] "GET /socket.io/?EIO=4&transport=polling&t=Na02_Cm&sid=7KcT1PoIXIKbnCvUAAAA HTTP/1.1" 200 -
ffmpeg version 2021-04-18-git-d43b26b30d-full_build-www.gyan.dev Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 10.2.0 (Rev6, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libglslang --enable-vulkan --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
  libavutil 56. 73.100 / 56. 73.100
  libavcodec 58.136.101 / 58.136.101
  libavformat 58. 78.100 / 58. 78.100
  libavdevice 58. 14.100 / 58. 14.100
  libavfilter 7.111.100 / 7.111.100
  libswscale 5. 10.100 / 5. 10.100
  libswresample 3. 10.100 / 3. 10.100
  libpostproc 55. 10.100 / 55. 10.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'data\datasets\Arthur 2\audio-converted.wav':
  Metadata:
    encoder : Lavf58.78.100
  Duration: 00:17:31.99, bitrate: 352 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'data\datasets\Arthur 2\wavs\994140_995250.wav':
  Metadata:
    ISFT : Lavf58.78.100
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
    Metadata:
      encoder : Lavc58.136.101 pcm_s16le
size= 0kB time=00:00:00.00 bitrate=N/A speed= 0x
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used)
emitting event "logs" to all [/voice]
INFO:socketio.server:emitting event "logs" to all [/voice]
7KcT1PoIXIKbnCvUAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Arthur 2\wavs\994140_995250.wav"}]
INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 11:44:13] "GET /socket.io/?EIO=4&transport=polling&t=Na02_Cs&sid=7KcT1PoIXIKbnCvUAAAA HTTP/1.1" 200 -
INFO:engineio.server:7KcT1PoIXIKbnCvUAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Arthur 2\wavs\994140_995250.wav"}]
INFO:voice:Could not transcribe data\datasets\Arthur 2\wavs\994140_995250.wav
emitting event "progress" to all [/voice]
INFO:socketio.server:emitting event "progress" to all [/voice]
7KcT1PoIXIKbnCvUAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 392","total":"416"}]
INFO:engineio.server:7KcT1PoIXIKbnCvUAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 392","total":"416"}]
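The useful line in that log is "Output file is empty, nothing was encoded". Since FFmpeg only reports this as a message in its output, a caller that wants to catch it needs to look at the captured text as well as the exit code. A minimal sketch of such a check follows; the function name is made up, and whether FFmpeg exits with a non-zero code in this situation is not confirmed here, so both conditions are tested.

```python
EMPTY_OUTPUT_MARKER = "Output file is empty"


def clip_was_encoded(returncode, stderr):
    """Return True only if FFmpeg exited cleanly and did not report an empty output file."""
    return returncode == 0 and EMPTY_OUTPUT_MARKER not in stderr
```

With subprocess.run(..., capture_output=True, text=True), the two arguments would be the result's returncode and stderr attributes.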
Here's what I get for the first sample cutting attempt (and every other thereafter) using the Ayaode dataset assets:
INFO:voice:Generating segments...
ffmpeg version 2021-04-18-git-d43b26b30d-full_build-www.gyan.dev Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 10.2.0 (Rev6, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libglslang --enable-vulkan --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
  libavutil 56. 73.100 / 56. 73.100
  libavcodec 58.136.101 / 58.136.101
  libavformat 58. 78.100 / 58. 78.100
  libavdevice 58. 14.100 / 58. 14.100
  libavfilter 7.111.100 / 7.111.100
  libswscale 5. 10.100 / 5. 10.100
  libswresample 3. 10.100 / 3. 10.100
  libpostproc 55. 10.100 / 55. 10.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'data\datasets\Ayoade\audio-converted.wav':
  Metadata:
    artist : Richard Ayoade
    comment : At last, the definitive audiobook about perhaps the best cabin crew dramedy ever filmed: View from the Top starring Gwyneth Paltrow. In Ayoade on Top, Richard Ayoade, perhaps one of the most 'insubstantial' people of our age, takes us on a journey from Pe
    copyright : ©2019 Richard Ayoade (P)2019 Audible, Ltd
    date : 2019
    genre : Audiobook
    title : 1 - Ayoade on Top
    album : Ayoade on Top
    track : 1/1
    encoder : Lavf58.78.100
  Duration: 04:39:25.09, bitrate: 352 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'data\datasets\Ayoade\wavs\60_1680.wav':
  Metadata:
    IART : Richard Ayoade
    ICMT : At last, the definitive audiobook about perhaps the best cabin crew dramedy ever filmed: View from the Top starring Gwyneth Paltrow. In Ayoade on Top, Richard Ayoade, perhaps one of the most 'insubstantial' people of our age, takes us on a journey from Pe
    ICOP : ©2019 Richard Ayoade (P)2019 Audible, Ltd
    ICRD : 2019
    IGNR : Audiobook
    INAM : 1 - Ayoade on Top
    IPRD : Ayoade on Top
    IPRT : 1/1
    ISFT : Lavf58.78.100
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
    Metadata:
      encoder : Lavc58.136.101 pcm_s16le
size= 1kB time=00:00:00.00 bitrate=N/A speed= 0x
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used)
Using cache found in C:\Users\gbase/.cache\torch\hub\snakers4_silero-models_master
NpmL9qWiQt7YXzLnAAAA: Sending packet PING data None
INFO:engineio.server:NpmL9qWiQt7YXzLnAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 15:19:47] "GET /socket.io/?EIO=4&transport=polling&t=Na0B7hC&sid=NpmL9qWiQt7YXzLnAAAA HTTP/1.1" 200 -
NpmL9qWiQt7YXzLnAAAA: Received packet PONG data
INFO:engineio.server:NpmL9qWiQt7YXzLnAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 15:19:47] "POST /socket.io/?EIO=4&transport=polling&t=Na0B84R&sid=NpmL9qWiQt7YXzLnAAAA HTTP/1.1" 200 -
emitting event "logs" to all [/voice]
INFO:socketio.server:emitting event "logs" to all [/voice]
NpmL9qWiQt7YXzLnAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Ayoade\wavs\60_1680.wav"}]
INFO:engineio.server:NpmL9qWiQt7YXzLnAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Ayoade\wavs\60_1680.wav"}]
INFO:voice:Could not transcribe data\datasets\Ayoade\wavs\60_1680.wav
emitting event "progress" to all [/voice]
INFO:socketio.server:emitting event "progress" to all [/voice]
NpmL9qWiQt7YXzLnAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 1","total":"5021"}]
INFO:engineio.server:NpmL9qWiQt7YXzLnAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 1","total":"5021"}]
INFO:voice:Progress - 1/5021
For what it's worth, here also is my app log at first startup:
[12568] WARNING: file already exists but should not: C:\Users\gbase\AppData\Local\Temp_MEI125682\torch_C.cp38-win_amd64.pyd
Server initialized for threading.
Server initialized for threading.
torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do
torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False
before setting the backend to "soundfile". Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
INFO:matplotlib.font_manager:Generating new fontManager, this may take some time...
[nltk_data] Downloading package wordnet to
[nltk_data] C:\Users\gbase\AppData\Local\Temp_MEI125682\nltk_data
[nltk_data] ...
[nltk_data] Package wordnet is already up-to-date!
WARNING:werkzeug:WebSocket transport not available. Install eventlet or gevent and gevent-websocket for improved performance.
 * Serving Flask app "main" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
INFO:werkzeug: * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 15:17:35] "GET / HTTP/1.1" 200 -
Did some source code snooping and noticed that running:
start_timestamp = datetime.fromtimestamp(start / 1000).strftime("%H:%M:%S.%f")
as it appears in dataset\audio_processing.py inside the cut_audio() routine, with start=60 (as the first Ayoade clip would be), on the Python command line like so:
from subprocess import call
from pathlib import Path
from datetime import datetime
from pydub import AudioSegment
import os

datetime.fromtimestamp(60 / 1000).strftime("%H:%M:%S.%f")
results in the following output:
'19:00:00.060000'
Pretty sure that additional '19:00:00.000000' shouldn't be there. The root of the problem may just be a date/time localization mismatch.
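To illustrate the suspected mismatch: datetime.fromtimestamp() without a tz argument converts the epoch value to the machine's local time, so 60 ms after the epoch is only 00:00:00.060 on a machine at UTC+0. The sketch below reproduces the output above by passing the offset explicitly. The helper name is made up, and the -5 hour offset is an assumption about the affected machine.

```python
from datetime import datetime, timedelta, timezone


def local_style_timestamp(ms, utc_offset_hours):
    """Format `ms` the way the line above would on a machine at the given UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(ms / 1000, tz=tz).strftime("%H:%M:%S.%f")


print(local_style_timestamp(60, 0))   # 00:00:00.060000
print(local_style_timestamp(60, -5))  # 19:00:00.060000
```

If that value is then used as the -ss seek position, FFmpeg is asked to start 19 hours into a file that is far shorter, which would be consistent with the "Output file is empty" message in the logs.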
@RayDAnt3D great find. What time localization do you use?
I tried the latest version (0.63) just to see if anything was different. The initial files it creates from my sample mp3 (audio.mp3, audio-converted.wav, and audio-converted-16000.wav) are all fine, same as before. The many individual clip wavs inside the folder are all "empty" files, with length 00:00:00 and size 78 bytes. I believe that's the same as before (I stopped it before it auto-deleted them this time, so I could check them).
The errors when trying to process are similar to the post above:
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'data\datasets\Kate\audio-converted.wav':
  Metadata:
    encoder : Lavf58.76.100
  Duration: 04:18:04.84, bitrate: 352 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'data\datasets\Kate\wavs\1436520_1438290.wav':
  Metadata:
    ISFT : Lavf58.76.100
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
    Metadata:
      encoder : Lavc58.134.100 pcm_s16le
size= 0kB time=00:00:00.00 bitrate=N/A speed= 0x
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used)
emitting event "logs" to all [/voice]
INFO:socketio.server:emitting event "logs" to all [/voice]
MRg0ipT8vkRYLkMJAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Kate\wavs\1436520_1438290.wav"}]
INFO:werkzeug:127.0.0.1 - - [24/Apr/2021 12:19:54] "GET /socket.io/?EIO=4&transport=polling&t=Na4vHg3&sid=MRg0ipT8vkRYLkMJAAAA HTTP/1.1" 200 -
INFO:engineio.server:MRg0ipT8vkRYLkMJAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Kate\wavs\1436520_1438290.wav"}]
INFO:voice:Could not transcribe data\datasets\Kate\wavs\1436520_1438290.wav
emitting event "progress" to all [/voice]
INFO:socketio.server:emitting event "progress" to all [/voice]
MRg0ipT8vkRYLkMJAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 356","total":"5134"}]
INFO:engineio.server:MRg0ipT8vkRYLkMJAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 356","total":"5134"}]
INFO:voice:Progress - 356/5134
@BenAAndrew US EST (technically currently EDT.)
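That fits the '19:00:00' prefix: the epoch falls in January 1970, when US Eastern time was on standard time (UTC-5). One way to avoid the dependency on the local timezone is to build the FFmpeg timestamp with plain integer arithmetic instead of going through datetime. This is a sketch only; the function name is hypothetical and this is not necessarily how the app should or will fix it.

```python
def format_offset(ms):
    """Format a non-negative millisecond offset as HH:MM:SS.mmm, independent of timezone."""
    if ms < 0:
        raise ValueError("offset must be non-negative")
    seconds, millis = divmod(int(ms), 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


print(format_offset(60))      # 00:00:00.060
print(format_offset(994140))  # 00:16:34.140
```

Unlike a clock time, this also keeps counting correctly past 24 hours, and it gives the same result regardless of the machine's locale settings.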
Hello
I am running the Voice-Cloning-App.exe on Windows 10. I have a GeForce RTX 2060 Graphics Card with the GeForce Game Ready Driver Version 461.92.
When I attempt to build the dataset, the Windows console stops after the following:
[12644] WARNING: file already exists but should not: C:\Users\GREGOR~1\AppData\Local\Temp_MEI126442\torch_C.cp38-win_amd64.pyd
Server initialized for threading.
Server initialized for threading.
pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do
torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False
before setting the backend to "soundfile". Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
INFO:matplotlib.font_manager:Generating new fontManager, this may take some time...
[nltk_data] Downloading package wordnet to C:\Users\GREGOR~1\AppData\L
[nltk_data] ocal\Temp_MEI126442\nltk_data...
[nltk_data] Package wordnet is already up-to-date!
WARNING:werkzeug:WebSocket transport not available. Install eventlet or gevent and gevent-websocket for improved performance.
torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False
before setting the backend to "soundfile". Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
Exception in thread Thread-13:
Traceback (most recent call last):
  File "application\utils.py", line 47, in background_task
    maxseqlength = max(max([len() for _ in batch]), 12800)
  File "application\utils.py", line 32, in create_dataset
    if wav.size(0) > 1:
  File "dataset\forced_alignment\align.py", line 123, in align
  File "dataset\transcribe.py", line 34, in stt
  File "dataset\transcribe.py", line 16, in transcribe
  File "torch\hub.py", line 370, in load
  File "torch\hub.py", line 399, in _load_local
  File "C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\hubconf.py", line 24, in silero_stt
    model, decoder = init_jit_model(model_url=models.stt_models.get(language).latest.jit,
  File "C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\utils.py", line 135, in init_jit_model
    model = torch.jit.load(model_path, map_location=device)
  File "torch\jit_serialization.py", line 161, in load
RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "threading.py", line 932, in _bootstrap_inner
  File "threading.py", line 870, in run
  File "application\utils.py", line 50, in backgroundtask
    inputs[i, :len(wav)].copy(wav)
NameError: name 'traceback' is not defined
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:32] "GET /socket.io/?EIO=4&transport=polling&t=NXjKvy4&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:32] "POST /socket.io/?EIO=4&transport=polling&t=NXjKyzw&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:57] "GET /socket.io/?EIO=4&transport=polling&t=NXjKyzw.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:57] "POST /socket.io/?EIO=4&transport=polling&t=NXjL358&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:22] "GET /socket.io/?EIO=4&transport=polling&t=NXjL358.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:22] "POST /socket.io/?EIO=4&transport=polling&t=NXjL9CA&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:47] "GET /socket.io/?EIO=4&transport=polling&t=NXjL9CB&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:47] "POST /socket.io/?EIO=4&transport=polling&t=NXjLFJ8&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:12] "GET /socket.io/?EIO=4&transport=polling&t=NXjLFJ8.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:12] "POST /socket.io/?EIO=4&transport=polling&t=NXjLLQ8&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:37] "GET /socket.io/?EIO=4&transport=polling&t=NXjLLQ9&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:37] "POST /socket.io/?EIO=4&transport=polling&t=NXjLRWz&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:02] "GET /socket.io/?EIO=4&transport=polling&t=NXjLRW-&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:02] "POST /socket.io/?EIO=4&transport=polling&t=NXjLXe3&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:27] "GET /socket.io/?EIO=4&transport=polling&t=NXjLXe4&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:27] "POST /socket.io/?EIO=4&transport=polling&t=NXjLdkv&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:52] "GET /socket.io/?EIO=4&transport=polling&t=NXjLdkv.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:52] "POST /socket.io/?EIO=4&transport=polling&t=NXjLjrp&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:17] "GET /socket.io/?EIO=4&transport=polling&t=NXjLjrq&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:17] "POST /socket.io/?EIO=4&transport=polling&t=NXjLpyh&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:42] "GET /socket.io/?EIO=4&transport=polling&t=NXjLpyh.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:42] "POST /socket.io/?EIO=4&transport=polling&t=NXjLw3g&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 -
qINJoZN0iSsAW66FAAAA: Sending packet PING data None
qINJoZN0iSsAW66FAAAA: Received packet CLOSE data
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:03:07] "GET /socket.io/?EIO=4&transport=polling&t=NXjLw3g.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1
qINJoZN0iSsAW66FAAAA: Client is gone, closing socket
Error.txt