BenAAndrew / Voice-Cloning-App

A Python/Pytorch app for easily synthesising human voices
BSD 3-Clause "New" or "Revised" License

Transcription error: wav file is empty #11

Closed: GregoryBetsey closed this issue 3 years ago

GregoryBetsey commented 3 years ago

Hello

I am running the Voice-Cloning-App.exe on Windows 10. I have a GeForce RTX 2060 Graphics Card with the GeForce Game Ready Driver Version 461.92.

When I attempt to build the dataset, the Windows console stops after the following:

[12644] WARNING: file already exists but should not: C:\Users\GREGOR~1\AppData\Local\Temp\_MEI126442\torch\_C.cp38-win_amd64.pyd
Server initialized for threading.
Server initialized for threading.
pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
INFO:matplotlib.font_manager:Generating new fontManager, this may take some time...
[nltk_data] Downloading package wordnet to C:\Users\GREGOR~1\AppData\L
[nltk_data] ocal\Temp\_MEI126442\nltk_data...
[nltk_data] Package wordnet is already up-to-date!
WARNING:werkzeug:WebSocket transport not available. Install eventlet or gevent and gevent-websocket for improved performance.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "threading.py", line 932, in _bootstrap_inner
  File "threading.py", line 870, in run
  File "application\utils.py", line 50, in background_task
    inputs[i, :len(wav)].copy_(wav)
NameError: name 'traceback' is not defined
qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:32] "GET /socket.io/?EIO=4&transport=polling&t=NXjKvy4&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:32] "POST /socket.io/?EIO=4&transport=polling&t=NXjKyzw&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:57] "GET /socket.io/?EIO=4&transport=polling&t=NXjKyzw.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:57] "POST /socket.io/?EIO=4&transport=polling&t=NXjL358&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:22] "GET /socket.io/?EIO=4&transport=polling&t=NXjL358.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:22] "POST /socket.io/?EIO=4&transport=polling&t=NXjL9CA&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:47] "GET 
/socket.io/?EIO=4&transport=polling&t=NXjL9CB&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:47] "POST /socket.io/?EIO=4&transport=polling&t=NXjLFJ8&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:12] "GET /socket.io/?EIO=4&transport=polling&t=NXjLFJ8.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:12] "POST /socket.io/?EIO=4&transport=polling&t=NXjLLQ8&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:37] "GET /socket.io/?EIO=4&transport=polling&t=NXjLLQ9&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:37] "POST /socket.io/?EIO=4&transport=polling&t=NXjLRWz&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:02] "GET /socket.io/?EIO=4&transport=polling&t=NXjLRW-&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:02] "POST /socket.io/?EIO=4&transport=polling&t=NXjLXe3&sid=qINJoZN0iSsAW66FAAAA 
HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:27] "GET /socket.io/?EIO=4&transport=polling&t=NXjLXe4&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:27] "POST /socket.io/?EIO=4&transport=polling&t=NXjLdkv&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:52] "GET /socket.io/?EIO=4&transport=polling&t=NXjLdkv.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:52] "POST /socket.io/?EIO=4&transport=polling&t=NXjLjrp&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:17] "GET /socket.io/?EIO=4&transport=polling&t=NXjLjrq&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:17] "POST /socket.io/?EIO=4&transport=polling&t=NXjLpyh&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:42] "GET /socket.io/?EIO=4&transport=polling&t=NXjLpyh.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data 
INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:42] "POST /socket.io/?EIO=4&transport=polling&t=NXjLw3g&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:03:07] "qINJoZN0iSsAW66FAAAA: Received packet CLOSE data GET /socket.io/?EIO=4&transport=polling&t=NXjLw3g.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1qINJoZN0iSsAW66FAAAA: Client is gone, closing socket Error.txt

BenAAndrew commented 3 years ago

@GregoryBetsey It looks like something went wrong when trying to transcribe your audio to build the dataset. Could you first check that you used the latest executable (version 0.3)? The second error should have been fixed in that release.

If you did use that version, or the error still occurs, could you upload your audio/text to Google Drive or email it to me at benandrew89@gmail.com so I can run some analysis?

GregoryBetsey commented 3 years ago

@BenAAndrew Thanks for responding. I will send a download link to your email address. I did not use the "automatic" audiobook method shown in your YouTube video; instead, I transcribed the text manually.

GregoryBetsey commented 3 years ago

Update: I tried the latest release and got this error: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory.

Server initialized for threading.
Server initialized for threading.
pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
['C:\Users\GREGOR~1\AppData\Local\Temp\_MEI104602\base_library.zip', 'C:\Users\GREGOR~1\AppData\Local\Temp\_MEI104602', 'synthesis/waveglow/', 'C:\Users\Gregory Betsey']
torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
INFO:matplotlib.font_manager:Generating new fontManager, this may take some time...
[nltk_data] Downloading package wordnet to C:\Users\GREGOR~1\AppData\L
[nltk_data] ocal\Temp\_MEI104602\nltk_data...
[nltk_data] Package wordnet is already up-to-date!
INSTALLING FFMPEG
VERIFYING FFMPEG INSTALL
WARNING:werkzeug:WebSocket transport not available. Install eventlet or gevent and gevent-websocket for improved performance.

BenAAndrew commented 3 years ago

@GregoryBetsey If you look at the folder which contains your .exe, is there a file called latest_silero_models.yml?

GregoryBetsey commented 3 years ago

Yes, it does. I ran it through Edge this time and got further than before, but got a new error: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous.

Error.txt

BenAAndrew commented 3 years ago

I'll investigate this and get back to you.

BenAAndrew commented 3 years ago

@GregoryBetsey In the latest build (0.4.1) I've added some extra validation to the transcription process, which may fix the bug. Could you give it a go?

GregoryBetsey commented 3 years ago

Thanks. I tried the latest build today and got stuck on the "generating segments..." section. I will attach the log file. Error 4.4.2021.txt

P.S. I am using the same files I sent to you via google drive.

BenAAndrew commented 3 years ago

@GregoryBetsey Thank you for the error log. The issue seems to be that the torchaudio library is unable to change the audio sample rate. I will investigate now.

BenAAndrew commented 3 years ago

@GregoryBetsey I've removed the code throwing the bug and replaced it with a different library. If you get a minute could you try release 0.5.1?

GregoryBetsey commented 3 years ago

I gave it a go and got a different error this time: The expanded size of the tensor (12800) must match the existing size (0) at non-singleton dimension 0. Target sizes: [12800]. Tensor sizes: [0] Error.txt Image

BenAAndrew commented 3 years ago

@GregoryBetsey did this error occur with the data source you sent to me?

BenAAndrew commented 3 years ago

@GregoryBetsey I haven't been able to replicate the issue, but I have identified what may have caused it and tried to fix it in 0.5.3.

GregoryBetsey commented 3 years ago

Yes, I am using the same files I sent you earlier. I will try your latest release and test the results.

GregoryBetsey commented 3 years ago

Update: I tried the latest release. I got a different error: data\datasets\JamesEarlJones\wavs\1470_2520.wav wav file is empty Image Text.txt

BenAAndrew commented 3 years ago

@GregoryBetsey Very interesting; it seems like it can't open that file. Could you find that file and make sure it is playable? If it is, could you email it to me?

GregoryBetsey commented 3 years ago

I am using the same audio and text transcript that I sent to you using google drive. The audio file is fine. If you need the link again, I can send it to you.

BenAAndrew commented 3 years ago

@GregoryBetsey I've produced the dataset and that clip (1470_2520.wav) is playable and can be transcribed. Just to double-check did you try playing the original audio or the 1470_2520.wav clip?

BenAAndrew commented 3 years ago

@GregoryBetsey, I've been able to reproduce this error once. It seems that FFmpeg (very rarely) corrupts the audio when trimming. Handling of this will be added in an upcoming release.

GregoryBetsey commented 3 years ago

Thanks for the update. I haven't got past the error.

BenAAndrew commented 3 years ago

Hi @GregoryBetsey, thank you for your patience. This should be handled in 0.6. Please let me know how you get on

GregoryBetsey commented 3 years ago

Thanks for working on this. I don't know if this is progress, but it actually started generating segments this time, except that I got a message saying the audio can't be transcribed. [Again, I am using the files from my Google Drive.]

Voice Cloning - Profile 1 - Microsoft Edge 4_15_2021 1_15_39 PM Log.txt

BenAAndrew commented 3 years ago

@GregoryBetsey That's interesting. It looks like there's an issue with FFmpeg cutting the clips. Could you do the following:

  1. Check that the audio files listed in the logs exist
  2. Check if there is a folder called 'ffmpeg' in the same directory as the application. If there is, delete it.
  3. Try installing FFmpeg manually, e.g. by following https://www.youtube.com/watch?v=hD9bQE4R6eA

The issue must be to do with FFmpeg, so if those files exist, FFmpeg is not working correctly.
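Before and after step 3, it can help to confirm which FFmpeg executable actually resolves from the command line. Below is a minimal standard-library sketch; `find_ffmpeg` and `ffmpeg_version_line` are hypothetical helper names, not part of the app, and the check only looks at PATH, so it cannot tell whether the app prefers its own 'ffmpeg' folder.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the `ffmpeg` executable found on PATH, or None."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(ffmpeg_path: str) -> str:
    """Run `ffmpeg -version` and return the first line of its output."""
    result = subprocess.run(
        [ffmpeg_path, "-version"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,  # raise CalledProcessError if ffmpeg exits with an error
    )
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH")
    else:
        print(f"Using {path}")
        print(ffmpeg_version_line(path))
```

If the printed path is not the manual install you expect, the wrong binary is being picked up first.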

GregoryBetsey commented 3 years ago

Okay, the app is generating the audio files, and I installed FFmpeg to C:\ and it is working. I deleted the FFmpeg folder in the app directory, but I still get errors.

01 02 Error.txt

BenAAndrew commented 3 years ago

Hi @GregoryBetsey, could you try running the following command, where filename is the name of your original audio file (e.g. audio.mp3):

ffmpeg -ss 00:00:00.000 -t 10.0 -i filename test.wav

Please make sure you run this command from the same directory as the original audio file. Then check whether it produces a new audio file called test.wav that is playable and 10 seconds long.

GregoryBetsey commented 3 years ago

It isn't working, unless I set it up wrong.

Test

BenAAndrew commented 3 years ago

Hi @GregoryBetsey, please replace the word filename with the name of your audio file rather than with test.wav. It should look like this:

ffmpeg -ss 00:00:00.000 -t 10.0 -i Matthew.mp3 test.wav

GregoryBetsey commented 3 years ago

Okay, it worked now.

Untitled

BenAAndrew commented 3 years ago

Does that audio play back OK? If so, could you run this and check that Matthew-converted.wav is playable:

ffmpeg -i Matthew.mp3 -b:a 32k -ac 1 -map a -ar 22050 Matthew-converted.wav

GregoryBetsey commented 3 years ago

Yes, it works. Untitled

BenAAndrew commented 3 years ago

Ok, so now could you run the following and check the result:

ffmpeg -ss 00:00:00.000 -t 10.0 -i Matthew-converted.wav test.wav

GregoryBetsey commented 3 years ago

Yes, it works. Untitled

BenAAndrew commented 3 years ago

Hmm, this is interesting. You see the app just runs the conversion command and then the trim command which is exactly what you've done here. Have you tried running the app again since reinstalling ffmpeg?
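The two steps described here can be written out as the argument lists that would be handed to `subprocess`, mirroring the commands tested in this thread. This is a sketch for comparison only; the helper names and the list-based approach are illustrative assumptions, not the app's actual code.

```python
from typing import List


def build_convert_command(source: str, converted: str) -> List[str]:
    """Conversion step, mirroring:
    ffmpeg -i Matthew.mp3 -b:a 32k -ac 1 -map a -ar 22050 Matthew-converted.wav
    (-ac 1 = mono, -map a = audio streams only, -ar 22050 = 22050 Hz sample rate)
    """
    return ["ffmpeg", "-i", source, "-b:a", "32k", "-ac", "1",
            "-map", "a", "-ar", "22050", converted]


def build_trim_command(source: str, output: str, start: str, duration_s: float) -> List[str]:
    """Trim step, mirroring:
    ffmpeg -ss 00:00:00.000 -t 10.0 -i Matthew-converted.wav test.wav
    `start` is an ffmpeg time string (HH:MM:SS.mmm); `duration_s` is in seconds.
    Because -ss comes before -i, ffmpeg seeks in the input before decoding.
    """
    return ["ffmpeg", "-ss", start, "-t", str(duration_s), "-i", source, output]


if __name__ == "__main__":
    # Print the commands so they can be compared with what the app logs.
    print(" ".join(build_convert_command("Matthew.mp3", "Matthew-converted.wav")))
    print(" ".join(build_trim_command("Matthew-converted.wav", "test.wav", "00:00:00.000", 10.0)))
```

Passing a list instead of a single shell string lets `subprocess` quote each argument itself, which matters for dataset folders whose names contain spaces. If the manual commands work but the app's clips are empty, the argument values the app fills in (especially `-ss` and `-t`) are the natural place to compare.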

GregoryBetsey commented 3 years ago

Yes, I did that here. The app generates clips, but at the end it says it cannot transcribe them and then deletes all the generated WAV files. Error Log.txt

BenAAndrew commented 3 years ago

@GregoryBetsey, whilst it is running, could you copy one of the generated WAV files? They should be saved to data\datasets\dataset_name\wavs, where dataset_name is the name of the dataset. Then could you check whether it is playable?

GregoryBetsey commented 3 years ago

The WAV files can be opened, but since their length is 00:00:00, there isn't any audio. [see attachment] Example.zip

BenAAndrew commented 3 years ago

Ok, so FFmpeg isn't working when cutting the audio, as all of these clips should be at least 1 second long. I don't understand why the command would work outside of the app but not in it, as both should be using the same FFmpeg and command. I will try to resolve this this week.

RayDAnt3D commented 3 years ago

Also experiencing this issue exactly as described in #27 (nothing but "Could not transcribe data\datasets..." messages and zero-length WAV files despite having a tested, working FFmpeg install) when attempting to build either my own or the provided demo datasets.

Something I noticed that does seem off: regardless of whether my source audio file is an MP3 or a WAV, the application log always says that it is converting from an MP3, e.g.:

Coverting data\datasets\TestVoice\audio.mp3...
Loading script from data\datasets\TestVoice\text.txt...
Searching text for matching fragments...
Changing sample rate...
Fetching segments...
Matching segments...
Generating segments...
Could not transcribe data\datasets\TestVoice\wavs\1650_2730.wav
Could not transcribe data\datasets\TestVoice\wavs\5850_7680.wav
Could not transcribe data\datasets\TestVoice\wavs\7680_9330.wav
Could not transcribe data\datasets\TestVoice\wavs\9450_10530.wav
Could not transcribe data\datasets\TestVoice\wavs\10560_12720.wav

The audio file being converted above was a WAV file named "this_is_a_wave_file.wav". Having said that, the "audio-converted.wav" and "audio-converted-16000.wav" files generated in the dataset's working directory are playable and seemingly in the right format according to VLC Player:

Stream 0 ("audio-converted.wav") Codec: PCM S16 LE (s16l) Type: Audio Channels: Mono Sample rate: 22050 Hz Bits per sample: 16

Stream 0 ("audio-converted-16000.wav") Codec: PCM S16 LE (s16l) Type: Audio Channels: Mono Sample rate: 16000 Hz Bits per sample: 16

It's just the separated-out segments that are unusable (each one contains nothing but 78 bytes of metadata).
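To check the generated clips without opening each one in a player, the WAV header can be read directly. Below is a minimal standard-library sketch; `inspect_wav` and `find_empty_clips` are hypothetical helpers, and the folder is assumed to be a dataset's `wavs` directory. A header-only clip like the 78-byte files described above would be expected to report zero frames.

```python
import wave
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class WavInfo:
    channels: int
    sample_rate: int
    sample_width_bytes: int
    frames: int

    @property
    def duration_seconds(self) -> float:
        # Number of sample frames divided by frames per second.
        return self.frames / self.sample_rate if self.sample_rate else 0.0


def inspect_wav(path: str) -> WavInfo:
    """Read the format and length of a PCM WAV file from its header."""
    with wave.open(path, "rb") as wav:
        return WavInfo(
            channels=wav.getnchannels(),
            sample_rate=wav.getframerate(),
            sample_width_bytes=wav.getsampwidth(),
            frames=wav.getnframes(),
        )


def find_empty_clips(folder: str) -> List[Path]:
    """Return the .wav files in `folder` that have no audio frames or cannot be parsed."""
    bad = []
    for path in sorted(Path(folder).glob("*.wav")):
        try:
            if inspect_wav(str(path)).frames == 0:
                bad.append(path)
        except (wave.Error, EOFError):
            # Missing or truncated header, e.g. a 0-byte file.
            bad.append(path)
    return bad


if __name__ == "__main__":
    import sys

    for clip in find_empty_clips(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"empty or unreadable: {clip}")
```

For a healthy clip, `inspect_wav` should report 1 channel, 22050 Hz and 2-byte samples, matching the VLC details above.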

BenAAndrew commented 3 years ago

@RayDAnt3D thank you for this info. This seems to be an issue for several people so it is my number one priority. I'm hoping to have it fixed by Sunday 🤞

BenAAndrew commented 3 years ago

@GregoryBetsey @RayDAnt3D I'm struggling to figure out what's causing this issue & I can't get it to replicate locally. The issue must be to do with either the FFmpeg install or one of the commands.

To test this I've produced the following: https://drive.google.com/drive/folders/17zT6fg7V_gu_kMVZs2ERPmfGyFRuDhWg?usp=sharing

In there you'll find a test audio file and a script. Could you try downloading both & running the script? Then check that it produces an audio file called test-final.wav that is playable & 3 seconds long.

Thank you for your patience

arthur465 commented 3 years ago

Hey, I downloaded it and ran the script. I can confirm it produced a 3-second playable clip called "test-clip.wav".

RayDAnt3D commented 3 years ago

Also downloaded/ran the test script and audio clip and got the following tested working audio files generated:

test-clean.wav test-clean-16000.wav test-clip.wav

No "test-final.wav" though.

BenAAndrew commented 3 years ago

@RayDAnt3D @arthur465 Sorry, I meant test-clip.wav. So it sounds like the FFmpeg commands are working for all of you. I'm going to try to create a release today with improved error logging on the clip-building process, so we can find out where it is failing in the app.

BenAAndrew commented 3 years ago

@GregoryBetsey @arthur465 @RayDAnt3D I've created a new release here: https://github.com/BenAAndrew/Voice-Cloning-App/releases/tag/v0.6.2. It won't fix the issue but it might help tell us what the error is. It will now check the output of the FFmpeg commands and will also show it running in the console. Could you give it a go and let me know what happens
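The logs posted in reply show FFmpeg printing `Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used)`. FFmpeg reports that as a warning, so checking only the exit status may not catch it (an assumption worth verifying against your FFmpeg build). A sketch of a wrapper that checks both; `run_ffmpeg` and `ffmpeg_reported_empty_output` are hypothetical names, not the app's code:

```python
import subprocess
from typing import List

# Warning text FFmpeg prints when no frames were written (as seen in the logs in this thread).
EMPTY_OUTPUT_MARKER = "Output file is empty, nothing was encoded"


def ffmpeg_reported_empty_output(stderr: str) -> bool:
    """True if FFmpeg's log output contains the empty-output warning."""
    return EMPTY_OUTPUT_MARKER in stderr


def run_ffmpeg(args: List[str]) -> str:
    """Run an FFmpeg command and return its log output (stderr).

    Raises RuntimeError if the process fails or reports an empty output file.
    """
    result = subprocess.run(
        args,
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # metadata such as "©" should not crash decoding
    )
    if result.returncode != 0:
        raise RuntimeError(f"command exited with code {result.returncode}:\n{result.stderr}")
    if ffmpeg_reported_empty_output(result.stderr):
        raise RuntimeError(f"FFmpeg wrote an empty file:\n{result.stderr}")
    return result.stderr
```

FFmpeg writes its progress and diagnostics to stderr, which is why the wrapper inspects `result.stderr` rather than stdout.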

arthur465 commented 3 years ago

OK, here's the error I get:

INFO:voice:Progress - 391/416 INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 11:44:13] "GET /socket.io/?EIO=4&transport=polling&t=Na02_Cm&sid=7KcT1PoIXIKbnCvUAAAA HTTP/1.1" 200 - ffmpeg version 2021-04-18-git-d43b26b30d-full_build-www.gyan.dev Copyright (c) 2000-2021 the FFmpeg developers built with gcc 10.2.0 (Rev6, Built by MSYS2 project) configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libglslang --enable-vulkan --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint libavutil 56. 73.100 / 56. 73.100 libavcodec 58.136.101 / 58.136.101 libavformat 58. 78.100 / 58. 78.100 libavdevice 58. 14.100 / 58. 14.100 libavfilter 7.111.100 / 7.111.100 libswscale 5. 10.100 / 5. 10.100 libswresample 3. 10.100 / 3. 10.100 libpostproc 55. 10.100 / 55. 
10.100 Guessed Channel Layout for Input Stream #0.0 : mono Input #0, wav, from 'data\datasets\Arthur 2\audio-converted.wav': Metadata: encoder : Lavf58.78.100 Duration: 00:17:31.99, bitrate: 352 kb/s Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s Stream mapping: Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native)) Press [q] to stop, [?] for help Output #0, wav, to 'data\datasets\Arthur 2\wavs\994140_995250.wav': Metadata: ISFT : Lavf58.78.100 Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s Metadata: encoder : Lavc58.136.101 pcm_s16le size= 0kB time=00:00:00.00 bitrate=N/A speed= 0x video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used) emitting event "logs" to all [/voice] INFO:socketio.server:emitting event "logs" to all [/voice] 7KcT1PoIXIKbnCvUAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Arthur 2\wavs\994140_995250.wav"}] INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 11:44:13] "GET /socket.io/?EIO=4&transport=polling&t=Na02_Cs&sid=7KcT1PoIXIKbnCvUAAAA HTTP/1.1" 200 - INFO:engineio.server:7KcT1PoIXIKbnCvUAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Arthur 2\wavs\994140_995250.wav"}] INFO:voice:Could not transcribe data\datasets\Arthur 2\wavs\994140_995250.wav emitting event "progress" to all [/voice] INFO:socketio.server:emitting event "progress" to all [/voice] 7KcT1PoIXIKbnCvUAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 392","total":"416"}] INFO:engineio.server:7KcT1PoIXIKbnCvUAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 392","total":"416"}]

RayDAnt3D commented 3 years ago

Here's what I get for the first sample-cutting attempt (and every one thereafter) using the Ayoade dataset assets:

INFO:voice:Generating segments... ffmpeg version 2021-04-18-git-d43b26b30d-full_build-www.gyan.dev Copyright (c) 2000-2021 the FFmpeg developers built with gcc 10.2.0 (Rev6, Built by MSYS2 project) configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libglslang --enable-vulkan --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint libavutil 56. 73.100 / 56. 73.100 libavcodec 58.136.101 / 58.136.101 libavformat 58. 78.100 / 58. 78.100 libavdevice 58. 14.100 / 58. 14.100 libavfilter 7.111.100 / 7.111.100 libswscale 5. 10.100 / 5. 10.100 libswresample 3. 10.100 / 3. 10.100 libpostproc 55. 10.100 / 55. 
10.100 Guessed Channel Layout for Input Stream #0.0 : mono Input #0, wav, from 'data\datasets\Ayoade\audio-converted.wav': Metadata: artist : Richard Ayoade comment : At last, the definitive audiobook about perhaps the best cabin crew dramedy ever filmed: View from the Top starring Gwyneth Paltrow. In Ayoade on Top, Richard Ayoade, perhaps one of the most 'insubstantial' people of our age, takes us on a journey from Pe copyright : ©2019 Richard Ayoade (P)2019 Audible, Ltd date : 2019 genre : Audiobook title : 1 - Ayoade on Top album : Ayoade on Top track : 1/1 encoder : Lavf58.78.100 Duration: 04:39:25.09, bitrate: 352 kb/s Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s Stream mapping: Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native)) Press [q] to stop, [?] for help Output #0, wav, to 'data\datasets\Ayoade\wavs\60_1680.wav': Metadata: IART : Richard Ayoade ICMT : At last, the definitive audiobook about perhaps the best cabin crew dramedy ever filmed: View from the Top starring Gwyneth Paltrow. 
In Ayoade on Top, Richard Ayoade, perhaps one of the most 'insubstantial' people of our age, takes us on a journey from Pe ICOP : ©2019 Richard Ayoade (P)2019 Audible, Ltd ICRD : 2019 IGNR : Audiobook INAM : 1 - Ayoade on Top IPRD : Ayoade on Top IPRT : 1/1 ISFT : Lavf58.78.100 Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s Metadata: encoder : Lavc58.136.101 pcm_s16le size= 1kB time=00:00:00.00 bitrate=N/A speed= 0x video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used) Using cache found in C:\Users\gbase/.cache\torch\hub\snakers4_silero-models_master NpmL9qWiQt7YXzLnAAAA: Sending packet PING data None INFO:engineio.server:NpmL9qWiQt7YXzLnAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 15:19:47] "GET /socket.io/?EIO=4&transport=polling&t=Na0B7hC&sid=NpmL9qWiQt7YXzLnAAAA HTTP/1.1" 200 - NpmL9qWiQt7YXzLnAAAA: Received packet PONG data INFO:engineio.server:NpmL9qWiQt7YXzLnAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 15:19:47] "POST /socket.io/?EIO=4&transport=polling&t=Na0B84R&sid=NpmL9qWiQt7YXzLnAAAA HTTP/1.1" 200 - emitting event "logs" to all [/voice] INFO:socketio.server:emitting event "logs" to all [/voice] NpmL9qWiQt7YXzLnAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Ayoade\wavs\60_1680.wav"}] INFO:engineio.server:NpmL9qWiQt7YXzLnAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Ayoade\wavs\60_1680.wav"}] INFO:voice:Could not transcribe data\datasets\Ayoade\wavs\60_1680.wav emitting event "progress" to all [/voice] INFO:socketio.server:emitting event "progress" to all [/voice] NpmL9qWiQt7YXzLnAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 1","total":"5021"}] INFO:engineio.server:NpmL9qWiQt7YXzLnAAAA: Sending packet MESSAGE data 
2/voice,["progress",{"number":" 1","total":"5021"}] INFO:voice:Progress - 1/5021

For what it's worth, here also is my app log at first startup:

[12568] WARNING: file already exists but should not: C:\Users\gbase\AppData\Local\Temp\_MEI125682\torch\_C.cp38-win_amd64.pyd
Server initialized for threading.
Server initialized for threading.
torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
INFO:matplotlib.font_manager:Generating new fontManager, this may take some time...
[nltk_data] Downloading package wordnet to
[nltk_data] C:\Users\gbase\AppData\Local\Temp\_MEI125682\nltk_data
[nltk_data] ...
[nltk_data] Package wordnet is already up-to-date!
WARNING:werkzeug:WebSocket transport not available. Install eventlet or gevent and gevent-websocket for improved performance.

 * Serving Flask app "main" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
INFO:werkzeug: * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
INFO:werkzeug:127.0.0.1 - - [23/Apr/2021 15:17:35] "GET / HTTP/1.1" 200 -
RayDAnt3D commented 3 years ago

Did some source code snooping and noticed that running:

start_timestamp = datetime.fromtimestamp(start / 1000).strftime("%H:%M:%S.%f")

as it appears in dataset\audio_processing.py inside the cut_audio() routine, with start=60 (the start of the first Ayoade clip, 60_1680.wav), on the Python command line like so:

from subprocess import call
from pathlib import Path
from datetime import datetime
from pydub import AudioSegment
import os
datetime.fromtimestamp(60 / 1000).strftime("%H:%M:%S.%f")

results in the following output:

'19:00:00.060000'

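Pretty sure that extra 19 hours shouldn't be there; the expected result is '00:00:00.060000'. The root of the problem may just be a date/time localization mismatch: datetime.fromtimestamp() treats its argument as seconds since the Unix epoch and converts it to the machine's local time, so on any machine not set to UTC the hour field is shifted by the local offset.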
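A timezone-independent way to build the same HH:MM:SS.ffffff string, as a sketch. ms_to_timestamp and ms_to_timestamp_utc are hypothetical helper names, not part of the app, and this assumes the only problem is the local-time conversion in the line quoted above. The first helper does plain integer arithmetic on the millisecond offset; the second keeps the original expression but pins it to UTC via the tz argument of datetime.fromtimestamp().

```python
from datetime import datetime, timezone


def ms_to_timestamp(ms: int) -> str:
    """Hypothetical helper: format a millisecond offset as HH:MM:SS.ffffff, no timezone involved."""
    # Split the offset into whole hours, minutes, seconds and leftover milliseconds.
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    # %f-style fraction: six digits of microseconds.
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis * 1000:06d}"


def ms_to_timestamp_utc(ms: int) -> str:
    """Hypothetical helper: the original expression, pinned to UTC so the local zone is ignored."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%H:%M:%S.%f")


print(ms_to_timestamp(60))      # 00:00:00.060000
print(ms_to_timestamp_utc(60))  # 00:00:00.060000
```

One design difference: the arithmetic version keeps counting hours past 24, while the UTC version wraps %H at midnight. For clips from a recording a few hours long, both give the same result.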

BenAAndrew commented 3 years ago

@RayDAnt3D great find. What time localization do you use?

ironpanther commented 3 years ago

I tried the latest version (0.63) just to see if anything was different. The initial files it creates from my sample mp3 (audio.mp3, audio-converted.wav, and audio-converted-16000.wav) are all fine, same as before. The many individual clip WAVs inside the folder are all "empty" files, with a length of 00:00:00 and a size of 78 bytes. I believe that's the same as before (this time I stopped it before it auto-deleted them, so I could check them).
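To check which clips came out empty before the app deletes them, the wavs folder can be scanned with Python's standard wave module. This is a sketch: find_empty_wavs is a hypothetical helper, and it assumes the clips are plain PCM WAV files like the pcm_s16le output shown in the logs. A 78-byte file is consistent with a WAV header, a small metadata chunk (the ISFT encoder tag) and a zero-length data chunk.

```python
from __future__ import annotations

import wave
from pathlib import Path


def find_empty_wavs(folder: str) -> list[Path]:
    """Hypothetical helper: return the .wav files in `folder` that contain zero audio frames."""
    empty = []
    for path in sorted(Path(folder).glob("*.wav")):
        try:
            # wave.open expects a filename string or a file object, not a Path.
            with wave.open(str(path), "rb") as wav:
                if wav.getnframes() == 0:
                    empty.append(path)
        except (wave.Error, EOFError):
            # Unreadable or truncated header: report it alongside the empty files.
            empty.append(path)
    return empty
```

For example, running `for p in find_empty_wavs(r"data\datasets\Kate\wavs"): print(p, p.stat().st_size)` from the app's working directory would list each empty clip with its size in bytes.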

The errors when trying to process them are similar to the post above:

Guessed Channel Layout for Input Stream #0.0 : mono Input #0, wav, from 'data\datasets\Kate\audio-converted.wav': Metadata: encoder : Lavf58.76.100 Duration: 04:18:04.84, bitrate: 352 kb/s Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s Stream mapping: Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native)) Press [q] to stop, [?] for help Output #0, wav, to 'data\datasets\Kate\wavs\1436520_1438290.wav': Metadata: ISFT : Lavf58.76.100 Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s Metadata: encoder : Lavc58.134.100 pcm_s16le size= 0kB time=00:00:00.00 bitrate=N/A speed= 0x video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Output file is empty, nothing was encoded (check -ss / -t / -frames parameters if used) emitting event "logs" to all [/voice] INFO:socketio.server:emitting event "logs" to all [/voice] MRg0ipT8vkRYLkMJAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Kate\wavs\1436520_1438290.wav"}] INFO:werkzeug:127.0.0.1 - - [24/Apr/2021 12:19:54] "GET /socket.io/?EIO=4&transport=polling&t=Na4vHg3&sid=MRg0ipT8vkRYLkMJAAAA HTTP/1.1" 200 - INFO:engineio.server:MRg0ipT8vkRYLkMJAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Could not transcribe data\datasets\Kate\wavs\1436520_1438290.wav"}] INFO:voice:Could not transcribe data\datasets\Kate\wavs\1436520_1438290.wav emitting event "progress" to all [/voice] INFO:socketio.server:emitting event "progress" to all [/voice] MRg0ipT8vkRYLkMJAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 356","total":"5134"}] INFO:engineio.server:MRg0ipT8vkRYLkMJAAAA: Sending packet MESSAGE data 2/voice,["progress",{"number":" 356","total":"5134"}] INFO:voice:Progress - 356/5134
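A worked check of why this particular clip could come out empty. It assumes two things not confirmed in this thread: that cut_audio() hands the formatted start and end strings to ffmpeg as the clip's seek times (the ffmpeg hint about -ss / -t suggests this), and that the machine is on UTC-5, which is what the '19:00:00.060000' result above implies. The clip name 1436520_1438290 is read as start and end in milliseconds, and the 04:18:04.84 duration comes from the log. fmt and as_offset are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is UTC-5 at the epoch, as in the 19:00 result above.
EST = timezone(timedelta(hours=-5))

# Start and end of 1436520_1438290.wav, in milliseconds.
start_ms, end_ms = 1436520, 1438290

# Input duration reported by ffmpeg for audio-converted.wav: 04:18:04.84
duration = timedelta(hours=4, minutes=18, seconds=4.84)


def fmt(ms, tz):
    # The quoted cut_audio() expression, with the timezone made explicit.
    return datetime.fromtimestamp(ms / 1000, tz=tz).strftime("%H:%M:%S.%f")


def as_offset(stamp):
    # Read an HH:MM:SS.ffffff string back as a duration measured from 00:00:00.
    t = datetime.strptime(stamp, "%H:%M:%S.%f")
    return t - t.replace(hour=0, minute=0, second=0, microsecond=0)


print(fmt(start_ms, EST), fmt(end_ms, EST))                    # 19:23:56.520000 19:23:58.290000
print(fmt(start_ms, timezone.utc), fmt(end_ms, timezone.utc))  # 00:23:56.520000 00:23:58.290000
print(as_offset(fmt(start_ms, EST)) > duration)                # True
print(as_offset(fmt(start_ms, timezone.utc)) > duration)       # False
```

Under those assumptions, the shifted start (about 19h23m) lies well past the end of a 4h18m file, so ffmpeg would have nothing to encode, which would match the "Output file is empty" message. A machine east of UTC would instead shift clips forward by a few hours, producing non-empty clips from the wrong part of the recording.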

RayDAnt3D commented 3 years ago

@BenAAndrew US EST (technically currently EDT).
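Timestamps this small fall on 31 December 1969 / 1 January 1970 in US Eastern time, when standard time (EST, UTC-5) applied rather than daylight time. That matches the 19 in the '19:00:00.060000' output above rather than the 20 that EDT (UTC-4) would give. The sketch below reproduces the three cases on any machine by passing fixed offsets instead of relying on the OS timezone; EST, EDT and stamp are illustrative names, not anything from the app.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for illustration; a real machine's zone comes from its OS settings.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")


def stamp(ms, tz):
    # The expression quoted from cut_audio(), with an explicit timezone.
    return datetime.fromtimestamp(ms / 1000, tz=tz).strftime("%H:%M:%S.%f")


for tz in (EST, EDT, timezone.utc):
    print(tz.tzname(None), stamp(60, tz))
# EST 19:00:00.060000
# EDT 20:00:00.060000
# UTC 00:00:00.060000
```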