Stuck UI with Long Files

JuergenFleiss / aTrain

A GUI tool for offline transcription of speech recordings, including speaker diarization, utilizing state-of-the-art machine learning models.

Stuck UI with Long Files #41

Open architeck opened 1 week ago

architeck commented 1 week ago

I'm transcribing 2+ hour long wav files and notice that the transcription completes rather quickly on an Nvidia 3070 using the large-v3-turbo model. I can see the completed transcript in the output folder, but the UI remains stuck on "transcribing".

The image below shows less time, but luckily I checked the folders after my long file had run for 5+ hours without any progress in the UI.

Edit: I'm actually noticing the stuck UI thing on smaller files as well now.

roperna commented 1 week ago

I've been facing the same issue for the last three days. I'm transcribing wav files that are 1 to 1.5 hours long. Compared to last week, I cannot select the model anymore (the only option now is 'large-v3-turbo'), and the UI remains stuck on "transcribing". I checked the folders after my files had been running for 10+ hours, but there is no progress.

JuergenFleiss commented 1 week ago

Ok, we released the new version 1.2 three days ago, so this is probably a bug that we did not catch in testing. I cannot reproduce it on my end right now. We will have to take a look at the new way we track progress in the new steps.

Large-v3-turbo was freshly released by OpenAI and offers a great balance between speed and accuracy. As for the other models, there is now a new menu item on the left: Model Manager. There you can download all the available whisper models, including large-v2 and large-v3.

Could you try other models and let me know if the UI is also stuck with those?

SchwarzDeveloping commented 1 week ago

Same here. It randomly gets stuck with files over 23 minutes long, while files under 20 minutes pose no issues. The file extension doesn't seem to matter.

JuergenFleiss commented 1 week ago

> Same here. It randomly gets stuck with files over 23 minutes long, while files under 20 minutes pose no issues. The file extension doesn't seem to matter.

Ok, I will try to reproduce it.

In the meantime, you can revert to v1.1 using the installer here: https://business-analytics.uni-graz.at/de/forschung/atrain/download/

stevevaius2015 commented 4 days ago

Hello, did you release v1.2? Where can developers download the release?

JuergenFleiss commented 4 days ago

> Hello, did you release v1.2? Where can developers download the release?

Hi, we released the new 1.2 installer on the Microsoft Store last week; I have now also published a release on GitHub.

aereimer commented 3 days ago

Hi @JuergenFleiss, thanks for the amazing software. I just want to report that I got similar issues with both aTrain 1.2 and aTrain_core 1.2, using these command-line options: --model "large-v3" --language "en" --speaker_detection --num_speakers "auto-detect" --device "GPU" --compute_type "float16". In my case, changing the model (turbo, large-v3) did not matter. At first it seemed that 1.1 was working fine. UPDATE: 1.1 crashed too. I'm investigating whether it is an issue with the mp3 file. It was a 2:13 h audio file; splitting it worked for some of the chunks. Some chunks show this error:

Command '['aTrain_core', 'transcribe', 'C:\Users\Data analysis\Downloads\segments\output_003.mp3', '--model', 'large-v3', '--language', 'en', '--device', 'GPU', '--compute_type', 'float16', '--speaker_detection', '--num_speakers', 'auto-detect']' returned non-zero exit status 3221226505.
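For anyone reading the error above: the decimal exit status is easier to interpret in hexadecimal, because Windows reports native crashes as 32-bit NTSTATUS values. The snippet below is only a small sketch for decoding it; the helper name `describe_exit_status` and the short lookup table are illustrative assumptions, not part of aTrain or aTrain_core.

```python
# Hypothetical helper (not part of aTrain): decode a Windows process exit
# status into its hexadecimal NTSTATUS form plus a name, where known.

# A few well-known NTSTATUS crash codes; this table is deliberately incomplete.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_status(code: int) -> str:
    """Return the exit status as 0xXXXXXXXX with a symbolic name if known."""
    # Some tools report the same value as a negative signed 32-bit integer,
    # so normalise to unsigned before looking it up.
    unsigned = code & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


# The exit status from the error message above.
print(describe_exit_status(3221226505))
```

The value decodes to 0xC0000409. That code is raised for stack buffer overruns but also for fail-fast aborts in native code, so it suggests the crash happened inside a compiled library rather than as a Python exception. It does not by itself say which library is responsible.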

JuergenFleiss commented 2 days ago

Ok, that's quite strange. It might be related to faster-whisper itself; they also have reports of videos longer than 20 minutes crashing: https://github.com/SYSTRAN/faster-whisper/issues/687

Could one of you share the files that crash aTrain so that we can investigate (as we are unable to reproduce the problem)? You can reach me at bandas@uni-graz.at