anupamkumar / subtitler

A utility that uses Whisper to transcribe videos and various translation APIs to translate the transcribed text and save them as SRT (subtitle) files
GNU General Public License v3.0

Program "completes successfully" without errors but does not split the video or generate any output #2

Closed meekmeekmeek closed 1 day ago

meekmeekmeek commented 4 days ago

Hi,

I've been trying to get subtitler to run under Windows 10 after a pip install (pip install git+https://github.com/anupamkumar/subtitler.git). I get no errors, but it doesn't appear to do anything except load the language model and then say it's cleaning up:

Run Configuration: Namespace(video_files=['S:\\Videos\\Music Vids\\Taylor Swift - Anti-Hero (Official Music Video) (1080p_24fps_H264-128kbit_AAC).mp4'], video_dir=None, video_language='english', force_language_autodetect=False, translation_languages=[], translation_service='google', translation_service_api_key=None, mode='gui')

model large-v3 loaded successfully.
Done.
clean up done.

I thought it might be that I don't have ffmpeg installed for the split, but both ffmpeg and ffmpeg-python appear to be installed:

PS C:\Users\salman\AppData\Local\Temp\subtitler> pip install ffmpeg
Defaulting to user installation because normal site-packages is not writeable
Collecting ffmpeg
  Downloading ffmpeg-1.4.tar.gz (5.1 kB)
  Preparing metadata (setup.py) ... done
Installing collected packages: ffmpeg
  DEPRECATION: ffmpeg is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559
  Running setup.py install for ffmpeg ... done
Successfully installed ffmpeg-1.4

[notice] A new release of pip available: 22.3.1 -> 24.1.1
[notice] To update, run: python.exe -m pip install --upgrade pip
PS C:\Users\salman\AppData\Local\Temp\subtitler> pip install ffmpeg-python
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: ffmpeg-python in c:\users\salman\appdata\roaming\python\python310\site-packages (0.2.0)
Requirement already satisfied: future in c:\users\salman\appdata\roaming\python\python310\site-packages (from ffmpeg-python) (1.0.0)

[notice] A new release of pip available: 22.3.1 -> 24.1.1
[notice] To update, run: python.exe -m pip install --upgrade pip

Any idea what's up? This happens with every mp4 file I've tried, whether I enable translation or not. It's also very fast, around 17s even on 10+ minute videos. Basically it prints out the configuration, spends a few seconds loading large-v3, then immediately prints "Done." and "clean up done." and shows the popup for successful completion.

Thanks for doing this btw!

meekmeekmeek commented 4 days ago

I've tried with the cli too with the same results, eg:

PS F:\Downloads> subtitler cli --video_file "f:\Downloads\j-test.mp4" --video_language=japanese --translation_languages english
Run Configuration: Namespace(mode='cli', video_files=['f:\\Downloads\\j-test.mp4'], video_dir=None, video_language='japanese', force_language_autodetect=False, translation_languages=['english'], translation_service='google', translation_service_api_key=None)

loading whisper's 'large-v3' model
Done.
clean up done.
PS F:\Downloads>

meekmeekmeek commented 4 days ago

Aha, it was FFmpeg. It turns out that under Windows I had to install it following the instructions at https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/ rather than through pip. After that the whisper CLI works as expected (on audio files), and the subtitler GUI can split MP4s into audio files too.
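For anyone hitting the same thing: the pip packages `ffmpeg` and `ffmpeg-python` are Python packages only and do not provide the `ffmpeg` executable. A quick way to see whether the executable is reachable is to look it up on `PATH`. This is a minimal sketch using only the standard library; the helper name `find_ffmpeg` is made up for illustration and is not part of subtitler.

```python
import shutil
import subprocess


def find_ffmpeg(name="ffmpeg"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg executable not found on PATH")
    else:
        print(f"ffmpeg found at {path}")
        # Show the first line of `ffmpeg -version` as a sanity check.
        result = subprocess.run([path, "-version"], capture_output=True, text=True)
        print(result.stdout.splitlines()[0] if result.stdout else "(no version output)")
```

If this reports that the executable is not found, installing the FFmpeg binaries and adding them to `PATH` (as in the linked instructions) is the likely fix.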

However, it seems to be running completely on CPU, which is very slow - my CPU is now pegged at 100%, while my GPU (an RTX 4090) is at 0%.

If I can't figure out what's going on there, I'll log a separate bug for that.

This is where it is 9 minutes into processing that Taylor Swift music video:


Run Configuration: Namespace(video_files=['S:\\Videos\\Music Vids\\Taylor Swift - Anti-Hero (Official Music Video) (1080p_24fps_H264-128kbit_AAC).mp4'], video_dir=None, video_language='english', force_language_autodetect=False, translation_languages=[], translation_service='google', translation_service_api_key=None, mode='gui')

generated wav file for S:\Videos\Music Vids\Taylor Swift - Anti-Hero (Official Music Video) (1080p_24fps_H264-128kbit_AAC).mp4 in C:\Users\salman\AppData\Local\Temp\subtitler
loading whisper's 'large-v3' model
Done.
Transcribing video: S:\Videos\Music Vids\Taylor Swift - Anti-Hero (Official Music Video) (1080p_24fps_H264-128kbit_AAC).mp4 in english
C:\Users\salman\AppData\Roaming\Python\Python310\site-packages\whisper\transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")

meekmeekmeek commented 4 days ago

OK, got it working on the GPU too. The missing piece was that PyTorch needs to be installed with CUDA support; instructions are at https://pytorch.org/get-started/locally/#with-cuda-1
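To check whether the installed PyTorch build can actually use the GPU, something like the following should work. It is a rough sketch: the function name `torch_cuda_status` and its return strings are invented here for illustration, and it only relies on `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
import importlib.util


def torch_cuda_status():
    """Return 'no-torch', 'cpu-only', or 'cuda' for the current environment."""
    if importlib.util.find_spec("torch") is None:
        return "no-torch"
    import torch

    # is_available() is False for CPU-only builds of PyTorch, and also when
    # no usable GPU or driver is found.
    if not torch.cuda.is_available():
        return "cpu-only"
    return "cuda"


if __name__ == "__main__":
    status = torch_cuda_status()
    print(f"PyTorch CUDA status: {status}")
    if status == "cuda":
        import torch

        # Name of the first visible GPU.
        print(torch.cuda.get_device_name(0))
```

A result of `cpu-only` on a machine with an NVIDIA GPU usually means a CPU-only PyTorch build is installed, which matches the "FP16 is not supported on CPU" warning above.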

I think you can resolve the issue now; it was all missing setup on my part. It would be good, though, to mention in the setup steps the need for CUDA-enabled PyTorch and the extra steps to install FFmpeg under Windows.

anupamkumar commented 1 day ago

Thanks a lot @meekmeekmeek! I'll update the readme file with info about FFmpeg. Thanks for finding, reporting and figuring this out.