HenestrosaDev / audiotext

A desktop application that transcribes audio from files, microphone input or YouTube videos with the option to translate the content and create subtitles.

Not extracting audio using ffmpeg #11

Closed Anil-de closed 9 months ago

Anil-de commented 9 months ago

Hello, I am using the recently released audiotext v2.1.0. When using the WhisperX option to transcribe and generate subtitles, I got the following error even though the use CPU option was selected when running:

RuntimeError: CUDA error: CUDA driver version is insufficient for CUDA runtime version

I fixed this by manually editing the config file, setting use_cpu=True and use_gpu=False, which resolved the issue.

But now I am facing another error:

Traceback (most recent call last):
  File "controller\main_controller.py", line 179, in _transcribe_using_whisperx
  File "whisperx\audio.py", line 61, in load_audio
    out = subprocess.run(cmd, capture_output=True, check=True).stdout
  File "subprocess.py", line 501, in run
  File "subprocess.py", line 969, in __init__
  File "subprocess.py", line 1438, in _execute_child
FileNotFoundError: [WinError 2] The system cannot find the file specified

How can I solve this issue?

My device specs: Windows 11, Ryzen 3 processor, 12 GB of RAM.

HenestrosaDev commented 9 months ago

Hi.

It seems that the issue is related to ffmpeg, as you mentioned in the title. Can you open the command line and run ffmpeg -version? It should print something like this:

ffmpeg version 5.1.2-essentials_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.1.0 (Rev2, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
libavutil      57. 28.100 / 57. 28.100
libavcodec     59. 37.100 / 59. 37.100
libavformat    59. 27.100 / 59. 27.100
libavdevice    59.  7.100 / 59.  7.100
libavfilter     8. 44.100 /  8. 44.100
libswscale      6.  7.100 /  6.  7.100
libswresample   4.  7.100 /  4.  7.100

If this results in an error, it's because you don't have ffmpeg installed on your system. To fix this, install ffmpeg as described in the last bullet point in the Getting Started > Notes section.

Let me know if this solves the issue.
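The same check can be scripted. The following is a minimal sketch, not part of audiotext: the helper names find_ffmpeg and ffmpeg_version_line are hypothetical, and it only assumes that ffmpeg, when installed, is reachable through PATH. The traceback in this thread is what subprocess.run raises when the executable it is asked to launch cannot be found, so a lookup with shutil.which beforehand gives a clearer answer.

```python
import shutil
import subprocess


def find_ffmpeg(name="ffmpeg"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(name)


def ffmpeg_version_line(name="ffmpeg"):
    """Return the first line of `<name> -version`, or None if it is missing."""
    path = find_ffmpeg(name)
    if path is None:
        # Nothing on PATH: launching it would fail with FileNotFoundError.
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    line = ffmpeg_version_line()
    print(line if line else "ffmpeg was not found on PATH")
```

If the script reports that ffmpeg was not found, it needs to be installed or its directory added to PATH before trying the transcription again.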

Anil-de commented 9 months ago

The issue has been resolved. ffmpeg was not installed on my system, so I installed it manually by creating a directory named ffmpeg on the C: drive and placing the executable file there. After I set the environment variable to the ffmpeg directory, the installation was successful. Running ffmpeg -version in the command prompt gave me the output below:

[Screenshot: ScreenShot_20231216164234]

Everything seems to be fine now, with no errors. But there seems to be another issue: the transcription process is taking forever. I tried it with a small video file of around 80 MB, but it doesn't give me any output. I waited for 2 hours and got no response. Why is that? Should I use a different model size or different settings? Please let me know. Thanks!

HenestrosaDev commented 9 months ago

@Anil-de I'm glad the ffmpeg problem is gone. That's why I suggested running ffmpeg -version before doing anything else, because that would have been enough to confirm that ffmpeg was not installed.

About the transcription process, can you provide the advanced options that you have chosen? I assume you're using the WhisperX transcription method. Also, can you provide the specific model of your CPU? Ryzen 3 is not enough information.

Anil-de commented 9 months ago

Sure. Here are my system specifications:

[ Device Specifications ]

Device name: LAPTOP-ABE9UERF
Processor: AMD Ryzen 3 3200U with Radeon Vega Mobile Gfx 2.60 GHz
Installed RAM: 12.0 GB (9.88 GB usable)
System type: 64-bit operating system, x64-based processor
Pen and touch: No pen or touch input is available for this display
Storage: SSD (C:) 250 GB, HDD (D:) 1 TB

[ Windows specifications ]

Edition: Windows 11 Pro
Version: 22H2
Installed on: 05-11-2023
OS build: 22621.2861
Experience: Windows Feature Experience Pack 1000.22681.1000.0

[ GPU specifications ]

GPU name: AMD Radeon(TM) Vega 3 Graphics
Dedicated memory: 2 GB
Shared memory: 5 GB

As for the settings of the audiotext application, these are the config.ini file settings:

[whisperx]
model_size = large-v2
batch_size = 8
compute_type = int8
use_cpu = True
can_use_gpu = False

[google_api]
api_key =

[subtitles]
highlight_words = True
max_line_width = 2
max_line_count = 42

Let me know what is causing the transcription to take forever. Thanks!

HenestrosaDev commented 9 months ago

@Anil-de Try using a smaller batch size, such as 4, and a smaller model size. Try first with medium and if you see that it still takes too much time, try small. This is covered in the Troubleshooting section.
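For anyone who prefers to script the change rather than edit config.ini by hand, here is a minimal sketch using Python's standard configparser module. The [whisperx] section and the model_size and batch_size keys are taken from the config posted above; the function name apply_lighter_settings is hypothetical, and it is an assumption that audiotext picks up the edited file the next time it starts.

```python
import configparser
import io


def apply_lighter_settings(ini_text, model_size="medium", batch_size=4):
    """Return ini_text with a smaller WhisperX model and batch size set."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Only these two keys are changed; every other setting is preserved.
    parser["whisperx"]["model_size"] = model_size
    parser["whisperx"]["batch_size"] = str(batch_size)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    current = (
        "[whisperx]\n"
        "model_size = large-v2\n"
        "batch_size = 8\n"
        "compute_type = int8\n"
    )
    print(apply_lighter_settings(current))
```

If medium is still too slow, calling the function again with model_size="small" follows the same pattern.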