SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

transcribe can't find files outside current script working directory #805

Open NeoFahrenheit opened 4 months ago

NeoFahrenheit commented 4 months ago

Hi, I'm on a Mac and I am trying to transcribe an audio file extracted with yt_dlp. The problem is that WhisperModel can't find or correctly process audio files outside the script's working directory.

import fnmatch
import os

from faster_whisper import WhisperModel

# Excerpt: a method of a class defined in project_manager.py.
def process_audios(self) -> bool:
    exts = ['*.m4a', '*.mp3', '*.wav', '*.flac', '*.mp4', '*.wma', '*.aac', '*.ogg']

    print(os.listdir(self.audio_path))
    # ['Tutorial-Master Text Similarity Search with Python & FAISS Vector Database.m4a', 'g30 4.m4a']

    for filename in os.listdir(self.audio_path):
        if any(fnmatch.fnmatch(filename, extension) for extension in exts):
            cur_file = os.path.join(self.audio_path, filename)  # Absolute path
            filename_extensionless = os.path.splitext(filename)[0]
            print('cur_file is: ', cur_file)  # /Users/lmonteir/.HandySpeechBot/projects/project_name/audios/Tutorial-Master Text Similarity Search with Python & FAISS Vector Database.m4a
            print('is valid: ', os.path.isfile(cur_file))  # It says True

            model = WhisperModel(model_size_or_path=self.app_data['user_config']['model'],
                                 cpu_threads=self.app_data['user_config']['cpu_threads'],
                                 download_root=self.models_path)
            segments, info = model.transcribe(cur_file)  # Error happens here.

This is the error stack:

Traceback (most recent call last):
  File "/Users/lmonteir/Projects/handy_speech_bot/DataManager/project_manager.py", line 139, in <module>
    m.process_audios()
  File "/Users/lmonteir/Projects/handy_speech_bot/DataManager/project_manager.py", line 97, in process_audios
    segments, info = model.transcribe(cur_file)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lmonteir/Projects/handy_speech_bot/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 294, in transcribe
    audio = decode_audio(audio, sampling_rate=sampling_rate)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lmonteir/Projects/handy_speech_bot/lib/python3.12/site-packages/faster_whisper/audio.py", line 52, in decode_audio
    for frame in frames:
  File "/Users/lmonteir/Projects/handy_speech_bot/lib/python3.12/site-packages/faster_whisper/audio.py", line 103, in _resample_frames
    for frame in itertools.chain(frames, [None]):
  File "/Users/lmonteir/Projects/handy_speech_bot/lib/python3.12/site-packages/faster_whisper/audio.py", line 92, in _group_frames
    fifo.write(frame)
  File "av/audio/fifo.pyx", line 30, in av.audio.fifo.AudioFifo.write
  File "av/audio/fifo.pyx", line 74, in av.audio.fifo.AudioFifo.write
RuntimeError: Could not allocate AVAudioFifo.

Now, if I put the files in the current script folder, it runs fine. I have tried putting double quotes around the filename and the absolute path, but it didn't work. Is there anything I might be missing?
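One way to tell a path problem from a file problem is to copy the failing file to a neutral location under a plain ASCII name and transcribe the copy. The sketch below is only a diagnostic aid, not part of faster-whisper; the helper name copy_with_simple_name and the temp-directory prefix are made up for illustration.

```python
import re
import shutil
import tempfile
from pathlib import Path


def copy_with_simple_name(src: str) -> Path:
    """Copy src into a fresh temp directory under an ASCII-only file name.

    Hypothetical debugging helper: it only changes where the file lives
    and what it is called, never its contents.
    """
    source = Path(src)
    # Collapse every run of characters other than letters, digits,
    # '-' and '_' in the stem into a single underscore.
    safe_stem = re.sub(r"[^A-Za-z0-9_-]+", "_", source.stem).strip("_") or "audio"
    target_dir = Path(tempfile.mkdtemp(prefix="whisper_debug_"))
    target = target_dir / f"{safe_stem}{source.suffix}"
    shutil.copyfile(source, target)
    return target
```

Calling model.transcribe(str(copy_with_simple_name(cur_file))) then exercises the same bytes from a different path. If the copy fails with the same RuntimeError, the file contents are a more likely cause than its location.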

Purfview commented 4 months ago

Make sure you are using the latest faster-whisper version. Also check which PyAV version is installed.
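For reference, PyAV is published on PyPI under the distribution name av, so that is the name to look up. A minimal sketch for reading both installed versions from inside the active virtual environment, using only the standard library (the helper name installed_version is illustrative):

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "av" is the PyPI name of PyAV.
    for name in ("faster-whisper", "av"):
        print(name, installed_version(name) or "not installed")
```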

NeoFahrenheit commented 4 months ago

faster-whisper is on 1.0.1. I couldn't find a package named PyAV. I installed version 12.0.5. The problem persists.

Let me know if you need more information. :) Thanks for the help!

Purfview commented 4 months ago

Try downgrading it; I don't have any other ideas...

pip install --force-reinstall av==11.0.0

NeoFahrenheit commented 4 months ago

It didn't work. What I tried next was using one of those generic audio converter websites to convert my .m4a to .mp3, and that worked nicely!

Now, this is what I don't understand. I can process local .m4a files with no problem, but not with an absolute path. But .mp3 works fine with an absolute path.

Maybe it is something related to my project? I changed my Hugging Face cache to a folder in /Users/lmonteir/.HandySpeechBot/models. The project runs in a virtual env, created with python3 -m venv . The env is at /Users/lmonteir/Projects/.

I'm just confused, but now I have a workaround, which is nice.

Thank you for the support!
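The conversion workaround described above can also be done locally instead of through a website. The sketch below assumes the ffmpeg command-line tool is installed and on PATH. It writes 16 kHz mono WAV rather than the .mp3 used in the thread, which is a substitution made here for simplicity. The function names are illustrative and not part of faster-whisper.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg command that re-encodes src as 16 kHz mono audio at dst."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", str(src),  # input file; list form avoids shell quoting issues
        "-vn",           # drop any video or cover-art stream
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        str(dst),
    ]


def convert_audio(src: Path, suffix: str = ".wav") -> Path:
    """Convert src to a sibling file with the given suffix and return its path."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    dst = src.with_suffix(suffix)
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst
```

Because the command is passed as a list rather than a shell string, spaces and characters such as & in the file name need no extra quoting. The returned path can be passed to model.transcribe as a string.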