SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

Transcription error related to BatchedInferencePipeline and numpy #1102

Open shkstar opened 4 days ago

shkstar commented 4 days ago

I am using the BatchedInferencePipeline of faster-whisper in Google Colab, installed with:

! pip install --force-reinstall "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/refs/heads/master.tar.gz"
! pip install ctranslate2==4.4.0

Today, when I ran the transcription, it raised the following error:

/usr/local/lib/python3.10/dist-packages/faster_whisper/transcribe.py in transcribe(self, audio, language, task, beam_size, best_of, patience, length_penalty, repetition_penalty, no_repeat_ngram_size, temperature, compression_ratio_threshold, log_prob_threshold, log_prob_low_threshold, no_speech_threshold, condition_on_previous_text, prompt_reset_on_temperature, initial_prompt, prefix, suppress_blank, suppress_tokens, without_timestamps, max_initial_timestamp, word_timestamps, prepend_punctuations, append_punctuations, multilingual, output_language, vad_filter, vad_parameters, max_new_tokens, chunk_length, clip_timestamps, hallucination_silence_threshold, hotwords, language_detection_threshold, language_detection_segments)
    758             audio = torch.from_numpy(audio)
    759         elif not isinstance(audio, torch.Tensor):
--> 760             audio = decode_audio(audio, sampling_rate=sampling_rate)
    761 
    762         duration = audio.shape[0] / sampling_rate

/usr/local/lib/python3.10/dist-packages/faster_whisper/audio.py in decode_audio(input_file, sampling_rate, split_stereo)
     75         return torch.from_numpy(left_channel), torch.from_numpy(right_channel)
     76 
---> 77     return torch.from_numpy(audio)
     78 
     79 

TypeError: expected np.ndarray (got numpy.ndarray)

What is the problem, and how can I solve it? It is strange, because this setup used to work without problems.

MahmoudAshraf97 commented 4 days ago

Please use a debugger to check the value of audio, or upload the audio file here.
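A minimal sketch of the kind of check suggested above, using only the standard library. The helper names `installed_versions` and `describe_object` are hypothetical and are not part of faster-whisper. The message "expected np.ndarray (got numpy.ndarray)" is often reported when the numpy and torch builds in the running session do not match, for example after a forced reinstall without restarting the runtime; this thread does not confirm that as the cause here, so the sketch only collects the facts needed to tell.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def describe_object(obj):
    """Summarise the value passed as `audio`: full type name, shape and dtype if present."""
    cls = type(obj)
    dtype = getattr(obj, "dtype", None)
    return {
        "type": f"{cls.__module__}.{cls.__qualname__}",
        "shape": getattr(obj, "shape", None),
        "dtype": None if dtype is None else str(dtype),
    }


if __name__ == "__main__":
    # Distribution names as they appear on PyPI (assumed to be the relevant ones here).
    print(installed_versions(["numpy", "torch", "faster-whisper", "ctranslate2"]))
    # Replace the string with the actual value passed to transcribe().
    print(describe_object("9ez8lm9I26Y.wav"))
```

Running this in the same Colab session that raises the error, and posting both printed lines, shows which numpy and torch versions are installed and what is actually being passed as audio.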

shkstar commented 3 days ago

I convert the YouTube video to WAV using ! pip install yt-

Type of video_path_local: <class 'str'>
File exists: True
File size: 22384718
Error processing 9ez8lm9I26Y.wav: expected np.ndarray (got numpy.ndarray)

https://app.box.com/s/okmln29g34hdkbsn5r8no7gbg0orb8ny
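The file checks above (path type, existence, size) could be extended with a look at the WAV header, which helps rule out a damaged or unexpected file before suspecting the library. This is a rough sketch using Python's standard `wave` module; `inspect_wav` is a hypothetical helper, and `wave` only reads uncompressed PCM files, so an error from it does not by itself mean the file is unusable.

```python
import os
import wave


def inspect_wav(path):
    """Collect basic facts about a WAV file without decoding the audio samples."""
    info = {"exists": os.path.isfile(path), "size_bytes": None,
            "channels": None, "sample_rate": None,
            "duration_s": None, "error": None}
    if not info["exists"]:
        return info
    info["size_bytes"] = os.path.getsize(path)
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            info["channels"] = wav.getnchannels()
            info["sample_rate"] = rate
            info["duration_s"] = frames / rate if rate else None
    except (wave.Error, EOFError) as exc:
        # Raised for non-PCM or truncated files; keep the message for the report.
        info["error"] = str(exc)
    return info


if __name__ == "__main__":
    print(inspect_wav("9ez8lm9I26Y.wav"))
```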