bklynhlth / openwillis

Python library for digital measurement of health

Error in Transcription #174

Closed TimStaeheli closed 19 hours ago

TimStaeheli commented 4 days ago

I have a problem with the transcription. At first it just gave us an empty output; now there are different problems. I'll go into more detail about what I tried and what it looks like at the moment. This is our transcription script, and we use JupyterHub:

import openwillis as ow
import os
import json
import whisperx

json_output_path = "/mnt/nfs/data/Test/Test_data/output_test/output_transcript_json/transcript_expressive.json"
text_output_path = "/mnt/nfs/data/Test/Test_data/output_test/output_transcript_text/transcript_expressive.txt"

os.makedirs(os.path.dirname(json_output_path), exist_ok=True)
os.makedirs(os.path.dirname(text_output_path), exist_ok=True)

transcript_json, transcript_text = ow.speech_transcription_whisper(
    filepath='/mnt/nfs/data/Test/Test_data/output_test/multi_speaker.wav',
    model='tiny',
    batch_size=16,
    hf_token='',
    language='en',
    compute_type='float32',
    min_speakers=1,
    max_speakers=1,
)

When I ran the script this time, it told me that float16 was not supported, so I changed it to float32. Then that error was gone.

The next error message told me to run this command in the terminal in order to upgrade the PyTorch Lightning checkpoints, which also worked:

python -m pytorch_lightning.utilities.upgrade_checkpoint ../../../../../home/tstaehe@d.uzh.ch/.cache/torch/whisperx-vad-segmentation.bin

When I ran it next, it showed me this error message:

INFO:root:Error in speech Transcription: Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model. & File: /mnt/nfs/data/Test/Test_data/output_test/multi_speaker.wav

So I changed the model from "large-v2" to "tiny", but that did not change anything. When I restart the kernel and run the script, this is the error message:

2024-11-26 12:09:53.426383: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1732619393.523056 40329 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1732619393.544711 40329 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-26 12:09:53.698330: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO:speechbrain.utils.quirks:Applied quirks (see speechbrain.utils.quirks): [allow_tf32, disable_jit_profiling]
INFO:speechbrain.utils.quirks:Excluded quirks specified by the SB_DISABLE_QUIRKS environment (comma-separated list): []
INFO:root:Error in speech Transcription: Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model. & File: /mnt/nfs/data/Test/Test_data/output_test/multi_speaker.wav
No language specified, language will be first be detected for each audio file (increases inference time).

GeorgeEfstathiadis commented 4 days ago

This seems to be an error in WhisperX with some weird caching issue going on. I found this issue, which seems to match your problem.

Try following that thread, emptying your torch cache (~/.cache/torch) and rerunning the function. Let me know if that fixes this error!
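
For example, a minimal way to clear that cache from within the notebook, as a sketch assuming the default location (~/.cache/torch):

# Remove the torch cache so whisperx re-downloads what it needs on the next run.
import shutil
from pathlib import Path

torch_cache = Path.home() / ".cache" / "torch"
if torch_cache.exists():
    shutil.rmtree(torch_cache)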

TimStaeheli commented 3 days ago

I ran this command in the terminal to remove the corrupted cache, which worked:

rm ~/.cache/torch/whisperx-vad-segmentation.bin

But after I ran the script again, a new error message came up:

INFO:root:Error in speech Transcription: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)> & File: /mnt/nfs/data/Test/Test_data/output_test/multi_speaker.wav
No language specified, language will be first be detected for each audio file (increases inference time).

GeorgeEfstathiadis commented 3 days ago

If you are using a Mac, this is apparently an issue that comes up often on macOS, and a few solutions you can try are referenced in this thread.

It looks like something is breaking with your environment's certificates for accessing webpages. If that's the case, I would follow this thread to re-install ca-certificates and certifi, up to step 2.1. If you are working from within a cluster/university server, this might have to do with the server/VPN blocking access to one of the function's components that access the web. If the above suggestions don't work, you could also ask the maintainers of the cluster about this, since it might be an issue with access permissions.
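
As a quick sanity check, assuming certifi is installed in your environment, you can point Python directly at certifi's CA bundle; if this succeeds, the certificates themselves are fine and the problem is elsewhere:

# Sketch: verify HTTPS verification works when using certifi's CA bundle.
import ssl
import urllib.request

import certifi

context = ssl.create_default_context(cafile=certifi.where())
with urllib.request.urlopen("https://huggingface.co", context=context) as resp:
    print(resp.status)  # 200 means certificate verification succeeded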

What OS are you working with? And also what Python version are you running?

TimStaeheli commented 3 days ago

We work on a virtual machine that is running on Linux, and we are using Python 3.10.13.

GeorgeEfstathiadis commented 3 days ago

Sounds good. Try recreating your virtual environment and following the installation instructions for openwillis and then whisperx. If you get the error again, I would suggest following the second thread I linked in the previous message.

TimStaeheli commented 3 days ago

We fixed the SSL problem, but there is a new error message that we can't explain:

INFO:pytorch_lightning.utilities.migration.utils:Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.3.3. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint ../../../../../home/tstaehe@d.uzh.ch/.cache/torch/whisperx-vad-segmentation.bin
No language specified, language will be first be detected for each audio file (increases inference time).
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu117. Bad things might happen unless you revert torch to 1.x.
INFO:root:Error in speech Transcription: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on. & File: /mnt/nfs/data/Test/Test_data/output_test/multi_speaker.wav

If I run the command it suggests, the error regarding the SHA256 checksum reappears, so that's like a loop. And the second part is weird, because I checked the internet connection and it is good. The file path exists, so I can't explain it.

GeorgeEfstathiadis commented 2 days ago

The message you're getting looks ominous, but it's mostly a warning. You don't need to follow the command it specifies.

Instead, you need to check that you have access to these two models on Hugging Face: Segmentation-3.0 and Speaker-Diarization-3.1
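
A quick way to verify that access from Python, assuming these are the pyannote/segmentation-3.0 and pyannote/speaker-diarization-3.1 repos and that huggingface_hub is installed (whisperx pulls it in):

# Sketch: model_info raises an error (e.g. a gated-repo error) if the token
# does not have access to the repo.
from huggingface_hub import HfApi

api = HfApi(token="YOUR_HF_TOKEN")  # the same token passed to ow.speech_transcription_whisper
for repo_id in ["pyannote/segmentation-3.0", "pyannote/speaker-diarization-3.1"]:
    api.model_info(repo_id)
    print(f"access OK: {repo_id}")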

TimStaeheli commented 2 days ago

I did not have access to one of them, but now I do, and the same error message appeared.

GeorgeEfstathiadis commented 2 days ago

This really should have solved the error. I would suggest restarting Python, clearing the Hugging Face cache (~/.cache/huggingface/hub), and rerunning the script.
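
One way to clear it from Python, as a sketch assuming a recent huggingface_hub (a plain rm -rf ~/.cache/huggingface/hub works too):

# Delete every cached revision so the models are re-downloaded cleanly.
from huggingface_hub import scan_cache_dir

cache = scan_cache_dir()  # defaults to ~/.cache/huggingface/hub
revisions = [rev.commit_hash for repo in cache.repos for rev in repo.revisions]
strategy = cache.delete_revisions(*revisions)
print(f"Will free {strategy.expected_freed_size_str}")
strategy.execute()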

If that doesn't work, I would suggest following the instructions/tutorial for whisperx to make sure you are able to run it directly. You can find this in the README of the whisperx repo, but I also include the code here to try if openwillis still doesn't work.

import whisperx
import gc 

device = "cuda" 
audio_file = "audio.mp3"
batch_size = 16 # reduce if low on GPU mem
compute_type = "float16" # change to "int8" if low on GPU mem (may reduce accuracy)

# 1. Transcribe with original whisper (batched)
model = whisperx.load_model("large-v2", device, compute_type=compute_type)

# save model to local path (optional)
# model_dir = "/path/"
# model = whisperx.load_model("large-v2", device, compute_type=compute_type, download_root=model_dir)

audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)
print(result["segments"]) # before alignment

# delete model if low on GPU resources
# import gc; gc.collect(); torch.cuda.empty_cache(); del model

# 2. Align whisper output
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)

print(result["segments"]) # after alignment

# delete model if low on GPU resources
# import gc; gc.collect(); torch.cuda.empty_cache(); del model_a

# 3. Assign speaker labels
diarize_model = whisperx.DiarizationPipeline(use_auth_token=YOUR_HF_TOKEN, device=device)

# add min/max number of speakers if known
diarize_segments = diarize_model(audio)
# diarize_model(audio, min_speakers=min_speakers, max_speakers=max_speakers)

result = whisperx.assign_word_speakers(diarize_segments, result)
print(diarize_segments)
print(result["segments"]) # segments are now assigned speaker IDs

TimStaeheli commented 2 days ago

Thank you so much. It finally worked. The only thing I had to change is device = "cpu", because for some reason when I would run it on cuda the JupyterHub kernel would crash. But now it finally works.
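
For reference, a minimal CPU variant of the snippet above (a sketch: the "tiny" model and the audio path are placeholders, and int8 is used because the CPU backend generally does not support float16):

import whisperx

device = "cpu"
compute_type = "int8"  # CPU backend generally does not support float16

model = whisperx.load_model("tiny", device, compute_type=compute_type)
audio = whisperx.load_audio("audio.mp3")
result = model.transcribe(audio, batch_size=4)  # smaller batch is gentler on memory
print(result["segments"])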

GeorgeEfstathiadis commented 19 hours ago

Awesome, glad it worked! Closing the issue.