tophee closed this 3 months ago.
Can you upload the audio file to reproduce?
Unfortunately not this one. I can try to find one that I can share.
Are you suggesting the error is related to this specific audio file?
Yes. The error occurs in the alignment script, which depends entirely on the generated transcription.
(Emailed reply to Chris's comment of Wed, May 22, 2024, 8:40 PM.)
OK, to start with, I'm checking with another file, and I noticed that it says:
[NeMo W 2024-05-22 20:23:04 transformer_bpe_models:59] Could not import NeMo NLP collection which is required for speech translation model.
I'm not doing translation, so I assume this is not a problem, right?
Not a problem
I'm confused. I tried the above command on a different file twice and got two different errors, each different from the one reported above.
The first run ended with:
Suppressing numeral and symbol tokens
Traceback (most recent call last):
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1472, in _get_module
File "/opt/anaconda3/envs/pretzel/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers.models.wav2vec2_bert.configuration_wav2vec2_bert'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/xhxxch/whisper-dia/diarize.py", line 124, in <module>
alignment_model, alignment_tokenizer, alignment_dictionary = load_alignment_model(
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/ctc_forced_aligner/alignment_utils.py", line 276, in load_alignment_model
AutoModelForCTC.from_pretrained(
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 540, in from_pretrained
if kwargs_orig.get("quantization_config", None) is not None:
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 751, in keys
return getattribute_from_module(self._modules[module_name], attr)
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 752, in <listcomp>
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 748, in _load_attr_from_module
module_name = model_type_to_module_name(model_type)
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 692, in getattribute_from_module
return None
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1462, in __getattr__
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1474, in _get_module
RuntimeError: Failed to import transformers.models.wav2vec2_bert.configuration_wav2vec2_bert because of the following error (look up to see its traceback):
No module named 'transformers.models.wav2vec2_bert.configuration_wav2vec2_bert'
While the above process was executing, I also ran pip install 'nemo_toolkit[nlp]'. Assuming this might be why I was getting a different error, I ran pip uninstall 'nemo_toolkit[nlp]' and then, just to make sure I still had what I needed, ran pip install 'nemo_toolkit[asr]' again.
After that, the very same command failed immediately with:
objc[6072]: Class AVFFrameReceiver is implemented in both /opt/anaconda3/envs/pretzel/lib/libavdevice.58.8.100.dylib (0x1759f0798) and /opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x17860c760). One of the two will be used. Which one is undefined.
objc[6072]: Class AVFAudioReceiver is implemented in both /opt/anaconda3/envs/pretzel/lib/libavdevice.58.8.100.dylib (0x1759f07e8) and /opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/av/.dylibs/libavdevice.60.1.100.dylib (0x17860c7b0). One of the two will be used. Which one is undefined.
Traceback (most recent call last):
File "/Users/xhxxch/whisper-dia/diarize.py", line 3, in <module>
from helpers import (
File "/Users/xhxxch/whisper-dia/helpers.py", line 7, in <module>
from whisperx.alignment import DEFAULT_ALIGN_MODELS_HF, DEFAULT_ALIGN_MODELS_TORCH
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/whisperx/__init__.py", line 1, in <module>
from .transcribe import load_model
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/whisperx/transcribe.py", line 10, in <module>
from .asr import load_model
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/whisperx/asr.py", line 13, in <module>
from .vad import load_vad_model, merge_chunks
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/whisperx/vad.py", line 11, in <module>
from pyannote.audio.pipelines import VoiceActivityDetection
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/pyannote/audio/pipelines/__init__.py", line 26, in <module>
from .speaker_diarization import SpeakerDiarization
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 42, in <module>
from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/pyannote/audio/pipelines/speaker_verification.py", line 56, in <module>
from nemo.collections.asr.models import (
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/asr/__init__.py", line 15, in <module>
from nemo.collections.asr import data, losses, models, modules
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/asr/models/__init__.py", line 36, in <module>
from nemo.collections.asr.models.transformer_bpe_models import EncDecTransfModelBPE
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/asr/models/transformer_bpe_models.py", line 52, in <module>
from nemo.collections.nlp.modules.common import TokenClassifier
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/__init__.py", line 15, in <module>
from nemo.collections.nlp import data, losses, models, modules
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/models/__init__.py", line 31, in <module>
from nemo.collections.nlp.models.machine_translation import MTEncDecModel
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/models/machine_translation/__init__.py", line 15, in <module>
from nemo.collections.nlp.models.machine_translation.mt_enc_dec_bottleneck_model import MTBottleneckModel
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/models/machine_translation/mt_enc_dec_bottleneck_model.py", line 23, in <module>
from nemo.collections.nlp.models.machine_translation.mt_enc_dec_model import MTEncDecModel
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/nlp/models/machine_translation/mt_enc_dec_model.py", line 38, in <module>
from nemo.collections.common.tokenizers.chinese_tokenizers import ChineseProcessor
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/nemo/collections/common/tokenizers/chinese_tokenizers.py", line 38, in <module>
import opencc
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/opencc.py", line 24, in <module>
libopencc = CDLL('libopencc.so.1', use_errno=True)
File "/opt/anaconda3/envs/pretzel/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(libopencc.so.1, 0x0006): tried: 'libopencc.so.1' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibopencc.so.1' (no such file), '/opt/anaconda3/envs/pretzel/lib/python3.10/lib-dynload/../../libopencc.so.1' (no such file), '/opt/anaconda3/envs/pretzel/bin/../lib/libopencc.so.1' (no such file), '/usr/lib/libopencc.so.1' (no such file, not in dyld cache), 'libopencc.so.1' (no such file), '/usr/local/lib/libopencc.so.1' (no such file), '/usr/lib/libopencc.so.1' (no such file, not in dyld cache)
Edit: I reinstalled the requirements (except for nemo, which fails to install via requirements.txt), but the error remains the same no matter which audio file I use.
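One thing worth checking after the install/uninstall sequence above: as far as I know, pip uninstall removes only the named distribution itself, not the dependencies that an extra such as [nlp] pulled in. Packages like opencc (whose opencc.py appears in the traceback, presumably installed with the nlp extra) can therefore stay behind. The sketch below is a hypothetical helper that reports what is still installed; the distribution names in the loop are assumptions, so adjust them to what pip list shows.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution names; check `pip list` for the exact spelling.
    for name in ("nemo_toolkit", "opencc", "transformers"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```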
Please reinstall ctc-forced-aligner; it needs to be recompiled against the torch version you are using. Also upgrade transformers to the latest version, or at least 4.34.
Or, better still, reinstall all the requirements.
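To confirm the transformers upgrade actually took effect in the active environment, a quick version check can help. This is a minimal sketch that compares only the numeric components and ignores suffixes such as dev0 or rc1; the helper names are hypothetical, and packaging.version.Version is the more rigorous option if the packaging library is installed.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '4.41.2' into (4, 41, 2); stops at the first non-numeric part like 'dev0'."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` >= `minimum`, comparing numeric components only."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that (4, 34) compares equal to (4, 34, 0).
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


if __name__ == "__main__":
    current = metadata.version("transformers")
    status = "OK" if meets_minimum(current, "4.34") else "too old, please upgrade"
    print(f"transformers {current}: {status}")
```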
I did, but that didn't change anything.
What seems to work (it's still running, but so far so good) is the solution mentioned in https://github.com/MahmoudAshraf97/whisper-diarization/issues/177#issuecomment-2097047524, so I ran:
brew install opencc
ln -s /opt/homebrew/lib/libopencc.dylib libopencc.so.1
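Because ln -s with just a link name creates the link in the current directory, it probably only helps when diarize.py is launched from that same directory. The dlopen error above does list the bare name 'libopencc.so.1' among the paths it tried, which I believe is resolved relative to the working directory (an assumption). The traceback shows opencc.py loading the library with CDLL('libopencc.so.1', use_errno=True), so a minimal sketch is to attempt the same load from Python, started in the directory you run the script from. try_load is a hypothetical helper.

```python
import ctypes
import os


def try_load(libname: str) -> bool:
    """Attempt to load a shared library the way opencc.py does (ctypes.CDLL)."""
    try:
        ctypes.CDLL(libname, use_errno=True)
    except OSError as exc:
        print(f"could not load {libname}: {exc}")
        return False
    return True


if __name__ == "__main__":
    print("working directory:", os.getcwd())
    # 'libopencc.so.1' is the name opencc.py requests in the traceback above.
    print("libopencc.so.1 loadable:", try_load("libopencc.so.1"))
```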
Now I'm waiting for the command to finish; it is past the "Suppressing numeral and symbol tokens" step.
What puzzles me, though, is why I previously (with the first test file above) didn't get an error about libopencc.so.1 and now suddenly I do.
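One possible explanation, not verified against NeMo's source: the earlier warning from transformer_bpe_models:59 ("Could not import NeMo NLP collection") suggests the NLP import is wrapped in a handler for import errors. While the NLP dependencies were missing, the import failed with an ImportError-type exception and was downgraded to that warning. If the opencc Python package is now present (for example, left behind by the nemo_toolkit[nlp] install), the import gets further and fails inside opencc with an OSError from dlopen, and OSError is not a subclass of ImportError, so such a handler would let it through. The toy sketch below uses hypothetical function names and is not NeMo's code; it only shows that difference.

```python
import logging
from typing import Optional


def import_nlp_collection(failure: Optional[BaseException] = None) -> None:
    """Stand-in for the real import; raises the simulated failure, if any."""
    if failure is not None:
        raise failure


def guarded_import(failure: Optional[BaseException] = None) -> bool:
    """Mimic an import guard that tolerates ImportError (ModuleNotFoundError included)."""
    try:
        import_nlp_collection(failure)
    except ImportError:
        logging.warning("Could not import NeMo NLP collection")
        return False
    return True


if __name__ == "__main__":
    # Missing Python dependency: downgraded to a warning, execution continues.
    print(guarded_import(ModuleNotFoundError("No module named 'opencc'")))
    # Python package present but native library missing: OSError escapes the guard.
    try:
        guarded_import(OSError("dlopen(libopencc.so.1) failed"))
    except OSError as exc:
        print("unhandled:", exc)
```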
Edit: OK, we're back to where we were in the OP:
Suppressing numeral and symbol tokens
Some weights of the model checkpoint at MahmoudAshraf/mms-300m-1130-forced-aligner were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at MahmoudAshraf/mms-300m-1130-forced-aligner and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.weight_g', 'wav2vec2.encoder.pos_conv_embed.conv.weight_v']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "/Users/xhxxch/whisper-dia/diarize.py", line 155, in <module>
spans = get_spans(tokens_starred, segments, alignment_tokenizer.decode(blank_id))
File "/opt/anaconda3/envs/pretzel/lib/python3.10/site-packages/ctc_forced_aligner/alignment_utils.py", line 63, in get_spans
assert seg.label == ltr, f"{seg.label} != {ltr}"
AssertionError: g != <star>
But this is with a different audio file, so the error is not specific to one file. I suspect it's not so much about the audio file as about the language: you can probably take any Swedish audio file and reproduce the error.
Maybe this is related: as I try to understand how your script works, it looks like it uses a wav2vec2 model, just like whisperX. That made me wonder how it handles Swedish audio, given that Swedish is not one of the languages for which whisperX already has a wav2vec2 model (when I tried whisperX, I used KBLab/wav2vec2-large-voxrex-swedish).
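On the wav2vec2 side, the log above may be relevant: the two key lists in the "weights ... were not used" and "newly initialized" warnings look like the same two tensors under two PyTorch naming schemes. torch.nn.utils.parametrizations.weight_norm stores a weight-normalized parameter as parametrizations.weight.original0 (the magnitude g) and parametrizations.weight.original1 (the direction v), whereas the older torch.nn.utils.weight_norm uses weight_g and weight_v. If that reading is right, the checkpoint's positional-convolution weights were skipped and randomly re-initialized; whether that causes the AssertionError is an assumption, not something the log proves. The hypothetical helper below only illustrates the name correspondence, using the keys from the warning.

```python
# Correspondence between PyTorch's two weight-norm naming schemes:
#   parametrizations.weight.original0 -> weight_g (magnitude)
#   parametrizations.weight.original1 -> weight_v (direction)
PARAMETRIZATION_TO_LEGACY = {
    "parametrizations.weight.original0": "weight_g",
    "parametrizations.weight.original1": "weight_v",
}


def to_legacy_weight_norm_key(key: str) -> str:
    """Rename a parametrization-style weight-norm key to the weight_g/weight_v style."""
    for new_suffix, old_suffix in PARAMETRIZATION_TO_LEGACY.items():
        if key.endswith(new_suffix):
            return key[: -len(new_suffix)] + old_suffix
    return key  # keys without weight norm are returned unchanged


if __name__ == "__main__":
    # Keys copied from the "were not used" warning in the log above.
    unused = [
        "wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0",
        "wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1",
    ]
    for key in unused:
        print(key, "->", to_legacy_weight_norm_key(key))
```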
@tophee My script uses a multilingual alignment model, so if you change the default model to one that has the native vocabulary of the language, you need to turn romanization off too. Can you upload the audio file for testing? I have tried a Swedish audio file and it worked fine with the default model.
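The point about romanization comes down to vocabulary: whatever text reaches the aligner, romanized or not, has to be expressible in the alignment model's token vocabulary. The sketch below is a hypothetical helper, not the script's actual code, and it performs no real romanization; it only shows how to spot characters (such as Swedish å, ä, ö) that a made-up Latin-only vocabulary cannot represent.

```python
import string
from typing import Iterable, Set


def out_of_vocabulary(text: str, vocab: Iterable[str]) -> Set[str]:
    """Return the non-space characters of `text` (lowercased) that are missing from `vocab`."""
    known = set(vocab)
    return {ch for ch in text.lower() if not ch.isspace() and ch not in known}


if __name__ == "__main__":
    # Hypothetical vocabulary: lowercase ASCII letters only.
    latin_vocab = set(string.ascii_lowercase)
    print(sorted(out_of_vocabulary("Hej så går det", latin_vocab)))
```

With a model whose vocabulary includes å, ä and ö, the same check would come back empty for the unromanized text.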
I am trying to process a file in Swedish.
I'm using this command:
It runs OK for quite a while, but when it reaches the alignment step, it suddenly stops with a cryptic error (pasted below with some context).
This is on a MacBook Pro M1, in case it matters.
Any hints that might help me understand (and possibly fix) the error are appreciated.