MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License

I'm getting an error #4

Closed ubanning closed 1 year ago

ubanning commented 1 year ago

Hello, first of all, thanks for the project. I installed the requirements and ran the following command as you said:

python diarize.py -a AUDIO_FILE_NAME

I am getting the following error. Can you help me? Thanks.

│ /usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py:10 │
│ 02 in _get_module                                                            │
│                                                                              │
│    999 │                                                                     │
│   1000 │   def _get_module(self, module_name: str):                          │
│   1001 │   │   try:                                                          │
│ ❱ 1002 │   │   │   return importlib.import_module("." + module_name, self.__ │
│   1003 │   │   except Exception as e:                                        │
│   1004 │   │   │   raise RuntimeError(                                       │
│   1005 │   │   │   │   f"Failed to import {self.__name__}.{module_name} beca │
│                                                                              │
│ /usr/lib/python3.8/importlib/__init__.py:127 in import_module                │
│                                                                              │
│   124 │   │   │   if character != '.':                                       │
│   125 │   │   │   │   break                                                  │
│   126 │   │   │   level += 1                                                 │
│ ❱ 127 │   return _bootstrap._gcd_import(name[level:], package, level)        │
│   128                                                                        │
│   129                                                                        │
│   130 _RELOADING = {}                                                        │
│ <frozen importlib._bootstrap>:1014 in _gcd_import                            │
│ <frozen importlib._bootstrap>:991 in _find_and_load                          │
│ <frozen importlib._bootstrap>:975 in _find_and_load_unlocked                 │
│ <frozen importlib._bootstrap>:671 in _load_unlocked                          │
│ <frozen importlib._bootstrap_external>:848 in exec_module                    │
│ <frozen importlib._bootstrap>:219 in _call_with_frames_removed               │
│                                                                              │
│ /usr/local/lib/python3.8/dist-packages/transformers/models/xlm_roberta/model │
│ ing_tf_xlm_roberta.py:19 in <module>                                         │
│                                                                              │
│    16 """ TF 2.0 XLM-RoBERTa model."""                                       │
│    17                                                                        │
│    18 from ...utils import add_start_docstrings, logging                     │
│ ❱  19 from ..roberta.modeling_tf_roberta import (                            │
│    20 │   TFRobertaForCausalLM,                                              │
│    21 │   TFRobertaForMaskedLM,                                              │
│    22 │   TFRobertaForMultipleChoice,                                        │
│                                                                              │
│ /usr/local/lib/python3.8/dist-packages/transformers/models/roberta/modeling_ │
│ tf_roberta.py:36 in <module>                                                 │
│                                                                              │
│     33 │   TFSequenceClassifierOutput,                                       │
│     34 │   TFTokenClassifierOutput,                                          │
│     35 )                                                                     │
│ ❱   36 from ...modeling_tf_utils import (                                    │
│     37 │   TFCausalLanguageModelingLoss,                                     │
│     38 │   TFMaskedLanguageModelingLoss,                                     │
│     39 │   TFModelInputType,                                                 │
│                                                                              │
│ /usr/local/lib/python3.8/dist-packages/transformers/modeling_tf_utils.py:38  │
│ in <module>                                                                  │
│                                                                              │
│     35 from tensorflow.python.keras.saving import hdf5_format                │
│     36                                                                       │
│     37 from huggingface_hub import Repository, list_repo_files               │
│ ❱   38 from keras.saving.hdf5_format import save_attributes_to_hdf5_group    │
│     39 from requests import HTTPError                                        │
│     40 from transformers.utils.hub import convert_file_size_to_int, get_chec │
│     41                                                                       │
╰──────────────────────────────────────────────────────────────────────────────╯
ModuleNotFoundError: No module named 'keras.saving.hdf5_format'

The above exception was the direct cause of the following exception:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /content/whisper-diarization/diarize.py:145 in <module>                      │
│                                                                              │
│   142                                                                        │
│   143 if whisper_results["language"] in punct_model_langs:                   │
│   144 │   # restoring punctuation in the transcript to help realign the sent │
│ ❱ 145 │   punct_model = PunctuationModel(model="kredor/punctuate-all")       │
│   146 │                                                                      │
│   147 │   words_list = list(map(lambda x: x["word"], wsm))                   │
│   148                                                                        │
│                                                                              │
│ /usr/local/lib/python3.8/dist-packages/deepmultilingualpunctuation/punctuati │
│ onmodel.py:9 in __init__                                                     │
│                                                                              │
│    6 class PunctuationModel():                                               │
│    7 │   def __init__(self, model = "oliverguhr/fullstop-punctuation-multila │
│    8 │   │   if torch.cuda.is_available():                                   │
│ ❱  9 │   │   │   self.pipe = pipeline("ner",model, grouped_entities=False, d │
│   10 │   │   else:                                                           │
│   11 │   │   │   self.pipe = pipeline("ner",model, grouped_entities=False)   │
│   12                                                                         │
│                                                                              │
│ /usr/local/lib/python3.8/dist-packages/transformers/pipelines/__init__.py:65 │
│ 0 in pipeline                                                                │
│                                                                              │
│   647 │   # Forced if framework already defined, inferred if it's None       │
│   648 │   # Will load the correct model if possible                          │
│   649 │   model_classes = {"tf": targeted_task["tf"], "pt": targeted_task["p │
│ ❱ 650 │   framework, model = infer_framework_load_model(                     │
│   651 │   │   model,                                                         │
│   652 │   │   model_classes=model_classes,                                   │
│   653 │   │   config=config,                                                 │
│                                                                              │
│ /usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py:233 in │
│ infer_framework_load_model                                                   │
│                                                                              │
│    230 │   │   │   │   │   if _class is not None:                            │
│    231 │   │   │   │   │   │   classes.append(_class)                        │
│    232 │   │   │   │   if look_tf:                                           │
│ ❱  233 │   │   │   │   │   _class = getattr(transformers_module, f"TF{archit │
│    234 │   │   │   │   │   if _class is not None:                            │
│    235 │   │   │   │   │   │   classes.append(_class)                        │
│    236 │   │   │   class_tuple = class_tuple + tuple(classes)                │
│                                                                              │
│ /usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py:99 │
│ 3 in __getattr__                                                             │
│                                                                              │
│    990 │   │   │   value = self._get_module(name)                            │
│    991 │   │   elif name in self._class_to_module.keys():                    │
│    992 │   │   │   module = self._get_module(self._class_to_module[name])    │
│ ❱  993 │   │   │   value = getattr(module, name)                             │
│    994 │   │   else:                                                         │
│    995 │   │   │   raise AttributeError(f"module {self.__name__} has no attr │
│    996                                                                       │
│                                                                              │
│ /usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py:99 │
│ 2 in __getattr__                                                             │
│                                                                              │
│    989 │   │   if name in self._modules:                                     │
│    990 │   │   │   value = self._get_module(name)                            │
│    991 │   │   elif name in self._class_to_module.keys():                    │
│ ❱  992 │   │   │   module = self._get_module(self._class_to_module[name])    │
│    993 │   │   │   value = getattr(module, name)                             │
│    994 │   │   else:                                                         │
│    995 │   │   │   raise AttributeError(f"module {self.__name__} has no attr │
│                                                                              │
│ /usr/local/lib/python3.8/dist-packages/transformers/utils/import_utils.py:10 │
│ 04 in _get_module                                                            │
│                                                                              │
│   1001 │   │   try:                                                          │
│   1002 │   │   │   return importlib.import_module("." + module_name, self.__ │
│   1003 │   │   except Exception as e:                                        │
│ ❱ 1004 │   │   │   raise RuntimeError(                                       │
│   1005 │   │   │   │   f"Failed to import {self.__name__}.{module_name} beca │
│   1006 │   │   │   │   f" traceback):\n{e}"                                  │
│   1007 │   │   │   ) from e                                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Failed to import 
transformers.models.xlm_roberta.modeling_tf_xlm_roberta because of the following
error (look up to see its traceback):
No module named 'keras.saving.hdf5_format'

MahmoudAshraf97 commented 1 year ago

Please make sure that the transformers library is up to date. Run pip install -U transformers to upgrade.
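
The traceback above fails inside transformers/modeling_tf_utils.py while importing keras.saving.hdf5_format, which suggests that the installed transformers and Keras versions do not match. As a rough sketch (not part of this repository), the snippet below prints the installed versions so they can be compared before and after the upgrade. The helper name installed_version and the list of packages to check are assumptions chosen for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Packages named in the traceback; adjust this list for your environment (assumption).
for name in ("transformers", "tensorflow", "keras"):
    print(name, installed_version(name) or "not installed")
```

Running it once before and once after pip install -U transformers shows whether the upgrade actually took effect in the interpreter that runs diarize.py.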

ubanning commented 1 year ago

Hello, now it doesn't even run, I'm getting this error right at the beginning:

Traceback (most recent call last):
  File "diarize.py", line 3, in <module>
    from helpers import *
  File "/content/whisper-diarization/helpers.py", line 2, in <module>
    import wget
ModuleNotFoundError: No module named 'wget'

If you have time, could you create a Google Colab? Thank you very much

MahmoudAshraf97 commented 1 year ago

Use pip install wget. I'll create a Colab notebook ASAP.
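
To catch this kind of ModuleNotFoundError before the script starts, a small pre-flight check can list the modules that are not importable. This is only a sketch under stated assumptions: the function name missing_modules is hypothetical, and the module list is taken from the imports that appear in the tracebacks in this thread, not from the project's actual requirements file.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Top-level modules seen in the tracebacks above (assumed list, not exhaustive).
required = ["wget", "transformers", "deepmultilingualpunctuation"]

missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
    # The pip package name can differ from the module name, so treat this as a hint.
    print("Try: pip install " + " ".join(missing))
else:
    print("All listed modules are importable.")
```

find_spec only locates a top-level module without importing it, so the check stays fast even for heavy packages such as transformers.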

ubanning commented 1 year ago

Ok, I tried to solve it to make it work in Google Colab, but I couldn't. When you can create it, let me know. Thanks :)