/home/.........../token_classification.py:168: UserWarning: `grouped_entities` is deprecated and will be removed in version v5.0.0, defaulted to `aggregation_strategy="none"` instead.
warnings.warn(
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/....../whisper-diarization/venv/diarize_parallel.py:137 in │
│ <module> │
│ │
│ 134 │ │
│ 135 │ words_list = list(map(lambda x: x["word"], wsm)) │
│ 136 │ │
│ ❱ 137 │ labled_words = punct_model.predict(words_list) │
│ 138 │ │
│ 139 │ ending_puncts = ".?!" │
│ 140 │ model_puncts = ".,?-:" │
│ │
│ /home/.....//whisper-diarization/venv/python3.10/site-packages/deepmultilingualpunctuation/pun │
│ ctuationmodel.py:39 in predict │
│ │
│ 36 │ │ │
│ 37 │ │ # if the last batch is smaller than the overlap, │
│ 38 │ │ # we can just remove it │
│ ❱ 39 │ │ if len(batches[-1]) <= overlap: │
│ 40 │ │ │ batches.pop() │
│ 41 │ │ │
│ 42 │ │ tagged_words = [] │
╰──────────────────────────────────────────────────────────────────────────────╯
IndexError: list index out of range
I tried wrapping `len(batches[-1]) <= overlap` in a try block, and threw in `len(batches)[0] <= overlap` to boot (not great at this programming thing really and still learning), in the punctuationmodel.py file. With that I was able to successfully generate SRT files / transcriptions for a couple of audio files I was working with, but then the error came back.
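For what it's worth, the IndexError happens because `batches[-1]` is evaluated on an empty list. A minimal sketch of a local workaround (assuming the library's intent is simply to drop the last batch when it is no larger than the overlap — `drop_small_last_batch` is a hypothetical stand-in for the logic around line 39 of punctuationmodel.py, not the library's actual API):

```python
def drop_small_last_batch(batches, overlap):
    """Drop the last batch if it is no larger than the overlap.

    Guard `batches` for emptiness before indexing batches[-1]; the
    original `if len(batches[-1]) <= overlap:` raises IndexError when
    `batches` is empty (e.g. for very short inputs).
    """
    if batches and len(batches[-1]) <= overlap:
        batches.pop()
    return batches

# With the guard, an empty batch list passes through instead of crashing:
drop_small_last_batch([], 5)                      # → []
drop_small_last_batch([["a", "b"], ["c"]], 1)     # → [["a", "b"]]
```

A try/except around the original line would also suppress the crash, but an explicit emptiness check keeps the behavior clear and avoids masking other errors.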
Hello, this problem seems to be in the punctuation model's code, which I don't have access to or maintain; it would be better to open this issue on their repo.