MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License

Update helpers.py #143

Closed: ivankot88 closed this 7 months ago

ivankot88 commented 7 months ago

Fix function _get_next_start_timestamp:

ivankot88 commented 7 months ago

Hi, I ran your code on several different audio files and ran into this error:

Traceback (most recent call last):
  File "/servant/jobs/resources/bt1llfnt03pk6lm727mq/diarize.py", line 192, in <module>
    main(args)
  File "/servant/jobs/resources/bt1llfnt03pk6lm727mq/diarize.py", line 143, in main
    transcription.generate_word_timestamps(result_aligned["word_segments"])
  File "/servant/jobs/resources/bt1llfnt03pk6lm727mq/models.py", line 30, in generate_word_timestamps
    self.word_timestamps = filter_missing_timestamps(result_aligned)
  File "/servant/jobs/resources/bt1llfnt03pk6lm727mq/helpers.py", line 382, in filter_missing_timestamps
    ws["end"] = _get_next_start_timestamp(word_timestamps, i)
  File "/servant/jobs/resources/bt1llfnt03pk6lm727mq/helpers.py", line 353, in _get_next_start_timestamp
    if word_timestamps[next_word_index].get("start") is None:
IndexError: list index out of range

When I looked at the structure of the _get_next_start_timestamp function, I found that the while condition never works.
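
To illustrate the failure mode in the traceback, here is a minimal sketch of a look-ahead whose loop is bounded by the index that actually advances. It is not the code in helpers.py: the function name next_start_timestamp and the sample data are hypothetical, and the only thing taken from the traceback is that each word is a dict read with .get("start").

```python
def next_start_timestamp(word_timestamps, current_index):
    """Return the start time of the next word that has one, or None.

    Hypothetical helper for illustration only; not the code in helpers.py.
    """
    next_index = current_index + 1
    # Bound the loop on the index that is incremented, so it cannot
    # run past the end of the list and raise IndexError.
    while next_index < len(word_timestamps):
        start = word_timestamps[next_index].get("start")
        if start is not None:
            return start
        next_index += 1
    # No later word carries a start timestamp.
    return None


# Assumed sample data: every word after the first is missing its timestamps.
words = [
    {"word": "hello", "start": 0.0, "end": None},
    {"word": "big", "start": None, "end": None},
    {"word": "world", "start": None, "end": None},
]
print(next_start_timestamp(words, 0))  # None, and no IndexError
```

Returning None instead of raising leaves it to the caller to decide what to do when no later word has a timestamp.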

MahmoudAshraf97 commented 7 months ago

Hello, this fix introduces other errors when the last word also has no timestamps. I've fixed this and other issues in 9c0ab3c; can you check?
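
For readers following along, the edge case mentioned here (a trailing word with no timestamps) can be sketched roughly as below. This is an assumed illustration, not the contents of 9c0ab3c: the function name fill_missing_timestamps and its fallback logic are hypothetical, and the parameter names initial_timestamp and final_timestamp are borrowed from later in this thread without knowing their exact semantics in helpers.py.

```python
def fill_missing_timestamps(word_timestamps, initial_timestamp=0.0, final_timestamp=None):
    """Fill missing "start"/"end" values in place and return the list.

    Hypothetical sketch; the fallback rules here are assumptions,
    not the repository's implementation.
    """
    previous_end = initial_timestamp
    for i, word in enumerate(word_timestamps):
        if word.get("start") is None:
            # Assume a word with no start begins where the previous one ended.
            word["start"] = previous_end
        if word.get("end") is None:
            # Look ahead for the next known start; if there is none
            # (e.g. this is the last word), fall back to final_timestamp.
            next_start = next(
                (w["start"] for w in word_timestamps[i + 1:] if w.get("start") is not None),
                final_timestamp,
            )
            word["end"] = next_start if next_start is not None else word["start"]
        previous_end = word["end"]
    return word_timestamps


# Assumed sample data: the first and last words have no timestamps.
words = [
    {"word": "so", "start": None, "end": None},
    {"word": "far", "start": 2.0, "end": 2.5},
    {"word": "good", "start": None, "end": None},
]
fill_missing_timestamps(words, initial_timestamp=0.0, final_timestamp=4.0)
print(words[0]["start"], words[0]["end"])  # 0.0 2.0
print(words[2]["start"], words[2]["end"])  # 2.5 4.0
```

The point of the sketch is only that both ends of the list need an explicit fallback, since neither a look-back nor a look-ahead can succeed there.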

jonsampson commented 7 months ago

Hello @MahmoudAshraf97! Thank you for all your work on this project.

Regarding 9c0ab3c1882d5481628e9b0471b0d4e646f31f2e: this didn't seem to be worth a pull request, but I can open one if you'd like.

As committed, there is a naming mismatch between initial_offset (diarize.py, diarize_parallel.py) and initial_timestamp (helpers.py). I chose to fix this by renaming the parameter to initial_timestamp in diarize.py and diarize_parallel.py, to stay consistent with final_timestamp.

MahmoudAshraf97 commented 7 months ago

Thanks for noticing that, @jonsampson. Fixed in 39572386eb4170fc16440b770666f23ccf9bdc80.

ivankot88 commented 7 months ago

Hello @MahmoudAshraf97, yes, you're right. Thank you for fixing this problem.