MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License

Issues with silence #38

Closed: corneliusgerico closed this issue 1 year ago

corneliusgerico commented 1 year ago

I have a lot of files that error out because of silences, even ones that aren't especially long (a couple of seconds or so).

I know Whisper has issues with silence and there is a built-in VAD, but is there any way to improve this other than pre-processing the audio?

Thanks
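To find the short silences described above before deciding whether to pre-process, a minimal energy-based sketch can flag quiet stretches. This is not the VAD used by Whisper or by this repo. It assumes mono float samples in [-1, 1] already loaded into a NumPy array, `find_silences` is a hypothetical helper, and the -40 dBFS threshold and 2 s minimum are illustrative values, not project settings.

```python
import numpy as np


def find_silences(samples, sample_rate, threshold_db=-40.0, min_duration_s=2.0, frame_ms=20):
    """Hypothetical helper: return (start_s, end_s) for quiet stretches of at least min_duration_s.

    Assumes mono float samples in [-1, 1]; the threshold is in dB relative to full scale.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    # Split the signal into fixed-size frames (a trailing partial frame is ignored).
    frames = np.asarray(samples[: n_frames * frame_len], dtype=np.float64).reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Convert RMS to dBFS; the epsilon avoids log10(0) on digital silence.
    level_db = 20 * np.log10(rms + 1e-12)
    quiet = level_db < threshold_db

    silences = []
    start = None
    # A trailing False sentinel closes a quiet run that reaches the end of the file.
    for i, is_quiet in enumerate(list(quiet) + [False]):
        if is_quiet and start is None:
            start = i
        elif not is_quiet and start is not None:
            if (i - start) * frame_len / sample_rate >= min_duration_s:
                silences.append((start * frame_len / sample_rate, i * frame_len / sample_rate))
            start = None
    return silences
```

Running it over a file's samples lists the quiet stretches in seconds, which makes it easier to check whether the failing files share a particular silence pattern.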

MahmoudAshraf97 commented 1 year ago

What is the nature of the errors exactly? Are they from Whisper or NeMo?

corneliusgerico commented 1 year ago

I just tested a file in the notebook to see what the error is and it worked...

However, when I run it through diarize_parallel, it starts to process, then stops without any output and drops back to the default CMD prompt, i.e. C:\Users\admin\Downloads\whisper-diarization-main (2)\whisper-diarization-main>

As for diarize, it gives the usual warnings:

[NeMo W 2023-05-05 23:34:57 optimizers:54] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2023-05-05 23:34:58 experimental:27] Module <class 'nemo.collections.asr.modules.audio_modules.SpectrogramToMultichannelFeatures'> is experimental, not ready for production and is not fully supported. Use at your own risk.

And then it resets as above.

I'm able to fix the problem by truncating silences and sometimes also having to normalise the audio.

So it seems to be something about the silence that is causing the issue.
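The workaround described above (truncating silences, then normalising) can be sketched in NumPy as follows. It assumes mono float samples in [-1, 1]; `truncate_silences` and `peak_normalize` are hypothetical helpers, not part of whisper-diarization, and the -40 dBFS threshold, 0.5 s maximum silence and -1 dBFS peak target are illustrative values.

```python
import numpy as np


def truncate_silences(samples, sample_rate, threshold_db=-40.0, max_silence_s=0.5, frame_ms=20):
    """Hypothetical helper: shorten every quiet stretch longer than max_silence_s to max_silence_s."""
    samples = np.asarray(samples, dtype=np.float64)
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = int(np.ceil(len(samples) / frame_len))
    max_quiet_frames = int(max_silence_s * 1000 / frame_ms)
    keep = np.ones(len(samples), dtype=bool)
    run = 0  # number of consecutive quiet frames seen so far
    for f in range(n_frames):
        frame = samples[f * frame_len : (f + 1) * frame_len]
        level_db = 20 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-12)
        if level_db < threshold_db:
            run += 1
            # Keep the first max_quiet_frames quiet frames, drop the rest of the run.
            if run > max_quiet_frames:
                keep[f * frame_len : (f + 1) * frame_len] = False
        else:
            run = 0
    return samples[keep]


def peak_normalize(samples, target_dbfs=-1.0):
    """Hypothetical helper: scale so the loudest sample sits at target_dbfs (all-zero input is returned as-is)."""
    peak = np.max(np.abs(samples)) if len(samples) else 0.0
    if peak == 0:
        return samples
    return samples * (10 ** (target_dbfs / 20) / peak)
```

Peak normalisation is used here only because it is the simplest form; loudness-based normalisation is a common alternative, and either way the processed samples would still need to be written back to an audio file before running diarize.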

MahmoudAshraf97 commented 1 year ago

If the non-parallel diarization works, then it's a bug in the parallel one. I doubt the silence is related, because ideally both scripts should give exactly the same result.

corneliusgerico commented 1 year ago

It isn't working with the normal diarize either. It resets CMD shortly after showing those warnings.
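When a script drops back to the prompt without a traceback, the child process's exit code is a useful clue. A minimal sketch, assuming the script is launched as `python diarize.py -a <audio file>`: it reruns the command with Python's built-in `-X faulthandler` option, which dumps a Python traceback if the interpreter crashes in native code, and prints the exit code. `run_and_report` is a hypothetical helper, not part of the repo.

```python
import subprocess
import sys


def run_and_report(args):
    """Hypothetical helper: run a Python script in a child interpreter with faulthandler on, return its exit code."""
    cmd = [sys.executable, "-X", "faulthandler", *args]
    code = subprocess.run(cmd).returncode
    if code < 0:
        # On POSIX a negative return code means the child was killed by that signal.
        print(f"process killed by signal {-code}")
    elif code != 0:
        # On Windows, native crashes show up as large NTSTATUS-style codes, easier to read in hex.
        print(f"process exited with code {code} (0x{code:08X})")
    return code
```

For example, `run_and_report(["diarize.py", "-a", "audio.wav"])`, where `audio.wav` is a placeholder for one of the failing files. A non-zero code with no Python traceback points to a crash in native code rather than an ordinary Python exception.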

corneliusgerico commented 1 year ago

I've attached the output I got from running it through the notebook just now.

It happens at the transcribing from whisper and realigning stage.

Output.txt

MahmoudAshraf97 commented 1 year ago

This error seems to come from Demucs. Are you sure it's from Whisper or the alignment? If so, can you send me the file so I can debug it?