Speaker Diarization goes haywire due to small segments of audio

Describe the bug

I have a long audio recording of around 3 hours that spans multiple speakers. When I pass the whole recording, speaker diarization labels everything as a single speaker. When I break the audio into parts and pass each part separately, some parts get speakers assigned correctly, but the remaining parts show the same bug. I identified some 1-minute chunks that, when included in the audio, cause the model to behave this way. I'm looking for possible explanations or solutions, since I believe the model should be resilient to such segments.
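To make the "single speaker" collapse easy to spot per part, here is a minimal sketch that counts distinct speaker labels in diarization output. It assumes the output is in the standard RTTM format, where the speaker name is the eighth whitespace-separated field of each `SPEAKER` line. The helper name `count_speakers` is hypothetical and not part of NeMo.

```python
def count_speakers(rttm_text):
    """Return the number of distinct speaker labels in RTTM-formatted text.

    Assumes standard RTTM: lines starting with "SPEAKER", with the
    speaker name in the 8th whitespace-separated field.
    """
    speakers = set()
    for line in rttm_text.splitlines():
        fields = line.split()
        # Skip blank lines, non-SPEAKER records, and truncated lines.
        if len(fields) >= 8 and fields[0] == "SPEAKER":
            speakers.add(fields[7])
    return len(speakers)
```

A part that returns 1 here while clearly containing several voices is a candidate for the problematic segments described above.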

Steps/Code to reproduce bug

Run speaker diarization on the full audio, then on each part separately.
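For anyone trying to reproduce this on their own recording, the splitting step could look roughly like the sketch below. It uses only Python's standard `wave` module and assumes the input is an uncompressed PCM WAV file. The function name `split_wav`, the 60-second default, and the output naming scheme are illustrative assumptions, not what was actually used for this report.

```python
import wave


def split_wav(path, chunk_seconds=60, prefix="chunk"):
    """Split a PCM WAV file into consecutive fixed-length chunks.

    Returns the list of chunk file paths in order. The last chunk may be
    shorter than chunk_seconds.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{prefix}_{index:04d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width, and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Each resulting chunk can then be diarized on its own, and chunks can be added back one at a time to narrow down which ones trigger the single-speaker output.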

Expected behavior

A clear and concise description of what you expected to happen.

Environment overview (please complete the following information)

Environment location: AWS
Method of NeMo install: pip install

Environment details

AWS Linux 2
PyTorch version: 2.3.1
Python version: 3.10

Additional context

GPU model

NVIDIA / NeMo

Speaker Diarization goes haywire due to small segments of audio #9523