WhisperX speaker diarization does not work

sugar-ray34 commented 5 months ago

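Downloading the model works fine (I ran the program as administrator, as earlier threads suggest). I don't know what causes the "Move Model To Device Error", but the run then stays stuck at this step with the GPU fully loaded, and leaving it running for a while produces no result.

==========Speaker_Diarize==========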

min_speaker: 1 max_speaker: 2

Speaker diarize and alignment load speaker brain model...
Move Model To Device Error: 'SpeakerDiarization' object has no attribute 'to'
speaker diarize...

CheshireCC commented 5 months ago

Which version?

sugar-ray34 commented 5 months ago

0.8.0

CheshireCC commented 5 months ago

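Check whether shared GPU memory usage is high as well. If it is, dedicated VRAM has run out and system RAM is being used as shared GPU memory, which makes everything slow. You can unload the fw (faster-whisper) model before running whisperX to free up VRAM.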
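As a rough illustration of what "unloading" amounts to on the Python side, the sketch below drops the last reference to a model object and forces a garbage-collection pass. `ModelHolder`, `DummyModel` and `unload_model` are hypothetical names, not taken from faster-whisper-GUI, and whether VRAM is actually returned at that point depends on the backend that allocated it.

```python
import gc
import weakref


class ModelHolder:
    """Hypothetical container that keeps the only long-lived reference to a loaded model."""

    def __init__(self, model):
        self.model = model


def unload_model(holder):
    """Drop the held model and run a garbage-collection pass.

    Returns True if a model was held before the call.
    """
    had_model = holder.model is not None
    holder.model = None   # release our reference
    gc.collect()          # also collect reference cycles that may keep the model alive
    return had_model


class DummyModel:
    """Stand-in for a real model; a real one would own GPU buffers."""


holder = ModelHolder(DummyModel())
probe = weakref.ref(holder.model)   # lets us observe whether the object is gone
unload_model(holder)
print(probe() is None)
```

The weak reference is only there to observe the effect. If any other part of the program still holds the model, it stays alive and so does its memory.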

sugar-ray34 commented 5 months ago

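Shared GPU memory is only at 1.4/7.9 GB, so it is not heavily used, but dedicated VRAM was indeed full. Unloading the fw model fixed that. Does this mean whisperX can only run in dedicated VRAM? However, the result barely separates the two speakers: out of 200 sentences, only 10 were assigned to a different speaker. And the log still shows the same error as before?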

==========Speaker_Diarize==========

min_speaker: 1 max_speaker: 2

Speaker diarize and alignment load speaker brain model...
Move Model To Device Error: 'SpeakerDiarization' object has no attribute 'to'
speaker diarize...
process transcription result...
speaker alignment...
UPdata DataModel tab_F:/XXX
updata table:F:/XXX

CheshireCC commented 5 months ago

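Shared GPU memory is only used when dedicated VRAM runs out, and using it slows things down. Speaker diarization is a deep learning model built on a clustering algorithm, so its results are not always accurate. If the speakers are not separated well enough, try setting both the minimum and maximum number of speakers to 2; that may help a little.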
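To make the "set both to 2" advice concrete, here is a minimal sketch of how min/max settings could be turned into keyword arguments for a diarization call. `speaker_kwargs` is a hypothetical helper. The keyword names `num_speakers`, `min_speakers` and `max_speakers` follow pyannote.audio's speaker diarization pipeline, but how faster-whisper-GUI passes its settings through is an assumption, as is treating 0 as "no constraint".

```python
def speaker_kwargs(min_speaker, max_speaker):
    """Translate GUI-style min/max speaker settings into diarization keyword arguments.

    Assumption: a value of 0 or None means "no constraint".
    """
    lo = min_speaker or None
    hi = max_speaker or None
    if (lo is not None and lo < 0) or (hi is not None and hi < 0):
        raise ValueError("speaker counts must not be negative")
    if lo is not None and hi is not None:
        if lo > hi:
            raise ValueError("min_speaker must not exceed max_speaker")
        if lo == hi:
            # Equal bounds pin the number of speakers instead of letting the model estimate it.
            return {"num_speakers": lo}
    kwargs = {}
    if lo is not None:
        kwargs["min_speakers"] = lo
    if hi is not None:
        kwargs["max_speakers"] = hi
    return kwargs


print(speaker_kwargs(1, 2))
print(speaker_kwargs(2, 2))
```

With a pyannote.audio pipeline object, the result would be passed along the lines of `pipeline(audio_path, **speaker_kwargs(2, 2))`. Fixing the count removes one source of error, but it does not help the model tell similar voices apart.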

sugar-ray34 commented 5 months ago

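Hmm, even with both set to 2, speaker diarization still feels unusable. I tried several recordings, and at best it can tell a male voice from a female one; conversations between speakers of the same gender are not separated at all. I wonder whether the parameters leave any room for tuning. Thanks!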

CheshireCC commented 5 months ago

That is probably as good as speaker diarization gets; it is a limitation of the clustering model itself.

aquamarineaqua commented 4 months ago

Why does nothing happen when I click WhisperX speaker diarization? I have set the token too.

CheshireCC commented 4 months ago

Check the log.

aquamarineaqua commented 4 months ago

Thanks for the reply. The log shows:

Traceback (most recent call last):
  File "D:\ProgramData\FasterWhisperGUI\faster_whisper_GUI\mainWindows.py", line 1066, in whisperXDiarizeSpeakers
  File "D:\ProgramData\FasterWhisperGUI\faster_whisper_GUI\mainWindows.py", line 347, in getParamWhisperX
ValueError: invalid literal for int() with base 10: '〡'

By the way, the numbers in my speaker settings panel look like this [screenshot], and the same goes for many other places in the settings. Is there a way to display them as Arabic numerals?
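The `ValueError` comes from Python's `int()`, which accepts decimal digits from any script but not "letter numbers" such as '〡' (U+3021, HANGZHOU NUMERAL ONE). As a sketch only, a more tolerant parser could fall back to `unicodedata.numeric`. `parse_count` is a hypothetical name, not the function used in mainWindows.py.

```python
import unicodedata


def parse_count(text):
    """Parse a spin-box string as a non-negative integer.

    Falls back to per-character Unicode numeric values when int() rejects the text.
    """
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        pass
    if not text:
        raise ValueError("empty number")
    value = 0
    for ch in text:
        # numeric() raises ValueError for characters without a numeric value
        n = unicodedata.numeric(ch)
        if n != int(n) or not 0 <= n <= 9:
            raise ValueError(f"not a positional digit: {ch!r}")
        value = value * 10 + int(n)
    return value


print(parse_count("〡"))
print(parse_count("3"))
```

This only works around the parsing side. Getting the widgets to show Arabic numerals in the first place is a separate display or locale matter.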

aquamarineaqua commented 4 months ago

I have fixed the numeral display so that Arabic numerals are shown. The error I get now is:

min_speaker: 1 max_speaker: 3

Speaker diarize and alignment load speaker brain model...

Could not download 'pyannote/segmentation' model. It might be because the model is private or gated so make sure to authenticate. Visit https://hf.co/settings/tokens to create your access token and retry with:

Model.from_pretrained('pyannote/segmentation', ... use_auth_token=YOUR_AUTH_TOKEN)

If this still does not work, it might be because the model is gated: visit https://hf.co/pyannote/segmentation to accept the user conditions.
failed to diarize speaker! Error: 'NoneType' object has no attribute 'eval'
UPdata DataModel

But I have already set the token and I am using a proxy.

CheshireCC commented 4 months ago

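Hugging Face is prone to this kind of problem. Go to Hugging Face, delete the current token, then create a new one and set it again. This really does happen from time to time.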
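One way to narrow this down outside the GUI is to request a file from the gated repository directly and look at the HTTP status. The sketch below uses only the standard library. Sending the token as an `Authorization: Bearer` header is the standard way to authenticate against Hugging Face; the reading of each status in `explain_status` reflects generic HTTP semantics and is an assumption about how the Hub responds. All function names are hypothetical, and `file_url` has to be supplied by the reader.

```python
import urllib.error
import urllib.request


def build_request(file_url, token):
    """Build a HEAD request that carries the access token."""
    return urllib.request.Request(
        file_url,
        method="HEAD",
        headers={"Authorization": f"Bearer {token}"},
    )


def explain_status(status):
    """Map an HTTP status code to a likely cause (assumed, generic HTTP meanings)."""
    if 200 <= status < 400:
        return "access works"
    if status == 401:
        return "token missing, mistyped, or revoked"
    if status == 403:
        return "token accepted, but the model's user conditions may not be accepted yet"
    if status == 404:
        return "wrong URL, or the repository is hidden from this token"
    return f"unexpected status {status}"


def check_access(file_url, token, timeout=10):
    """Return a short diagnosis for one URL; network errors are reported, not raised."""
    try:
        with urllib.request.urlopen(build_request(file_url, token), timeout=timeout) as resp:
            return explain_status(resp.status)
    except urllib.error.HTTPError as err:
        return explain_status(err.code)
    except urllib.error.URLError as err:
        return f"network problem (proxy?): {err.reason}"
```

Calling `check_access(file_url, token)` once with the old token and once with a freshly created one would show whether regenerating the token changes anything, or whether the failure lies with the proxy or with conditions that have not been accepted.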

etkane commented 2 months ago

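Unloading the model does not seem to help; see the log below. The message makes it look as if the backend command is missing a "to"? This is version 0.8.0 (0.8.1 runs out of VRAM for some reason, so I can only use 0.8.0). The GPU is a 4060 with 8 GB.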

==========2024-09-18_22:06:30========== ==========Unload Whisper Model==========

Temperature: 0.0,0.2,0.4,0.6,0.8,1.0 and transcript has already been run Temperature fallback configuration may take effect, that may take crash when unload model from memory! unload model succeed

==========2024-09-18_22:06:41========== ==========Speaker_Diarize==========

min_speaker: 0 max_speaker: 4

Speaker diarize and alignment load speaker brain model...
Move Model To Device Error: 'SpeakerDiarization' object has no attribute 'to'
speaker diarize...
process transcription result...
speaker alignment...
UPdata DataModel tab_C:/Users/bican/Desktop/aitest/去电_安诚保险_95544_202409141931.m4a
updata table:C:/Users/bican/Desktop/aitest/去电_安诚保险_95544_202409141931.m4a

==========2024-09-18_22:07:36========== ==========Speaker_Diarize==========

min_speaker: 0 max_speaker: 4

Speaker diarize and alignment load speaker brain model...
Move Model To Device Error: 'SpeakerDiarization' object has no attribute 'to'
speaker diarize...
process transcription result...
speaker alignment...
UPdata DataModel tab_C:/Users/bican/Desktop/aitest/去电_安诚保险_95544_202409141931.m4a
updata table:C:/Users/bican/Desktop/aitest/去电_安诚保险_95544_202409141931.m4a
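The text after "Move Model To Device Error:" is a Python `AttributeError` message: the loaded `SpeakerDiarization` object has no method named `to`, which is different from a command missing a word. A minimal sketch of guarding such a call follows. All names in it (`move_to_device` and the two stand-in classes) are hypothetical, and it makes no claim about how faster-whisper-GUI or pyannote.audio handle this.

```python
class OldPipeline:
    """Stand-in for an object without a .to() method."""


class NewPipeline:
    """Stand-in for an object that supports .to(device)."""

    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self


def move_to_device(obj, device):
    """Move obj to device if it supports it.

    Returns (object, moved) so the caller can log which path was taken.
    """
    mover = getattr(obj, "to", None)
    if not callable(mover):
        # Nothing to call: keep the object where it is instead of raising AttributeError.
        return obj, False
    moved = mover(device)
    # Some APIs return self, others move in place and return None.
    return (moved if moved is not None else obj), True


for pipeline in (OldPipeline(), NewPipeline()):
    _, moved = move_to_device(pipeline, "cuda")
    print(type(pipeline).__name__, "moved" if moved else "left on its default device")
```

In the logs above the run continues after the error, so the failed move looks like it is caught rather than fatal. Which device the pipeline then runs on is not visible from the log.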

etkane commented 2 months ago

https://github.com/pyannote/pyannote-pipeline/issues/48 is a thread where the same problem was solved, but I could not follow it...

etkane commented 2 months ago

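You need to accept two agreements on Hugging Face; see https://huggingface.co/pyannote/speaker-diarization-3.1. Version 0.8.0 probably does not bundle some components, though the developer may have downloaded or loaded them during development? I guess it is this one: speaker-diarization. I don't know Python, so I'll keep looking into it...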

CheshireCC / faster-whisper-GUI

WhisperX speaker diarization does not work #161