CheshireCC / faster-whisper-GUI

faster_whisper GUI with PySide6
GNU Affero General Public License v3.0
1.69k stars, 104 forks

WhisperX speaker diarization does not work #161

Open sugar-ray34 opened 5 months ago

sugar-ray34 commented 5 months ago

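The models downloaded without problems (I ran the program as administrator, as suggested in an earlier thread). I don't know what causes the "Move Model To Device Error". The process then stays stuck at this step with the GPU fully loaded, and leaving it running for a while produces no result.

==========Speaker_Diarize==========

min_speaker: 1 max_speaker: 2

Speaker diarize and alignment load speaker brain model...
Move Model To Device Error: 'SpeakerDiarization' object has no attribute 'to'
speaker diarize...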
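
The log suggests that something calls `.to(...)` on an object that has no such method, reports the error, and carries on. As a minimal sketch of a defensive version of that step (not the GUI's actual code), the helper below only calls `.to()` when it exists. `move_to_device`, `LegacyPipeline` and `ModernPipeline` are hypothetical names used for illustration; they are not pyannote or whisperX classes.

```python
from typing import Any, Tuple


def move_to_device(obj: Any, device: str) -> Tuple[Any, bool]:
    """Call obj.to(device) when the method exists; report whether it ran."""
    to = getattr(obj, "to", None)
    if not callable(to):
        # Nothing to call: leave the object where it is and say so.
        return obj, False
    moved = to(device)
    # Some .to() implementations move in place and return None.
    return (moved if moved is not None else obj), True


class LegacyPipeline:
    """Hypothetical stand-in for a pipeline object without a .to() method."""
    device = "cpu"


class ModernPipeline:
    """Hypothetical stand-in for a pipeline object that supports .to()."""
    device = "cpu"

    def to(self, device: str) -> "ModernPipeline":
        self.device = device
        return self


legacy, legacy_moved = move_to_device(LegacyPipeline(), "cuda")
modern, modern_moved = move_to_device(ModernPipeline(), "cuda")
print(legacy_moved, legacy.device)
print(modern_moved, modern.device)
```

Returning a flag lets the caller log clearly that the object stayed on its original device instead of surfacing an AttributeError.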

CheshireCC commented 5 months ago

Which version?

sugar-ray34 commented 5 months ago

0.8.0

CheshireCC commented 5 months ago

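Check whether shared GPU memory usage is also high. If it is, dedicated VRAM has run out and system RAM is being shared with the GPU, which makes everything slow. You can unload the fw (faster-whisper) model before running whisperX to free up VRAM.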
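
In Python terms, unloading a model usually means dropping every reference to it and letting the garbage collector run. The sketch below shows that pattern under stated assumptions: `ModelHolder`, `unload_model` and `DummyModel` are hypothetical names, not the GUI's code. The optional `torch.cuda.empty_cache()` call only affects memory cached by PyTorch; faster-whisper runs on CTranslate2, so its memory is released when the model object itself is destroyed.

```python
import gc
import weakref


class ModelHolder:
    """Hypothetical container for the currently loaded transcription model."""

    def __init__(self, model):
        self.model = model


def unload_model(holder: ModelHolder) -> None:
    """Drop the model reference, collect garbage, and trim PyTorch's cache."""
    holder.model = None
    gc.collect()
    try:
        import torch  # optional: only relevant if PyTorch is installed
    except ImportError:
        return
    if torch.cuda.is_available():
        # Returns cached, unused blocks held by PyTorch's allocator to the driver.
        torch.cuda.empty_cache()


class DummyModel:
    """Stand-in for a real model object."""


model = DummyModel()
ref = weakref.ref(model)  # lets us observe when the object is really gone
holder = ModelHolder(model)
del model
unload_model(holder)
print(ref() is None)
```

If any other variable still points at the model, the object survives and its memory stays allocated, which is one way an unload can appear to succeed without freeing anything.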

sugar-ray34 commented 5 months ago

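Shared GPU memory usage is only 1.4/7.9 GB, which is not high, but dedicated VRAM was indeed full. Unloading the fw model fixed that. So does whisperX only run in dedicated VRAM? However, the output barely separates the two speakers: only 10 of 200 sentences were attributed differently. And the log still shows the same error as before?

==========Speaker_Diarize==========

min_speaker: 1 max_speaker: 2

Speaker diarize and alignment load speaker brain model...
Move Model To Device Error: 'SpeakerDiarization' object has no attribute 'to'
speaker diarize...
process transcription result...
speaker alignment...
UPdata DataModel
tab_F:/XXX
updata table:F:/XXX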

CheshireCC commented 5 months ago

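Shared GPU memory is used only when dedicated VRAM runs out, and using it slows things down. Speaker diarization is a deep learning model built on a clustering algorithm, so the results are not always accurate. If the speakers are not separated well enough, setting both the minimum and maximum number of speakers to 2 may help.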
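
Setting the minimum and maximum to the same value amounts to telling the pipeline the exact speaker count. A minimal sketch of how such settings could be turned into keyword arguments: the names `num_speakers`, `min_speakers` and `max_speakers` follow pyannote.audio's diarization pipeline, while `speaker_kwargs` is a hypothetical helper, and whether the GUI forwards its settings this way is an assumption. Rejecting values below 1 is also a choice made for this sketch.

```python
from typing import Dict


def speaker_kwargs(min_speakers: int, max_speakers: int) -> Dict[str, int]:
    """Build diarization keyword arguments from min/max speaker settings."""
    if min_speakers < 1:
        # Assumption of this sketch: a conversation has at least one speaker.
        raise ValueError("min_speakers must be at least 1")
    if max_speakers < min_speakers:
        raise ValueError("max_speakers must not be smaller than min_speakers")
    if min_speakers == max_speakers:
        # Equal bounds mean the count is known, so pass it directly.
        return {"num_speakers": min_speakers}
    return {"min_speakers": min_speakers, "max_speakers": max_speakers}


fixed = speaker_kwargs(2, 2)
ranged = speaker_kwargs(1, 2)
print(fixed)
print(ranged)
```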

sugar-ray34 commented 5 months ago

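Hmm, even with both set to 2, speaker diarization still feels unusable. I tried several recordings: at best it can tell a male voice from a female one, and it cannot separate speakers of the same sex at all. I don't know whether there is room to tune the parameters. Thanks!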

CheshireCC commented 5 months ago

Speaker diarization may simply be like this; it is a limitation of the clustering model itself.

aquamarineaqua commented 4 months ago

Why does nothing happen when I click WhisperX speaker diarization? I have set the token as well.

CheshireCC commented 4 months ago

Check the log.

aquamarineaqua commented 4 months ago

Thanks for the reply. The log shows:

Traceback (most recent call last):
File "D:\ProgramData\FasterWhisperGUI\faster_whisper_GUI\mainWindows.py", line 1066, in whisperXDiarizeSpeakers
File "D:\ProgramData\FasterWhisperGUI\faster_whisper_GUI\mainWindows.py", line 347, in getParamWhisperX
ValueError: invalid literal for int() with base 10: '〡'

By the way, the numbers in my speaker settings look like this (and the same goes for many other places in the settings). Is there a way to display them as Arabic numerals? [image: issue]
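
The character in the traceback, '〡' (U+3021, HANGZHOU NUMERAL ONE), has the numeric value 1 in Unicode but is not a decimal digit, so `int()` rejects it. A minimal sketch of a more tolerant parser, assuming each character stands for one positional decimal digit (how the widget renders multi-digit values is an assumption); `parse_int_any_digits` is a hypothetical helper, not part of the GUI. In a Qt application, reading a spin box's integer value instead of its display text may avoid the issue altogether.

```python
import unicodedata


def parse_int_any_digits(text: str) -> int:
    """Parse an integer whose digits may come from non-ASCII numeral systems."""
    digits = []
    for ch in text.strip():
        value = unicodedata.numeric(ch, None)
        # Accept only characters whose numeric value is a whole number 0-9.
        if value is None or value != int(value) or not 0 <= value <= 9:
            raise ValueError(f"not a decimal digit: {ch!r}")
        digits.append(str(int(value)))
    if not digits:
        raise ValueError("empty number")
    return int("".join(digits))


one = parse_int_any_digits("〡")
twelve = parse_int_any_digits("12")
print(one, twelve)
```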

aquamarineaqua commented 4 months ago

I have fixed the Arabic numeral display problem. The error now is:

min_speaker: 1 max_speaker: 3

Speaker diarize and alignment load speaker brain model...

Could not download 'pyannote/segmentation' model. It might be because the model is private or gated so make sure to authenticate. Visit https://hf.co/settings/tokens to create your access token and retry with:

Model.from_pretrained('pyannote/segmentation', ... use_auth_token=YOUR_AUTH_TOKEN)

If this still does not work, it might be because the model is gated: visit https://hf.co/pyannote/segmentation to accept the user conditions.
failed to diarize speaker!
Error: 'NoneType' object has no attribute 'eval'
UPdata DataModel

But I have already set the token and a proxy (VPN).
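
The final error, 'NoneType' object has no attribute 'eval', is consistent with the download step returning None after printing the hint above and the calling code then using the result anyway. A minimal sketch of failing early with a clearer message: `load_or_raise` and `failing_loader` are hypothetical names, and the stub stands in for whatever loader the application really uses.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_or_raise(loader: Callable[[], Optional[T]], name: str) -> T:
    """Run a loader and raise a descriptive error if it returns None."""
    model = loader()
    if model is None:
        raise RuntimeError(
            f"Could not load {name!r}: check the access token and that the "
            "model's user conditions have been accepted."
        )
    return model


def failing_loader() -> None:
    """Stand-in for a loader that prints a hint and returns None on failure."""
    return None


try:
    load_or_raise(failing_loader, "pyannote/segmentation")
    message = ""
except RuntimeError as exc:
    message = str(exc)
print(message)
```

Raising at the point of failure keeps the root cause (authentication or gated access) in the error message instead of a later AttributeError.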

CheshireCC commented 4 months ago

Hugging Face is prone to this kind of problem. Try deleting your current token on Hugging Face and creating a new one; this does happen sometimes.

etkane commented 2 months ago

Unloading the model seems to have no effect; see below. The message makes it look as if a "to" is missing from some background command?? I am on version 0.8.0 (0.8.1 runs out of VRAM for some reason, so I can only use 0.8.0). The GPU is a 4060 with 8 GB.

==========2024-09-18_22:06:30==========
==========Unload Whisper Model==========

Temperature: 0.0,0.2,0.4,0.6,0.8,1.0 and transcript has already been run
Temperature fallback configuration may take effect, that may take crash when unload model from memory!
unload model succeed

==========2024-09-18_22:06:41==========
==========Speaker_Diarize==========

min_speaker: 0 max_speaker: 4

Speaker diarize and alignment load speaker brain model...
Move Model To Device Error: 'SpeakerDiarization' object has no attribute 'to'
speaker diarize...
process transcription result...
speaker alignment...
UPdata DataModel
tab_C:/Users/bican/Desktop/aitest/去电_安诚保险_95544_202409141931.m4a
updata table:C:/Users/bican/Desktop/aitest/去电_安诚保险_95544_202409141931.m4a

==========2024-09-18_22:07:36==========
==========Speaker_Diarize==========

min_speaker: 0 max_speaker: 4

Speaker diarize and alignment load speaker brain model...
Move Model To Device Error: 'SpeakerDiarization' object has no attribute 'to'
speaker diarize...
process transcription result...
speaker alignment...
UPdata DataModel
tab_C:/Users/bican/Desktop/aitest/去电_安诚保险_95544_202409141931.m4a
updata table:C:/Users/bican/Desktop/aitest/去电_安诚保险_95544_202409141931.m4a

etkane commented 2 months ago

https://github.com/pyannote/pyannote-pipeline/issues/48 is a thread where the same problem was solved, but I couldn't follow it...

etkane commented 2 months ago

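You need to accept two agreements on Hugging Face; see https://huggingface.co/pyannote/speaker-diarization-3.1 for details. Version 0.8.0 probably doesn't bundle some components, but the developer may have downloaded or loaded them during development? It is probably this one: speaker-diarization. I don't know Python, so I'll keep looking into it...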