2024-09-27 06:36:06.521 | INFO | tools.step010_demucs_vr:load_model:21 - Loading Demucs model: htdemucs_ft
2024-09-27 06:36:06.522 | INFO | tools.step042_tts_xtts:load_model:24 - Loading TTS model from models/TTS/XTTS-v2
Loading TTS model from models/TTS/XTTS-v2
2024-09-27 06:36:06.536 | INFO | tools.step021_asr_whisperx:load_whisper_model:36 - Loading WhisperX model: models/ASR/whisper/faster-whisper-large-v3
Using model: xtts
Could not download 'pyannote/speaker-diarization-3.1' pipeline.
It might be because the pipeline is private or gated so make
sure to authenticate. Visit https://hf.co/settings/tokens to
create your access token and retry with:
If this still does not work, it might be because the pipeline is gated:
visit https://hf.co/pyannote/speaker-diarization-3.1 to accept the user conditions.
2024-09-27 06:36:06.934 | ERROR | tools.step021_asr_whisperx:load_diarize_model:71 - Failed to load diarization model in 0.40s due to 'NoneType' object has no attribute 'to'
2024-09-27 06:36:06.935 | INFO | tools.step021_asr_whisperx:load_diarize_model:72 - You have not set the HF_TOKEN, so the pyannote/speaker-diarization-3.1 model could not be downloaded.
2024-09-27 06:36:06.935 | INFO | tools.step021_asr_whisperx:load_diarize_model:73 - If you need to use the speaker diarization feature, please request access to the pyannote/speaker-diarization-3.1 model. Alternatively, you can choose not to enable this feature.
No language specified, language will be first be detected for each audio file (increases inference time).
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.4.0. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint ../.cache/torch/whisperx-vad-segmentation.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.3.1+cu121. Bad things might happen unless you revert torch to 1.x.
2024-09-27 06:36:08.947 | INFO | tools.step021_asr_whisperx:load_whisper_model:43 - Loaded WhisperX model: models/ASR/whisper/faster-whisper-large-v3 in 2.41s
2024-09-27 06:36:13.090 | INFO | tools.step021_asr_whisperx:load_align_model:56 - Loaded alignment model: en in 4.14s
2024-09-27 06:36:20.921 | INFO | tools.step010_demucs_vr:load_model:25 - Demucs model loaded in 14.40 seconds
2024-09-27 06:36:21.762 | INFO | tools.step042_tts_xtts:load_model:35 - TTS model loaded in 15.24s
[BiliBili] Extracting URL: https://www.bilibili.com/video/BV1kr421M7vz/
[BiliBili] 1kr421M7vz: Downloading webpage
[BiliBili] BV1kr421M7vz: Extracting videos in anthology