JarodMica / ai-voice-cloning

GNU General Public License v3.0

Bug: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')} #5

Closed jurandfantom closed 11 months ago

jurandfantom commented 11 months ago

Hi, sadly I found myself in a situation where I can't train anymore. Yesterday things worked as in the tutorial, but today I get the following errors (I redownloaded everything to check whether that would fix the issue). Nothing has changed since yesterday in terms of installed software, only a git pull of the alltalk ooba extension, without touching requirements.txt.

[Training] [2023-12-27T01:42:30.715504] 23-12-27 01:42:30.272 - INFO: Random seed: 1357
[Training] [2023-12-27T01:42:31.223594] 23-12-27 01:42:31.223 - INFO: Number of training data elements: 35, iters: 1
[Training] [2023-12-27T01:42:31.227594] 23-12-27 01:42:31.223 - INFO: Total epochs needed: 200 for iters 200
[Training] [2023-12-27T01:42:33.702266] E:\Magazyn\Grafika\AI\Voice2Voice\ai-voice-cloning\runtime\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-12-27T01:42:33.706267]   warnings.warn(
[Training] [2023-12-27T01:42:40.969203] 23-12-27 01:42:40.969 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-12-27T01:42:41.496972] 23-12-27 01:42:41.495 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-12-27T01:42:42.987090] [2023-12-27 01:42:42,987] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-12-27T01:42:43.018090] [2023-12-27 01:42:43,018] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-12-27T01:42:45.144538] E:\Magazyn\Grafika\AI\Voice2Voice\ai-voice-cloning\runtime\lib\site-packages\bitsandbytes\cuda_setup\paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
[Training] [2023-12-27T01:42:45.144538]   warn(
[Training] [2023-12-27T01:42:45.145538] E:\Magazyn\Grafika\AI\Voice2Voice\ai-voice-cloning\runtime\lib\site-packages\bitsandbytes\cuda_setup\paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
[Training] [2023-12-27T01:42:45.145538]   warn(
[Training] [2023-12-27T01:42:47.291812] 23-12-27 01:42:47.290 - INFO: Saving models and training states.
[Training] [2023-12-27T01:42:47.291812] 23-12-27 01:42:47.290 - INFO: Finished training!

Alteregohr commented 11 months ago

Hi, I have the same problem. Yesterday everything worked, and now I can't train.

JarodMica commented 11 months ago

@jurandfantom and @Alteregohr

What GPUs do you guys have? This is probably something I overlooked for different GPUs when making the Python runtime. If you want to try a few modifications, it might help me narrow down the issue.

Below are manual instructions for executing the bnb setup script found here: https://git.ecker.tech/mrq/ai-voice-cloning/src/branch/master/setup-cuda-bnb.bat. The bat script won't run correctly against the package, so we have to do it manually.

To follow along, it'll be easier to open up two file explorers side by side, as we're going to copy files from one to the other. I'll call them Explorer A and Explorer B.

Explorer A:

  1. Open up a file explorer to ai-voice-cloning/modules/bitsandbytes-windows/bin

Explorer B:

  1. Open up the other file explorer to ai-voice-cloning/runtime/Lib/site-packages/bitsandbytes

Drag and drop the two .dll files and cextension.py from Explorer A into Explorer B and confirm overwrite

Drag and drop cuda_setup from Explorer A into Explorer B and confirm overwrite

Drag and drop nn from Explorer A into Explorer B and confirm overwrite
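If it's easier than dragging files around, the copy steps above can be sketched in Python. This is just a convenience sketch: the repo root below is the example path from the logs in this thread, so adjust it to your own install before running.

```python
import shutil
from pathlib import Path

# Assumption: adjust this to wherever your ai-voice-cloning checkout lives.
REPO = Path(r"E:\Magazyn\Grafika\AI\Voice2Voice\ai-voice-cloning")

SRC = REPO / "modules" / "bitsandbytes-windows" / "bin"          # Explorer A
DST = REPO / "runtime" / "Lib" / "site-packages" / "bitsandbytes"  # Explorer B

def patch_bitsandbytes(src: Path, dst: Path) -> list:
    """Copy the patched .dll files, cextension.py, and the cuda_setup/
    and nn/ folders over the packaged bitsandbytes install, overwriting
    what is there. Returns a list of what was copied."""
    copied = []
    for dll in src.glob("*.dll"):
        shutil.copy2(dll, dst / dll.name)
        copied.append(dll.name)
    shutil.copy2(src / "cextension.py", dst / "cextension.py")
    copied.append("cextension.py")
    for folder in ("cuda_setup", "nn"):
        # dirs_exist_ok=True lets copytree overwrite the existing folders
        shutil.copytree(src / folder, dst / folder, dirs_exist_ok=True)
        copied.append(folder + "/")
    return copied
```

Running `patch_bitsandbytes(SRC, DST)` should mirror the three drag-and-drop steps exactly.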

I wasn't aware of this issue, so I missed setup-cuda-bnb.bat when manually building my runtime. If this resolves it, then that is probably the cause, and I'll have to rebuild the package and upload a new one.

Thisisashan commented 11 months ago

I have this error. When I followed your instructions and then tried to train, I got:

"torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs."

also "ModuleNotFoundError: No module named 'pyfastmp3decoder'"

Simply will not let me train voices.

jurandfantom commented 11 months ago

@jurandfantom and @Alteregohr

What GPUs do you guys have? This is probably something I overlooked for different GPUs when making the Python runtime. If you want to try a few modifications, it might help me narrow down the issue.

Same as yours - Gigabyte RTX 4090.

After following the instructions, same output as before:

[Training] [2023-12-28T18:02:34.610768] 23-12-28 18:02:34.251 - INFO: Random seed: 8540
[Training] [2023-12-28T18:03:09.007772] 23-12-28 18:03:09.007 - INFO: Number of training data elements: 37, iters: 1
[Training] [2023-12-28T18:03:09.011772] 23-12-28 18:03:09.007 - INFO: Total epochs needed: 500 for iters 500
[Training] [2023-12-28T18:03:36.721792] E:\Magazyn\Grafika\AI\Voice2Voice\ai-voice-cloning\runtime\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-12-28T18:03:36.725793]   warnings.warn(
[Training] [2023-12-28T18:03:47.036902] 23-12-28 18:03:47.036 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-12-28T18:04:10.290853] 23-12-28 18:04:10.289 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-12-28T18:04:12.936852] [2023-12-28 18:04:12,936] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-12-28T18:04:15.062853] [2023-12-28 18:04:15,062] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-12-28T18:04:21.321853] 23-12-28 18:04:21.321 - INFO: Saving models and training states.
[Training] [2023-12-28T18:04:21.321853] 23-12-28 18:04:21.321 - INFO: Finished training!

JarodMica commented 11 months ago

Same as yours - Gigabyte RTX 4090. After following the instructions, same output as before (same log as above).

Well, from the stack here I can see it did resolve the CUDA issue, so that is gone now. I'll have to update the package with that. I see all of these messages normally when I train, including the redirect message, so it's not a fatal error.

Can you try to redo the whisper portion, i.e. re-curate the dataset, and do it with one small audio file?

JarodMica commented 11 months ago

I have this error. When I followed your instructions and then tried to train, I got:

"torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs."

also "ModuleNotFoundError: No module named 'pyfastmp3decoder'"

Simply will not let me train voices.

The redirects message is not fatal; it's normally there.

The latter, however, suggests that you're using mp3 files. Can you convert them to .wav files and then rerun training? I took a look through the code, and training with mp3 files is not supported in its current configuration.
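For anyone with a folder full of mp3s, a minimal batch-conversion sketch using ffmpeg via subprocess is below. Assumptions: ffmpeg is on your PATH, and the 22.05 kHz mono 16-bit PCM target is a commonly used tortoise dataset format, not something confirmed in this thread; adjust it if your pipeline expects something else.

```python
import subprocess
from pathlib import Path

def mp3_to_wav_cmd(mp3: Path) -> list:
    """Build an ffmpeg command that converts one mp3 into a 22.05 kHz mono
    16-bit PCM wav next to the source file (assumed target format)."""
    wav = mp3.with_suffix(".wav")
    return ["ffmpeg", "-y", "-i", str(mp3),
            "-ar", "22050",        # resample to 22.05 kHz
            "-ac", "1",            # downmix to mono
            "-c:a", "pcm_s16le",   # 16-bit PCM wav
            str(wav)]

def convert_folder(folder: Path) -> None:
    """Convert every .mp3 in the folder; raises if ffmpeg fails."""
    for mp3 in folder.glob("*.mp3"):
        subprocess.run(mp3_to_wav_cmd(mp3), check=True)
```

Usage would be `convert_folder(Path("voices/NewVoice"))`, then delete or move the mp3s so the trainer only sees wavs.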

Thisisashan commented 11 months ago

Interestingly it had no issue transcribing from the MP3 files. Converted them to WAV. Tried in whisper/base.
Successfully training on 45 min of audio now. Thank you.

JarodMica commented 11 months ago

Interestingly it had no issue transcribing from the MP3 files. Converted them to WAV. Tried in whisper/base. Successfully training on 45 min of audio now. Thank you.

Whisper, the transcriber, can process all audio types, whereas DLAS, the trainer, is only set up for WAV. I'm sure it could do mp3s, but it's not enabled, so that's what is happening here.
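Since whisper will happily transcribe formats the trainer then rejects, a quick sanity check is to list every audio file in the dataset that isn't a .wav before starting training. A minimal sketch (the extension list is just a guess at common formats, not taken from the codebase):

```python
from pathlib import Path

# Assumption: these are the non-wav audio formats likely to sneak into a dataset.
AUDIO_EXTS = {".mp3", ".flac", ".ogg", ".m4a", ".opus"}

def non_wav_files(dataset_dir: Path) -> list:
    """Return any audio files under dataset_dir that the trainer
    (DLAS, wav-only in this configuration) would not accept."""
    return sorted(p for p in dataset_dir.rglob("*")
                  if p.suffix.lower() in AUDIO_EXTS)
```

If `non_wav_files(Path("voices/NewVoice"))` returns anything, convert those files to wav before training.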

jurandfantom commented 11 months ago

@jurandfantom and @Alteregohr What GPUs do you guys have?

Same as yours - Gigabyte RTX 4090. After following the instructions, same output as before (same log as above).

Well, from the stack here I can see it did resolve the CUDA issue, so that is gone now. I'll have to update the package with that. I see all of these messages normally when I train, including the redirect message, so it's not a fatal error. Can you try to redo the whisper portion, i.e. re-curate the dataset, and do it with one small audio file?

Ok, so for a single voice sample it managed to start training. For a sample of 19 files (lengths from 5 to 30 seconds) it failed; the whole process can be seen here: https://youtu.be/97tI9K96Ea8 (I paused OBS each time I needed to wait).

Then I removed everything that was under 20 seconds (3 files left). Result: failed.

Next I tested a single file created from all those files joined into one wav (24-bit, in DaVinci Resolve). Whisper managed to slice the file in 30 seconds, whereas loading whisperX took 3.5 minutes. Success, training started.

By the way, I recorded a clean voice dataset where each file is one full sentence. In theory those don't need to be split by whisper. Is there a way to train with such a dataset? I don't mind merging them into a single file with large gaps, so whisper can cut only at those empty slots.
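The merge-with-gaps idea above can be done with the Python standard library alone. A minimal sketch, assuming all clips share the same sample rate, channel count, and sample width (the 2-second default gap is an arbitrary choice, just meant to be long enough for whisper's segmentation to cut there):

```python
import wave
from pathlib import Path

def merge_with_gaps(wav_paths: list, out_path: Path,
                    gap_seconds: float = 2.0) -> None:
    """Concatenate wav files into one, inserting silence between clips
    so the transcriber splits exactly at the gaps. All inputs must share
    the same sample rate, channels, and sample width."""
    with wave.open(str(wav_paths[0]), "rb") as first:
        params = first.getparams()
    # One frame = nchannels * sampwidth bytes; silence is all zero bytes.
    gap_frames = int(params.framerate * gap_seconds)
    silence = b"\x00" * (gap_frames * params.nchannels * params.sampwidth)
    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(params.nchannels)
        out.setsampwidth(params.sampwidth)
        out.setframerate(params.framerate)
        for i, path in enumerate(wav_paths):
            if i:
                out.writeframes(silence)
            with wave.open(str(path), "rb") as clip:
                out.writeframes(clip.readframes(clip.getnframes()))
```

Note this only works for PCM wav; 24-bit exports from DaVinci Resolve would need resampling first if the trainer expects 16-bit.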

JarodMica commented 11 months ago

Ok, so for a single voice sample it managed to start training. For a sample of 19 files (lengths from 5 to 30 seconds) it failed. [...] Whisper managed to slice the file in 30 seconds, whereas loading whisperX took 3.5 minutes. Success, training started. [...] I recorded a clean voice dataset where each file is one full sentence. In theory those don't need to be split by whisper. Is there a way to train with such a dataset? I don't mind merging them into a single file with large gaps, so whisper can cut only at those empty slots.

Unless you have scripts to format it outside of tortoise, you would need to run it through the GUI using the whisper there.

So it might be an issue with whisperx if whisper works. Would you be able to verify this by trying out whisperx vs whisper a few times? I know I've gotten it working with whisperx, but I do recall running into a similar issue, which is why I mainly still use whisper in the GUI.

So the training works; it's just something on the configuration side that's giving you trouble.

jurandfantom commented 11 months ago

Just to confirm: I should be able to drop more than one voice sample into the voices/NewVoice folder and train from that, correct?

JarodMica commented 11 months ago
  • I see; in that case I will tinker with file swapping or something :) I expect the whole idea of voice cloning is oriented around dumping a not-so-perfect dataset rather than creating a highly curated one (to be honest, it's easier, and most people clone somebody else's voice rather than their own, as that only happens once)
  • I noticed at the beginning that whisperX was fine in terms of speed, but recently it has become sluggish. Sure, I will compare the two a few times over different datasets and let you know if anything changes. To be honest, whisper's speed is perfectly fine anyway, so like you I don't see a reason to use X.
  • Incorrect. In my case the issue is the requirement that the training data be a single file, not, as in your case, a couple of samples. So if I wanted to clone, say, your voice from your last 10 videos, I would need to merge them into a single long file and extract the voice from that. If I recall correctly, in your case you had 5 Milena voice samples and things worked perfectly fine. I will wait for the update, as I expect you will post it soon (whisper v3).

Just to confirm. I should be able to drop more than one voice sample to voices/NewVoice folder and train from that - correct ?

Ah, gotcha. That is indeed odd, because yes, you can use multiple files to train. It shouldn't require you to merge everything into a single file.

I'll be out until January 7th-ish, so I won't be making any releases soon. That said, I don't think whisper v3 would change much in this case, as it is just for word error rate (WER) improvements as far as I know.

jurandfantom commented 11 months ago

No other changes introduced with the new update? I will remove the current version and just replace everything with the new one, in the hope that it will solve the issues :D

I wonder why something is wrong. Would it be possible to upload a sample dataset of Milena (say 2 or 3 files), along with the folder after post-processing with whisper and training profile creation? That would let me check whether my wav files are somehow wrong and whether training itself works with correct data, which would help pinpoint the issue on my side.

JarodMica commented 11 months ago

No other changes introduced with the new update? [...] Would it be possible to upload a sample dataset of Milena (say 2 or 3 files), along with the folder after post-processing with whisper and training profile creation?

There have been changes to whisperx, but that's about it 😅

You can try remuxing all of your audio files to WAV just to make sure there's nothing wrong with them. I won't be able to check things manually for about a week, as I'm away from my computer.

Though I think this is a separate issue and the original one has been solved, so please open a new issue; I will be closing this one for now.