Linaqruf / kohya-trainer

Adapted from https://note.com/kohya_ss/n/nbf7ce8d80f29 for easier cloning
Apache License 2.0

Google Colab LORA Dreambooth Dependency error #228

Closed: Vynavo closed this issue 1 year ago

Vynavo commented 1 year ago

Yesterday everything worked fine; today I get this error.

I tried training anyway, but the resulting LoRAs' quality is poor, so the errors do seem to affect the training.

I tried to fix the dependency conflicts by installing the right versions. That worked for "requests", but xformers requires torch 2.0.0, so installing torch 2.0.1 instead just created a new conflict.

Since I don't know how to fix this and my coding knowledge ends at this point, I could really use some help.

Building wheel for fairscale (pyproject.toml) ... done
Building wheel for dadaptation (pyproject.toml) ... done
Building wheel for lycoris-lora (setup.py) ... done
Building wheel for library (setup.py) ... done
Building wheel for elfinder-client (setup.py) ... done
Building wheel for pathtools (setup.py) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests==2.27.1, but you have requests 2.28.2 which is incompatible.
[download progress bars omitted]
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.0.2+cu118 requires torch==2.0.1, but you have torch 2.0.0 which is incompatible.
torchdata 0.6.1 requires torch==2.0.1, but you have torch 2.0.0 which is incompatible.
torchtext 0.15.2 requires torch==2.0.1, but you have torch 2.0.0 which is incompatible.
torchvision 0.15.2+cu118 requires torch==2.0.1, but you have torch 2.0.0 which is incompatible.
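For anyone trying to reproduce this, a quick way to see which package versions the runtime actually ended up with is to run something like the following in a fresh Colab cell (purely diagnostic, it doesn't change anything):

!pip check
!pip list | grep -E "torch|xformers|requests"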

TNitroo commented 1 year ago

Okay, then I'm not the only one facing this issue. I could still train about 3 hours ago, but now I can't because the PyTorch version doesn't match CUDA 11.8. I also noticed that my last training run turned out horribly for some reason, even though I used the exact same settings that have always given me good results.

Ryuzhal commented 1 year ago

So how do we fix this? I have the same problem.

Vynavo commented 1 year ago

I can confirm that just installing the "correct" versions of "requests" and "torch" does not resolve the problem, because xformers then has to be disabled so it doesn't conflict with the newer "torch". Installing the expected version of "requests" seems to cause no further problems, though note that "requests 2.28.2" is needed for BLIP captioning; since I don't use that in Colab, I can't confirm whether it errors afterwards.

Also, I'm not really sure why google-colab 1.0.0 requires an older version of "requests", even though "requests 2.28.2" seems to have been in use for some time now.

In general, working around these problems by disabling xformers and installing the required versions leads to CUDA out-of-memory errors during training. Anyone who wants to try this would have to adjust all the settings for memory efficiency on Colab, or use premium GPUs with enough VRAM. (I didn't test this on any of the high-VRAM GPUs, so there is a chance that raw power makes it work again until an actual fix is found. But in my experience, disabling xformers drastically changes the quality of your LoRAs, sometimes for better, sometimes for worse.)
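For reference, the version pinning I attempted looked roughly like this, with the version numbers taken from the resolver messages above (treat it as an illustration of the attempt rather than a fix, since it is exactly what forces xformers off and leads to the VRAM problems described):

!pip install requests==2.27.1
!pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118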

Ryuzhal commented 1 year ago

So we have to wait for @Linaqruf to fix it?

Ryuzhal commented 1 year ago

[Screenshot 2023-05-19 012208] Still facing this.

Vynavo commented 1 year ago

I tried my fix on a premium GPU in Google Colab, and the quality of the LoRA is without question not good (although that is probably down to not using xformers). So until @Linaqruf can resolve this issue, or until it resolves itself, there doesn't seem to be anything that can be done.

There is a chance this problem can be fixed by rolling back all the torch-* packages to versions that don't require the newer "torch". Upgrading xformers doesn't work either; the newest release seems to be 0.0.19, and it's not compatible with torch==2.0.1.

That's all I can do for now. I don't know how to roll back any of the torch-* packages, and this seems to be a problem that needs more than a temporary fix, since xformers has to stay compatible with whatever the fix is.

Ryuzhal commented 1 year ago

wait for @Linaqruf to fix it

Ryuzhal commented 1 year ago

please fix it @Linaqruf :((((

meimeifung commented 1 year ago

I don't know what I am doing, but I am just trying to get it running.

In requirements.txt I changed requests to requests==2.27.1 and added the line torch==2.0. I also commented out this line in step 1.1: remove_bitsandbytes_message(bitsandytes_main_py)
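So the relevant lines of requirements.txt end up roughly like this (everything else unchanged):

requests==2.27.1
torch==2.0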

(You need to run 1.1 first (it will fail), then 1.2 to mount the drive so you can update the file, and then run 1.1 again.)

It is running now; we'll see how well it trains... It may be crap... we'll see.

EDIT: the training result seems ok to me.

Vynavo commented 1 year ago

It's a similar approach to mine, but you went for the old torch, which conflicts with the other torch-* packages. So by commenting out the "bitsandbytes" part, you basically just suppressed the error output?

You could always run the training even with those errors, but they do seem to affect the quality, either by messing with xformers or through something else.

Changing requests does remove one error, but adding torch==2.0 shouldn't do anything, since that's already the version that gets installed.

Before I try anything else: did you use a premium GPU for your training, and did you adjust any settings to make the training less resource-heavy?

meimeifung commented 1 year ago

First of all, I don't know what I am doing. I used a GPU on the free tier of Colab. My understanding (I don't know Python) is that if you update requirements.txt, the packages get downloaded again even if they were already installed. I could be wrong.

Yes, you are correct that I'm just getting rid of the error messages I see. I'm using my usual settings, I think, and I double-checked the config: I have xformers enabled (xformers = true). Nothing special (I usually just run 1.1, 1.2, 2.1, 3.1, 5.1, 5.2, 5.3, 5.4, 5.5 and only change the project name and folder name).

Ryuzhal commented 1 year ago

still error?

ghost commented 1 year ago

Getting this with a different trainer as well. I tried to find an alternative till this is fixed. Unfortunate.

deszeroNEXT commented 1 year ago

pip install torch==2.0.0+cu118 --index-url https://download.pytorch.org/whl/cu118

I added and ran this just before the training. Now I can train as before, and xformers works as well. I'm just not sure whether there is any hidden issue, since it's telling me that I must restart the runtime??

See below:
Looking in indexes: https://download.pytorch.org/whl/cu118, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torch==2.0.0+cu118
Downloading https://download.pytorch.org/whl/cu118/torch-2.0.0%2Bcu118-cp310-cp310-linux_x86_64.whl (2267.3 MB)
[download progress bars and tcmalloc large-alloc traces omitted]
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch==2.0.0+cu118) (3.12.0)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch==2.0.0+cu118) (4.5.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch==2.0.0+cu118) (1.11.1)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch==2.0.0+cu118) (3.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch==2.0.0+cu118) (3.1.2)
Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch==2.0.0+cu118) (2.0.0)
Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch==2.0.0+cu118) (3.25.2)
Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch==2.0.0+cu118) (16.0.5)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch==2.0.0+cu118) (2.1.2)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch==2.0.0+cu118) (1.3.0)
Installing collected packages: torch
Attempting uninstall: torch
Found existing installation: torch 2.0.0
Uninstalling torch-2.0.0:
Successfully uninstalled torch-2.0.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.0.2+cu118 requires torch==2.0.1, but you have torch 2.0.0+cu118 which is incompatible.
torchdata 0.6.1 requires torch==2.0.1, but you have torch 2.0.0+cu118 which is incompatible.
torchtext 0.15.2 requires torch==2.0.1, but you have torch 2.0.0+cu118 which is incompatible.
torchvision 0.15.2+cu118 requires torch==2.0.1, but you have torch 2.0.0+cu118 which is incompatible.
Successfully installed torch-2.0.0+cu118
WARNING: The following packages were previously imported in this runtime: [nvfuser,torch]
You must restart the runtime in order to use newly installed versions.
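(If you do want to apply the restart the warning asks for, Runtime > Restart runtime works, or you can force it from a cell like the one below. My assumption is that the warning only means the previously imported torch stays loaded until the runtime restarts.)

import os
os.kill(os.getpid(), 9)  # kills the kernel process; Colab then starts a fresh runtime that picks up the reinstalled torch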

Vynavo commented 1 year ago

Okay, I did some testing and I'm not 100% sure, but it does seem like it's overtraining. There is also a chance that it's just the dataset. Either way, with this fix you can test for yourself whether it works.

In 1.1, click to show the code.

Go to "def install_dependencies" and change the name "requirements.txt" to "requirementsfix.txt". Run the cell until it tells you there is no "requirementsfix.txt", then stop it. Take my attached requirementsfix.txt, put it into the kohya-trainer folder, and run the cell again.

It will still show you an error at the end, but after testing, those conflicts don't seem to affect the training. Everything else should work normally after that.

Since I'm not sure whether it's overtraining at the moment, please give me some feedback about your training results.

hummingcc commented 1 year ago

[edit] Forget it, Linaqruf solved it. Thanks a lot <3

I changed the torch and torchvision versions to CUDA 11.7 in my Colab. This increases the time it takes to finish the step (now about 4 min), but for me no error appeared after inserting these lines. Just insert these lines of code at this position in step 1.1:
!pip install https://download.pytorch.org/whl/cu117/torch-2.0.0%2Bcu117-cp310-cp310-linux_x86_64.whl
!pip install https://download.pytorch.org/whl/cu117/torchvision-0.15.0%2Bcu117-cp310-cp310-linux_x86_64.whl


The training is going on normally and the results so far are as expected. When it is finished I can give you feedback.

Linaqruf commented 1 year ago

Thank you for the report.

Colab recently updated the Torch version to 2.0.1, while the latest xformers versions still use Torch 2.0.0, resulting in a dependency conflict.


However, the latest pre-release xformers wheel (https://pypi.org/project/xformers/0.0.20.dev539/) installs torch 1.12.1 instead of 2.0.1, which is weird. So, until an xformers release that supports Torch 2.0.1 is available, we will downgrade Torch to 2.0.0. The downside is that setup time will increase slightly because the Torch 2.0.0 wheel is quite large.

If any friends are using an old notebook commit, I suggest running this code:

!pip install -q torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1+cu118 torchtext==0.15.1 torchdata==0.6.0 --extra-index-url https://download.pytorch.org/whl/cu118 -U
!pip install xformers==0.0.19 triton==2.0.0 -U

It also appears that the requests package had a dependency conflict, so I downgraded it to version 2.27.1. Requests is used for the BLIP install. Please let me know if this raises any errors.
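After running those two commands and restarting the runtime if prompted, a quick sanity check that the versions line up (optional):

!python -c "import torch, torchvision, xformers; print(torch.__version__, torchvision.__version__, xformers.__version__)"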

Thank you.