SD 1.5 Training Completely Broken + Unable to Use Old Kohya Commits

XT-404 commented 1 year ago

Hello,

I am posting a significant issue here:

1: Since the update to SDXL, all my Koyha_SS training for creating a checkpoint or Lora has been completely shattered. Before the major update with the implementation of SDXL, I could easily create Checkpoints or Loras with specific settings, based on extensive research carried out over several months. Now, no matter what values I insert, all my Checkpoints or Loras have a terrible appearance, completely distorted, whether it's after 1000 steps or 3000 steps.

To verify if the problem is related to my PC or if it originates from Kohya, I downloaded Easy Training Lora and performed my training there. Surprisingly, when using Easy Training Lora, I obtain perfect, clean Loras just like before, without faces resembling some kind of bizarre monster. However, when I use the exact same settings in Kohya, I get distorted faces. I asked a colleague who also performs Lora or checkpoint training to conduct a series of tests (over 2 days), and she also obtained distorted faces.

Problem 2: As a result, I decided to revert to an older version of Kohya SS, before the addition and implementation of SDXL. However, during the installation process, I encountered a strange issue. While the installation completed smoothly, upon launching the Kohya interface, the window opens but it's impossible to open any files that would allow us to retrieve the image source, the model checkpoint, or any other interface that lets us open a folder. It's as if the folder selection buttons are disabled.

I am reporting this significant issue, and I want to mention that I am not experiencing any other errors, except for the Triton message, indicating a missing Triton module, but I'm not sure what that means.

Apart from this, the training runs without any anomalies or CUDA/Python errors.

Also, no errors appear or are indicated when installing an older version that doesn't work on the interface.

Thank you in advance for addressing this matter.

Best regards, Augus Wrath

AIrtistry commented 1 year ago

Yes kohya_ss is completely broken for me also since V21.8.2, I used the same settings who gave me good results before 21.8.2 but now impossible to have decent results even with multiples modifications :(

67372a commented 1 year ago

One thing to check would be your version of bits and bytes. From my experience with AdamW8bit on Windows, and even on WSL 2 Ubuntu (but will have to look at more to be sure), newer versions of bits and bytes (required for full bf16) causes it to rapidly scale the weights up, quickly frying the model. This would manifest as high average key weights and a large number of keys being scaled if using scale weight norms, and with loss increasing quickly, eventually to near 1 / NaN, without scaling weight norms.

Any version after 0.35.0 manifests this particular issue for my setups, but it may not be the cause of the original reported issue, just something to check.

67372a commented 1 year ago

I did note that on Windows with the latest version of this UI (21.8.5), at some point for some reason, it looks to be installing a newer version than 0.35.0, possibly mixing multiple versions, grabbing a known working 0.35.0 version (that has the Windows DLLs), then deleting and replacing the existing version, regardless of what it says, works.

This issue occurs even with a completely fresh install of the UI, so there would seem to be some issue in the dependency install process.

Edit: Figured it out, assumed the setup option to install bitsandbytes-windows was just setting it up as normal, but this actually installs a newer version of bits and bytes, and newer versions seem to cause problems with my setup, and seem to typically causes problems for others as well.

JeroenHeyndrickx commented 1 year ago

I have the same problem. @67372a which steps did you follow to install the correct version? pip uninstall bitsandbytes-windows pip install bitsandbytes-windows==0.35.0 doesn't exist.

XT-404 commented 1 year ago

@67372a Regarding the setup and training iterations, here is a simple example of the experiment conducted over 2 days.

20 images were used without any retouching or resizing; the base image format was 512X512. The following parameters were used: BF16 / ADAMW / Constant / Warmup0 / Network 128.

For Batch 1, epoch 10, rep 5, > 1000 steps, everything worked perfectly before the insertion of SDXL. An alternative approach also worked for Batch 2, epoch 10, rep 10, > 1000 steps, and it worked just as well with an IT/s (Iterations per second) ranging from 2.5 to 3, with a loss between 0.5 to 0.7.

However, currently, with the same settings, the IT/s is around 3.5 to 4, with a loss of 0.8 to 0.9. It's important to note that this is with the same checkpoint and all the same settings. For the 20 images with the identical parameters and no modifications, except for the triton error that appears at the end of each epoch (which doesn't seem to cause any issues), the Lora results are completely distorted. Faces look like they have undergone excessive plastic surgery.

To address this issue, I tried running tests with 3000 steps, gradually reducing it to 2500/2000/1500/1000, etc., but the output remains consistently problematic. I also tested using checkpoints that were created when this problem first arose, including special 2D checkpoints, but the Lora results still don't resemble the model at all.

Since older versions of Commit are blocked and not functioning currently, I'm unable to make any functional Lora on SD1.5. The only alternative I found was to use Lora in Vlad or A1111 in combination. To elaborate, I must use this method to obtain approximately 95% of the chosen model's face without excessive distortion, and then apply ADetails on top of it.

, This situation is certainly abnormal and requires attention.

bmaltais commented 1 year ago

OK... so there is probably something bad going on with the bitsandbytes version. This is tricky cause supporting SDXL require newer versions of bitsandbytes for the new optimizers... but it no longer behave like it used to and require totally new LR tuning for training models. I can't really keep the old and the new alive at the same time without some signigicant overhead... So for now best is probably to find the last known to work release and fix the gradio interface so it start to work again.

@XT-404 Can you tell me what was the last good version that no longer work when you open the interface? I will test it on my side and see if I can apply a fix for it and publish a version that will work again?

XT-404 commented 1 year ago

@bmaltais The last version that worked perfectly for me and doesn't work at all now is version 21.7.16. This version was a marvel for me, and currently, all versions from 21.8 to 21.X.X are not functioning at all, even though the installation process completes without any errors. However, I want to emphasize, dear Mr @bmaltais , that version 21.7.16 was flawless in every aspect.

Thank you for your prompt response and attention to this matter. I appreciate your hard work and efforts for us, the users. Thank you, and you have my support.

bmaltais commented 1 year ago

OK, try this and see if it will fix the gradio issue for v21.7.16.

Update the requirements.txt file and change the line

gradio==3.33.1

to

gradio==3.36.1

Save the file and run gui.sh again. Let me know if it fix the GUI issue, If it does I will publish a v21.7.16.1 minor update to fix it so users can use that release again.

XT-404 commented 1 year ago

@bmaltais Download of version: 21.7.16 Unzip: Modification of the file, launch of GUI.sh, Launch of Setup.bat for VENV creation. Package installation: OK Configuration: OK Launching Koyha_ss 21.7.16: OK Attempt to open folders: Successful Everything can be selected now.

Thank you for the tip :)

I will conduct some future tests, and I will come back to you if I notice a return to normalcy or not. It's worth noting that bitsandbytes_windows is also installed in this version, even if not activated.

Thank you again for your response, Mr @bmaltais :)

XT-404 commented 1 year ago

@bmaltais I have just conducted a series of training tests on 20 images using my usual parameters.

Batch 1, epoch 10, rep 5 BF16 / ADAMW / Warmup 0 / Network rank 128

The results are the same, the face is completely distorted.

The issue with opening on Gradio has been resolved, which is understandable. However, the problem with the distorted faces persists, despite having IT/S (Iterations per second) at a maximum of 2.5 and 2.7 and loose (loss) ranging from 0.0230 to a maximum of 0.0570.

XT-404 commented 1 year ago

@bmaltais

I have just performed a test after making modifications on version v21.7.10. I also had to modify Gradio and run GUI.SH two or three times (the first time I encountered errors), but after relaunching it twice, there were no more errors. The installation was successful, done under PyTorch 1.

I launched Kohya version 2.7.10 and started the training. The training process went perfectly, and I can confirm that the results are outstanding. Without a doubt, I now have the faces and appearances of the trained models as before, without any issues.

If bitsandbytes_windows is not referenced in the configuration as in this version, I no longer have any Triton errors and no more visual anomalies.

AIrtistry commented 1 year ago

@bmaltais

I have just performed a test after making modifications on version v21.7.10. I also had to modify Gradio and run GUI.SH two or three times (the first time I encountered errors), but after relaunching it twice, there were no more errors. The installation was successful, done under PyTorch 1.

I launched Kohya version 2.7.10 and started the training. The training process went perfectly, and I can confirm that the results are outstanding. Without a doubt, I now have the faces and appearances of the trained models as before, without any issues.

If bitsandbytes_windows is not referenced in the configuration as in this version, I no longer have any Triton errors and no more visual anomalies.

So its def a problem with bitsandbytes_windows ?

JeroenHeyndrickx commented 1 year ago

@bmaltais First of all, thanks for your great work and effort! Now, Like @XT-404 mentioned v21.7.10 is working great. Here are the steps I used to install it:

Download the 21.7.10 version from https://github.com/bmaltais/kohya_ss/releases?page=2 Change the Gradio version in requirements_windows_torch1.txt to gradio==3.36.1 (see @bmaltais comment above) Run Setup.bat and install it using Torch1 (important!, torch2 v not working)

Important thing is when you have saved config files, recreate them from start!

Thanks again to @bmaltais!

XT-404 commented 1 year ago

@bmaltais Je viens d'effectuer un test après avoir fait des modifications sur la version v21.7.10. J'ai également dû modifier Gradio et exécuter GUI.SH deux ou trois fois (la première fois que j'ai rencontré des erreurs), mais après l'avoir relancé deux fois, il n'y avait plus d'erreurs. L'installation a été réussie, réalisée sous PyTorch 1. J'ai lancé Kohya version 2.7.10 et commencé la formation. Le processus de formation s'est parfaitement déroulé et je peux confirmer que les résultats sont exceptionnels. Sans aucun doute, j'ai maintenant les visages et les apparences des modèles formés comme avant, sans aucun problème. Si bitsandbytes_windows n'est pas référencé dans la configuration comme dans cette version, je n'ai plus d'erreurs Triton et plus d'anomalies visuelles.

Donc c'est définitivement un problème avec bitsandbytes_windows ?

@AIrtistry it would indeed seem

another-ai commented 1 year ago

I have the same problem. @67372a which steps did you follow to install the correct version? pip uninstall bitsandbytes-windows pip install bitsandbytes-windows==0.35.0 doesn't exist.

I had the same problem in the past(#1252), you need to reinstall koyha_ss via setup.bat but NOT install bitsandbytes-windows

XT-404 commented 1 year ago

J'ai le même problème.@67372aquelles étapes avez-vous suivies pour installer la bonne version ? pip uninstall bitsandbytes-windows pip install bitsandbytes-windows==0.35.0 n'existe pas.

J'ai eu le même problème dans le passé, vous devez réinstaller koyha_ss via setup.bat mais PAS installer bitsandbytes-windows

@maxencry

Good morning bits and bytes-windows and automatically inserted in the code of Kohya_ss, even if the activation in the configuration menu is not carried out and automatically functional on versions 2.8 to 2.7.11 for version 2.7.10 to 2.x.x the problem is no longer plausible although bitsandbytes-windows and well inside. so he can't uninstall bitsandbytes-windows vue that Kohya works with.

battousaifader commented 1 year ago

@bmaltais First of all, thanks for your great work and effort! Now, Like @XT-404 mentioned v21.7.10 is working great. Here are the steps I used to install it:

Download the 21.7.10 version from https://github.com/bmaltais/kohya_ss/releases?page=2 Change the Gradio version in requirements_windows_torch1.txt to gradio==3.36.1 (see @bmaltais comment above) Run Setup.bat and install it using Torch1 (important!, torch2 v not working)

Important thing is when you have saved config files, recreate them from start!

Thanks again to @bmaltais!

I got it working like you described, but made a 2nd folder for install with Torch v2, and it seems to be working for me. What was wrong with it?

Thom293 commented 1 year ago

taggins

DarkAlchy commented 1 year ago

I wonder if any of this may actually have its tentacles in the slow speed issue as well?

XT-404 commented 1 year ago

@DarkAlchy no idea at the moment.

what I can say, however, is that all of the old versions, for example 21.1b or 21.1.0, no longer work at all, the setup and KO. and it's annoying not to be able to use it anymore

DarkAlchy commented 1 year ago

@DarkAlchy no idea at the moment.

what I can say, however, is that all of the old versions, for example 21.1b or 21.1.0, no longer work at all, the setup and KO. and it's annoying not to be able to use it anymore

I did a nice test about speed and I used the old scripts directly for 2.1. The speed greatly increased. https://github.com/bmaltais/kohya_ss/issues/961#issuecomment-1674843353

With the introduction of the code for SDXL it screwed up the speed for all versions (don't know how nor care to know how just know what I found in my test). Something is seriously wrong with the scripts. This is just a gui bmalt made it is a friendlier interface for people to use the sd-scripts. It is nothing more so the problem lies in the sd-scripts made by Kohya (again, as I proved).

XT-404 commented 1 year ago

@DarkAlchy I can't say where exactly the problem comes from, personally I've tried all of them, and absolutely nothing works, even easy training lora, gives exactly the same results as on kohya_ss so I think yes the SD engine must have taking a hell of a slap + the bytsandbyte anomaly didn't help.

DarkAlchy commented 1 year ago

@XT-404 Thing is did you see my experiment? I think the slowness in ADA based cards is deliberate by Nvidia to protect their Hopper sales. Think of it as an LHR but for this. Gamers will never see it, just as they didn't with those LHR cards. LHR was mostly defeated so I bet if we had the ones (if they had the incentive because they sure have the brains to do it) who broke LHR look at the current drivers I bet they would find something. Might just be an innocent issue but the 4090 will be 1 year old come October and this isn't AMD so no plausible excuse short of malice, imo.

The testing I did showed Kohya has an issue too, where I went from sub T4 speeds on Colab to about twice as fast. Combine the driver nonsense with the Kohya nonsense (SDXL nonsense) we have some serious issues that need fixing. Remember, the Linux driver is 535 as the latest proprietary one I can get.

ecnat commented 1 year ago

It also felt like whatever I did still ended up frying the lora or network (when I attempted dreambooth), even when starting from the sd15 branch. Reinstalling venv and even clearing out .cache/huggingface in my home directory did not help.

In case others like me end up reading this issue, and reach their wits end, this is what finally worked for me.

Basic idea was to switch to upstream kohya-ss/sd-scripts running in WSL2 and use this project to help generate the command line.

Also had to track down and hack around a few things to get xformers and bitsandbytes working...

Update WSL2 to enable vGPU support.
Install Nvidia Cuda Toolkit 11.6 for WSL. (Used sample apps to check that everything is working.)
Clone kohya-ss/sd-scripts, main branch (was 59c9a8e7aec when I cloned).
Compile Python 3.10.9 from source code. Create a venv in sd-scripts.
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt
pip install --upgrade nvidia-tensorrt --extra-index-url https://pypi.nvidia.com
accelerate config
Clone facebookresearch/xformers, check out commit 0bad001ddd
In the xformers repo, pip install -e . (This will take a while...)
Set the absolute path to venv/lib/python3.10/site-packates/nvidia/cublas/lib in LD_LIBRARY_PATH before calling accelerate with train_network.py
Use this GUI to generate the command line, then run it with sd-scripts in WSL.

YukiSakuma commented 1 year ago

@bmaltais First of all, thanks for your great work and effort! Now, Like @XT-404 mentioned v21.7.10 is working great. Here are the steps I used to install it:

Download the 21.7.10 version from https://github.com/bmaltais/kohya_ss/releases?page=2 Change the Gradio version in requirements_windows_torch1.txt to gradio==3.36.1 (see @bmaltais comment above) Run Setup.bat and install it using Torch1 (important!, torch2 v not working)

Important thing is when you have saved config files, recreate them from start!

Thanks again to @bmaltais!

I followed these steps and I noticed a significant speedup than using the latest version I went from 90s/it to ~2.3s/it when training a SD 1.5 lora, but I am not sure this is the speed I should be getting, I forgot what speed I was getting from earlier versions. My setup is a RTX 3090, 768x768 resolution, 6 batch size, 5 repeats and 10 epochs for a total of 1184 total steps and it completes in ~40 mins. Is this the expected speed for a 3090 with these parameters? { "LoRA_type": "Kohya DyLoRA", "adaptive_noise_scale": 0, "additional_parameters": "", "block_alphas": "", "block_dims": "", "block_lr_zero_threshold": "", "bucket_no_upscale": true, "bucket_reso_steps": 64, "cache_latents": true, "cache_latents_to_disk": false, "caption_dropout_every_n_epochs": 0.0, "caption_dropout_rate": 0, "caption_extension": "", "clip_skip": 2, "color_aug": false, "conv_alpha": 1, "conv_alphas": "", "conv_dim": 8, "conv_dims": "", "decompose_both": false, "dim_from_weights": false, "down_lr_weight": "", "enable_bucket": true, "epoch": 10, "factor": -1, "flip_aug": true, "full_fp16": false, "gradient_accumulation_steps": "1", "gradient_checkpointing": false, "keep_tokens": "0", "learning_rate": 0.0001, "logging_dir": "", "lora_network_weights": "", "lr_scheduler": "cosine", "lr_scheduler_num_cycles": "", "lr_scheduler_power": "", "lr_warmup": "10", "max_data_loader_n_workers": "0", "max_resolution": "768,768", "max_token_length": "225", "max_train_epochs": "", "mem_eff_attn": false, "mid_lr_weight": "", "min_snr_gamma": 0, "mixed_precision": "bf16", "model_list": "custom", "module_dropout": 0, "multires_noise_discount": 0, "multires_noise_iterations": 0, "network_alpha": 1, "network_dim": 8, "network_dropout": 0, "no_token_padding": false, "noise_offset": 0, "noise_offset_type": "Original", "num_cpu_threads_per_process": 2, "optimizer": "AdamW8bit", "optimizer_args": "",

DarkAlchy commented 1 year ago

@YukiSakuma That is absolutely horrid for a 1.5 training. RTX4090 1024x1024 BS8 20 repeats with 20 images and regularization I get, about the same as you are in Linux. In Windows 10 I get about 1s/it longer. edit: I see you are using DyLoRA. Ditch it as it is rubbish, slow, and doesn't work in ComfyUI which most people are moving to (my iA3 is never going to be implemented in ComfyUI either he told me).

WilliamKappler commented 1 year ago

I think I'm suffering from the same issue, except that Easy Training Lora gives approximately the same results as Kohya. Settings/imagesets that used to give high-quality, stable LORAs now give LORAs that completely mangle outputs. Extra appendages, bad eyes, weird outlines around characters, a film grain over images that won't go away, and a poorer understanding of the concept I am trying to train it on.

I have tired Torch 1 and 2 with nearly identical results, in both ETL and Kohya. Based on my experience and what other people have reported, it sounds like it's probably with some dependency rather than Kohya itself.

I am going to investigate further and possibly try what ecnat mentioned. I will update this comment or leave a new one if I find anything useful.

Update: I had no luck. I didn't go through the effort of WSL, but I did try 2 older releases, including one from around the time I think I downloaded the version that worked. I also tried intentionally diverging from my old config in various ways and it doesn't change much. Maybe it's Nvidia's fault somehow. For the time being, I am going to wait it out and see if more knowledgeable people find anything because I'm out of my depth on this topic.

ecnat commented 1 year ago

I should mention that while training under WSL gave me better results, it still wasn't very satisfactory. I've not trained LoRAs before this, so I don't know if it's just my settings or training set, though.

In the end, what gave me good results was full Dreambooth on a runpod with a bigger GPU...

WilliamKappler commented 1 year ago

I'm happy, but I'm afraid this might not be too useful to the discussion.

I was able to get training working the same as it was before. Unfortunately, I had to use a Frankenstein approach of recreating my first setup from bits and pieces left over. I'm not sure what part of this did it, but if I had to guess, it's the old kohya_ss hash.

Here's what I did:

Used an old CUDA 1.17 install I forgot I had (set it as my system env CUDA_PATH), probably from here
Checked out kohya_ss commit hash f4a9d489a94ce39f8599c6b7137c201b5e27ee12 (extracted from the metadata of my "good" LORA)
Ran setup.bat with all defaults/1s except fp16 (which I believe I used the first time), so Torch 1
Activated the venv and upgraded gradio to 3.36.1, letting it update what it had to in the process (huggingface hub and something else): pip install gradio==3.36.1
Commented out "python.exe .\tools\validate_requirements.py" in gui.bat to avoid it complaining that things didn't match anymore
Trained with the same parameters and image set as the first time

It didn't generate the exact same LORA (can't reproduce images the old one made), but the outputs are equivalent in quality.

At risk of derailing this somewhat, can anyone explain what sshs_legacy_hash and sshs_model_hash mean? These, the timestamps, and session ID are the only metadata difference between my new and old LORA. I'm curious if I can recreate the EXACT LORA I made before if I make those match.

bmaltais / kohya_ss

SD 1.5 Training Completely Broken + Unable to Use Old Kohya Commits #1291