bmaltais / kohya_ss

Apache License 2.0

Appeal for the Separation of SD 1.5 from SDXL #1401

Closed XT-404 closed 7 months ago

XT-404 commented 1 year ago

Hello, @bmaltais

Considering the critical situation of SD 1.5 content creators, who have been severely impacted since the SDXL update, which has broken any feasible LoRA or checkpoint (CP) workflow,

We are requesting that SD 1.5 be separated from SDXL in order to continue designing and creating our CPs or Loras.

Many of us, including myself, have invested significant amounts of money to passionately create quality Checkpoints and Loras.

I now find myself completely handicapped and unable to design even a functional and worthy Lora or Checkpoint.

Numerous individuals in the community, whether in France or the United States, are suffering due to the forced installation of SDXL, which has destroyed our ability to design and enjoy our creations.

The fact that we cannot roll back since all the commits are obsolete further exacerbates the situation.

I appeal to intelligence, logic, and reason to rescue SD 1.5 from this SDXL nightmare, in the interest of the community that supports SD 1.5 and has no interest in SDXL.

I understand that it will require effort, undoubtedly, but please realize that people like me, who have invested over €6,000 in equipment for significant projects, are now stuck and technically unemployed due to this SDXL implementation.

Thank you for considering my request. I also urge the entire community to support this message so that SD 1.5 can be revived and no longer remain in its current state.

Also, thank you for the effort and work invested, but please, separate SD 1.5 from SDXL, for the sake of all those who support you, believe in you, and hope for a repaired and functional SD 1.5 to return.

Thank you in advance.

Best regards,

wendythethird commented 1 year ago

sd15storyeng (image) A little more effort and Cthulhu will rule the world!

XT-404 commented 1 year ago

image

bmaltais commented 1 year ago

Let's agree on the last good commit and I will create a SD1.5 branch from it. Then we can work out why it will not run properly, as in theory it should. It might come down to the gradio version used for the UI.

That release will not see any further development, but it will allow us to keep what used to work in SD1.5 functional.

Obviously, if people contribute PRs for the 1.5 branch, I will merge them if they make sense...

XT-404 commented 1 year ago

@bmaltais

I agree with you on this point.

The problem is determining which version was functioning correctly. Some say version 21.8, others 21.7. I would have liked to run the tests myself; however, when I try to, it's impossible for me since the setup is broken, even if I modify the requirements or the .sh file you pointed me to. Either way, we could go for a 21.7 version close to 21.8, take the bull by the horns to make that version work, and leave it in an isolated corner without updates, which I believe would be ideal for you.

I will give you feedback tomorrow. My colleague and I will check the logs and records from when everything was working perfectly, and we will confirm whether the version we believe to be the right one is indeed this one:

https://github.com/bmaltais/kohya_ss/releases/tag/v21.7.6

I will come back tomorrow to inform you about this without fail.

Thank you for your response.

XT-404 commented 1 year ago

Hello @bmaltais ,

I'm reaching out to you again, as promised, to report the version that has given me some results so far.

I've tested the following versions, from 21.6.5 through v21.7.8. The only one that yielded a result of about 60% is this one: https://github.com/bmaltais/kohya_ss/releases/tag/v21.7.8

However, there are clearly numerous illogical anomalies. I conducted the following tests:

20 identical images. Settings: batch 1, epochs 10, repetitions 5 > total of 1000 steps. This configuration has always given me excellent results for 20 images; however, this time the outputs are completely distorted, Cthulhu-style. On the other hand, with batch 2, epochs 10, repetitions 10 > still a total of 1000 steps, I achieve a satisfactory result of about 60%. This is entirely illogical, considering that I'm not changing either the checkpoint or the images. I performed all tests in the same manner across all versions, and currently the only one standing out is v21.7.8. Now the task is to understand why this version isn't working as before, why batch 1 and batch 2 configured for the same number of steps yield completely different results, and especially why all the images come out as "deformed plastic monsters", except under batch 2 on v21.7.8.
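(For reference, the step totals above follow from images × repeats × epochs ÷ batch size. Here is a minimal sketch of that arithmetic only; it ignores regularization images and gradient accumulation, which the real trainer also factors in.)

```python
# Minimal sketch of the step arithmetic used above
# (images x repeats x epochs / batch size); ignores regularization
# images and gradient accumulation.

def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    steps_per_epoch = (num_images * repeats) // batch_size
    return steps_per_epoch * epochs

# The two configurations compared above, both landing on 1000 steps:
print(total_steps(num_images=20, repeats=5, epochs=10, batch_size=1))   # 1000
print(total_steps(num_images=20, repeats=10, epochs=10, batch_size=2))  # 1000
```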

Thank you for your feedback. I'm available for any tests or assistance that I can provide.

Best regards.

bmaltais commented 1 year ago

Well, if the version works to generate the models, I can't really do much about the results. If you find a version that produces the results you used to have, I can create a branch from it... but personally I have never had issues with the models produced from any of the versions... so it is hard to troubleshoot. :-(

But for now I can create the sd2.5 branch from the 21.7.8 release so it can be used as a foundation.

XT-404 commented 1 year ago

@bmaltais what do you mean by an SD 2.5 branch?

bmaltais commented 1 year ago

Typo. I have published the code in the sd15 branch. I also updated the gradio release so it does not cause issues with the browsers.

XT-404 commented 1 year ago

> Well, if the version works to generate the models, I can't really do much about the results. If you find a version that produces the results you used to have, I can create a branch from it... but personally I have never had issues with the models produced from any of the versions... so it is hard to troubleshoot. :-(
>
> But for now I can create the sd2.5 branch from the 21.7.8 release so it can be used as a foundation.

@bmaltais The version works for carrying out training, certainly without any error messages or major anomalies to report. However, it only functions at 60%. The model is not stable, and the batch system is completely disorganized. We can apply all possible parameters to try to rectify the situation, whether in the configuration, the model used, or the photos, but the anomaly remains consistent: a plastic-like effect, deformations, etc.

bmaltais commented 1 year ago

If you find the release that works, let me know. The code in the release is locked and should produce consistent results. Driver updates, on the other hand, have been known to cause training variations. Is it possible that newer drivers are now in use compared to the ones from a few months ago?

XT-404 commented 1 year ago

@bmaltais To be completely honest, I have no idea. All I know is that with version 21.7.8, when I used it before, I would get magnificent results with batch 1, epochs 10, repeats 5, or with batch 2, epochs 10, repeats 10. Now I get monsters. I pushed the training to repeats 20 and then 30; the result is either the same or completely burnt out. I'm not running these tests alone, there are two of us, and despite our testing, nothing is good. The only people getting good results are those who have never done the Kohya_ss update and who are still managing to produce beautiful LoRAs or CPs.

XT-404 commented 1 year ago

@bmaltais

Version v21.5.11

Can you please tell me how to modify this version to carry out tests on it? There are 2 different gradio pins, thank you:

    ftfy==6.1.1
    gradio==3.28.1; sys_platform != 'darwin'
    gradio==3.23.0; sys_platform == 'darwin'
    lion-pytorch==0.0.6
    opencv-python==4.7.0.68
    pytorch-lightning==1.9.0

XT-404 commented 1 year ago

I modified the versions so that it installs, which is OK; now I'm running tests to check. I'm also going to go back to the basic graphics driver. I have a 4090, and I'm going to see if I can install the Studio driver.

    bitsandbytes==0.35.0; sys_platform == 'win32'
    bitsandbytes==0.38.1; (sys_platform == "darwin" or sys_platform == "linux")
    dadaptation==1.5
    diffusers[torch]==0.10.2
    easygui==0.98.3
    einops==0.6.0
    ftfy==6.1.1
    gradio==3.36.1; sys_platform != 'darwin'
    gradio==3.23.0; sys_platform == 'darwin'
    lion-pytorch==0.0.6
    opencv-python==4.7.0.68
    pytorch-lightning==1.9.0
    safetensors==0.2.6
    tensorboard==2.10.1 ; sys_platform != 'darwin'
    tensorboard==2.12.1 ; sys_platform == 'darwin'
    tk==0.1.0
    toml==0.10.2
    transformers==4.26.0
    voluptuous==0.13.1
    wandb==0.15.0
    # for BLIP captioning
    fairscale==0.4.13
    requests==2.28.2
    timm==0.6.12
    # tensorflow<2.11
    huggingface-hub>=0.14.0; sys_platform != 'darwin'
    huggingface-hub==0.13.0; sys_platform == 'darwin'
    tensorflow==2.10.1; sys_platform != 'darwin'
    # For locon support
    lycoris_lora==0.1.4
    # for kohya_ss library

image

bmaltais commented 1 year ago

@XT-404 Let me know how this config goes.

XT-404 commented 1 year ago

@bmaltais I am currently testing almost all old and current versions.

I must be at around 150 to 200 tests carried out since yesterday.

Currently I have 4 versions that stand out:

image

I am trying to obtain 75-80% image fidelity with different parameters,

whether with 20 images / 30 / 40 / 50 / 100 / 1000, and the same for repeats, etc.

I am doing my best to identify the best candidate among the 4 remaining.

XT-404 commented 1 year ago

@bmaltais

After 2 whole days of intensive testing of configurations etc., I finally found a version that reaches 80% with parameters that are not pushed too hard. However, cuDNN should not be installed, nor the bitsandbytes Windows wheel; without those two, the program works. The functional version, currently with an 80% success rate, is 21.5.11. The changes to the Torch 1 requirements file for it to work are as follows.

accelerate==0.18.0 albumentations==1.3.0 altair==4.2.2

Remove the line: https://github.com/bmaltais/bitsandbytes-windows-webui/raw/main/bitsandbytes-0.38.1-py3-none-any.whl; sys_platform == 'win32' (along with its accompanying comment: "# This next line is not an error but rather there to properly catch if the url based bitsandbytes was properly installed by the line above...").

    bitsandbytes==0.35.0; sys_platform == 'win32'
    bitsandbytes==0.38.1; (sys_platform == "darwin" or sys_platform == "linux")
    dadaptation==1.5
    diffusers[torch]==0.10.2
    easygui==0.98.3
    einops==0.6.0
    ftfy==6.1.1
    gradio==3.36.1; sys_platform != 'darwin'
    gradio==3.23.0; sys_platform == 'darwin'
    lion-pytorch==0.0.6
    opencv-python==4.7.0.68
    pytorch-lightning==1.9.0
    safetensors==0.2.6
    tensorboard==2.10.1 ; sys_platform != 'darwin'
    tensorboard==2.12.1 ; sys_platform == 'darwin'
    tk==0.1.0
    toml==0.10.2
    transformers==4.26.0
    voluptuous==0.13.1
    wandb==0.15.0
    # for BLIP captioning
    fairscale==0.4.13
    requests==2.28.2
    timm==0.6.12
    # tensorflow<2.11
    huggingface-hub>=0.14.0; sys_platform != 'darwin'
    huggingface-hub==0.13.0; sys_platform == 'darwin'
    tensorflow==2.10.1; sys_platform != 'darwin'
    # For locon support
    lycoris_lora==0.1.4
    # for kohya_ss library
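As a side note, a quick way to confirm that an environment actually matches a pin set like the one above is to compare installed versions programmatically. A minimal sketch follows; the dictionary is deliberately abbreviated, so extend it with the full list above as needed.

```python
# Minimal sketch: compare installed package versions against a few of the
# pins listed above. Abbreviated on purpose; extend the dict as needed.
from importlib.metadata import PackageNotFoundError, version

pins = {
    "bitsandbytes": "0.35.0",
    "diffusers": "0.10.2",
    "gradio": "3.36.1",
    "pytorch-lightning": "1.9.0",
    "transformers": "4.26.0",
    "lycoris_lora": "0.1.4",
}

for name, expected in pins.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: NOT INSTALLED (expected {expected})")
        continue
    status = "OK" if installed == expected else f"MISMATCH (expected {expected})"
    print(f"{name}: {installed} -> {status}")
```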

This time I can confirm that this version is viable at 80%,

whether with 20 images, 50, 100, or 1000.

All that remains is to work on the code to reach a 95-100% success rate, and everything will be perfect ;)

PS: the graphics drivers had no impact on training; I used both the old driver versions and the new ones, and it did not change the training percentage.

XT-404 commented 1 year ago

Now the community needs talented people or Python developers to contribute to the effort and perfect our dear friend Kohya_ss 21.5.11 for this training ^^. If I had the knowledge, I would gladly have supported and helped, but I do not have @bmaltais's level.

Loadus commented 1 year ago

Adding a sidenote here that I also experienced complete breakage of any 1.5 training when the SDXL stuff was added, but I managed to solve it by re-installing CUDA 11.8 (also noting that my current display driver is 531.61). Non-functioning or wobbly LoRAs were a problem for weeks, but this was the thing that 'repaired' the training. It took several hours to debug that, for some (whatever) reason, xformers was borking the entire training - if I trained without it, everything was more or less correct. I traced it back to CUDA not 'connecting' to the training session at all (if that is even a good way to describe it).

After re-installing CUDA 11.8, training speed increased tremendously (going from ~4.5s/it ---> ~2.13s/it), so that was a further indication that something was borked badly.

Not sure if it will help anyone else, just thought I'd mention this.

    07:23:01-369448 INFO Version: v21.8.7
    07:23:01-390391 INFO nVidia toolkit detected
    07:23:08-767791 INFO Torch 2.0.1+cu118
    07:23:08-819921 INFO Torch backend: nVidia CUDA 11.8 cuDNN 8700
    07:23:08-827899 INFO Torch detected GPU: NVIDIA GeForce RTX 3060 VRAM 12287 Arch (8, 6) Cores 28
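For anyone checking whether CUDA is actually "connecting" to the training session, the same details as in the startup log above can also be printed directly from Torch. A minimal sketch, run inside the training virtual environment:

```python
# Minimal sketch: print the CUDA / cuDNN / GPU details Torch actually sees,
# roughly mirroring the kohya_ss startup log above.
import torch

print("Torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA runtime bundled with Torch:", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name)
    print("VRAM (MB):", props.total_memory // (1024 * 1024))
    print("Compute capability:", (props.major, props.minor))
```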

bmaltais commented 1 year ago

Thank you @Loadus for sharing your experience... So, in summary, folks with issues should re-install CUDA 11.8 and make sure they use NVidia driver 531.61.

XT-404 commented 1 year ago

@bmaltais I will apply and test the information provided by @Loadus. I will post the analysis and results here.

bmaltais commented 1 year ago

Direct link to 531.61 drivers:

Game Ready Driver Download Link: https://us.download.nvidia.com/Windows/531.61/531.61-desktop-win10-win11-64bit-international-dch-whql.exe

Studio Driver Download Link: https://us.download.nvidia.com/Windows/531.61/531.61-notebook-win10-win11-64bit-international-nsd-dch-whql.exe

XT-404 commented 1 year ago

@bmaltais After testing on different subjects (realistic, manga, comics, anime, style): the training method has changed and no longer works at 1000 steps but at 3000 steps. On the other hand, strangely enough, everything works except realism. If I run realistic training, I get images that the indicated model can only reproduce with extreme difficulty. If I run anime, manga, or comics training, the rendering is perfect. The anomaly with realism is present in all CP tests, whether on CivitAI checkpoints dedicated to realism or on personal checkpoints designed for that purpose.

I am of course using the indicated CUDA 11.8 and NVidia 531.61 drivers.

Sniper199999 commented 1 year ago

> @bmaltais After testing on different subjects (realistic, manga, comics, anime, style): the training method has changed and no longer works at 1000 steps but at 3000 steps. On the other hand, strangely enough, everything works except realism. If I run realistic training, I get images that the indicated model can only reproduce with extreme difficulty. If I run anime, manga, or comics training, the rendering is perfect. The anomaly with realism is present in all CP tests, whether on CivitAI checkpoints dedicated to realism or on personal checkpoints designed for that purpose.
>
> I am of course using the indicated CUDA 11.8 and NVidia 531.61 drivers.

Great findings by @XT-404 and @Loadus. Have you figured out why the steps have increased from 1000 to 3000 and why the realistic images are hard to achieve?

XT-404 commented 1 year ago

> @bmaltais After testing on different subjects (realistic, manga, comics, anime, style): the training method has changed and no longer works at 1000 steps but at 3000 steps. On the other hand, strangely enough, everything works except realism. If I run realistic training, I get images that the indicated model can only reproduce with extreme difficulty. If I run anime, manga, or comics training, the rendering is perfect. The anomaly with realism is present in all CP tests, whether on CivitAI checkpoints dedicated to realism or on personal checkpoints designed for that purpose. I am of course using the indicated CUDA 11.8 and NVidia 531.61 drivers.
>
> Great findings by @XT-404 and @Loadus. Have you figured out why the steps have increased from 1000 to 3000 and why the realistic images are hard to achieve?

@Sniper199999

"After several weeks of intensive testing,

Training on realism does not work at all. For a reason I can't understand, training on manga/comics/BD/drawing/3D/2D works perfectly in LORA.

Realism, on the other hand, is completely shattered. Why are training steps above 1000 and jump to 3000? No idea. I tried to get closer to the most functional with 20 images, and only 3000 steps work. If I'm below that, I get under-training and if I go above, it burns the training (I'm under 4090). I don't have a slowness problem and the training remains at 2.5 or 2 without significant loss.

However, regarding the design of Checkpoints, it's not even worth mentioning: nothing works. I can put any type, all the CPs made from 3000 steps to 10K steps and others come out in confetti mode or completely blown up.

The only thing currently working on my side, whether on this version or version 21.8.8, is the creation of Lora manga, comics, bd, and nothing else.

I tried a series over several days of parameter modification, installation of old drivers, etc., to no avail.

Many people have given up on the idea of designing Lora or CP given the disastrous results obtained.

Personally, I'm not giving up, but I'm also tired of these utterly disastrous results and that nothing is found to set things right.

Being forced to use SDXL while many people refuse this version is really a punishment for us."

tornado73 commented 1 year ago

For information, the latest version that trains realism correctly on the AMD 6000 card line is 21.5.8; everything above it is a horror :-) The forced transfer to SDXL, designed exclusively for new-generation cards with a large amount of memory, left many enthusiasts behind. I ran training on my card on the latest version after editing dependencies, but the results are terrible with different settings. It turns out that AMD users are left overboard :-)

oo7male commented 11 months ago

Totally agree, it's literally impossible to get consistent LoRA results with recent versions of kohya_ss. The only version that has been working pretty well for me is the Colab version https://colab.research.google.com/github/hollowstrawberry/kohya-colab/blob/main/Lora_Trainer.ipynb which in turn uses https://github.com/kohya-ss/sd-scripts. Both commits e6ad3cbc66130fdc3bf9ecd1e0272969b1d613f7 and 9a67e0df390033a89f17e70df5131393692c2a55 seem to work fine. Just sharing in case anyone needs to try.

AIEXAAA commented 11 months ago

> Totally agree, it's literally impossible to get consistent LoRA results with recent versions of kohya_ss. The only version that has been working pretty well for me is the Colab version https://colab.research.google.com/github/hollowstrawberry/kohya-colab/blob/main/Lora_Trainer.ipynb which in turn uses https://github.com/kohya-ss/sd-scripts. Both commits e6ad3cb and 9a67e0d seem to work fine. Just sharing in case anyone needs to try.

You can try this method, it works for me regardless of the version: https://github.com/kohya-ss/sd-scripts/issues/855#issuecomment-1748951832

Please note that due to version updates, the line numbers may no longer match, but the modified code is the same.

WilliamKappler commented 9 months ago

Sorry, I have been away... and last time I was here, missed this entire issue somehow.

Previously, I spent a lot of time looking into this issue and managed to find a way to reliably reproduce the "old" LORA behavior as described here: https://github.com/bmaltais/kohya_ss/issues/1291#issuecomment-1736968853 - though the results are not exactly the same as I got previously.

I can't agree with some of the comments about this only being a problem for realism. I've had all these issues trying to train and retrain a cartoony LORA, but perhaps that is because I am using NAI. Maybe I don't know what I am doing and have less margin of error.

Another observation I had about this matter is that the newer Kohya gives more reliable, but worse results. The old one gives much less stable results, but some of them are high quality. Put another way: 'new' is almost all poor quality images, 'old' is mostly awful but some great images.

heartlocket commented 8 months ago

Is Kohya_ss effectively over for non-SDXL creators? Has there been a fork or a dedicated project since then? I am curious how people are making LoRAs now.

bmaltais commented 8 months ago

Kohya_ss, the author of the sd-scripts code base I use in this repo, is not maintaining an sd1.5 branch… so I guess this is pretty much the end of the sd1.5-only code base.

His code should support both sd1.5 and SDXL, but some of the new modules required may not produce the same sd1.5 results they used to.

I suggest you raise this concern directly with him on his sd-scripts repo.

XT-404 commented 8 months ago

Hello everyone,

It's been a while since I've posted in this topic, which I created due to multiple anomalies related to the Kohya_ss script. I'd like to clarify, as bmaltais mentioned, that he is not the original author of this script. Instead, he uses the independently developed Kohya_ss script.

Since my last post, I have achieved a lot. After several months of testing, I've noticed significant changes with the integration of SDXL into the SD1.5 Kohya_ss script. I've chosen to focus on version v21.8.10, which allows me, in 90% of cases, to create various types of Lora, in terms of style or concept. However, one issue persists since the addition of SDXL: the realism of characters, known or not, with any training checkpoint model.

To overcome this challenge, I developed a specific Checkpoint that excludes images of the Cartoon/2D/3D/ANIME/MANGA/2.5D type. By training the lora with realistic images, they are transformed into BD/COMICS/Cartoon versions, etc. The Checkpoint I created then transforms these 2D/anime images back into realistic versions. To date, this is the only method I've found to achieve pure realism with functional lora images.

I've also experimented with other training systems that have yielded similar results to Bmaltais's code. These systems all seem to be based on the same developer, the creator of the Kohya_ss script.

Currently, I am not aware of any ongoing project aiming to develop a script similar to Bmaltais's for SD1.5 users who prefer to stay on this version. Unless a talented developer like Bmaltais embarks on such a project, it seems that the only alternative is to stick to functional older versions and block updates.

Best regards, XT404

AIEXAAA commented 8 months ago

> Currently, I am not aware of any ongoing project aiming to develop a script similar to Bmaltais's for SD1.5 users who prefer to stay on this version. Unless a talented developer like Bmaltais embarks on such a project, it seems that the only alternative is to stick to functional older versions and block updates.
>
> Best regards, XT404

Have you tried the modification method I mentioned before?

The latest version of Kohya_ss has basically fixed the SD1.5 problem, and the only remaining issue is the reproducibility of the loss function.

Kohya_ss corrected the SD1.5 problem, but at the same time modified some references and loaded the VAE into xformers, so subsequent versions are still trainable but the loss function is different from before. To restore the exact same loss function, just follow my modification method.

The evidence is that I trained with the old version of Kohya_ss, and then trained with the latest version of Kohya_ss with the code modifications I mentioned; the LoRAs produced by both under the same seed are almost identical in action and appearance.

XT-404 commented 8 months ago

> Have you tried the modification method I mentioned earlier?
>
> The latest version of Kohya_ss essentially resolved the SD1.5 issue, and the only remaining problem is the reproducibility of the loss function.
>
> Since Kohya_ss fixed the SD1.5 issue but at the same time altered some references and loaded the VAE into xformers, subsequent versions could still be trained, but the loss function is different than before. To restore exactly the same loss function, simply follow my modification method.
>
> The proof lies in the fact that I trained with the old version of Kohya_ss, with the latest version of Kohya_ss, and made the code changes I mentioned. The LoRAs produced by both under the same seed are almost identical in action and appearance.

Hello @AIEXAAA

After applying the suggested method in the comment on the GitHub topic (https://github.com/kohya-ss/sd-scripts/issues/855#issuecomment-1748951832), I encountered several technical difficulties.

  1. Modification of the library\model_util.py file: Changing the code to initialize the loss values of the SD1.5 training seems to affect the results. By replacing the initial code block with the suggested one, the initial values become identical. However, this did not resolve the main problem.

  2. Modification of the train_network.py file: Removing the following lines, intended for compatibility with PyTorch 2.0.0 and memory efficiency, resulted in the failure of the training launch:

    if torch.__version__ >= "2.0.0":
        vae.set_use_memory_efficient_attention_xformers(args.xformers)

    After restoring these lines, the problem persisted.

In conclusion, despite following the instructions scrupulously and checking for potential manipulation errors, the modified script does not function correctly. Reinstalling the script in its original version restored its operation, but the problem of realism remains unresolved.

I remain open to any further suggestions or assistance to rectify these issues.

AIEXAAA commented 8 months ago

> I remain open to any further suggestions or assistance to rectify these issues.

It might be a translation issue, I’m somewhat unclear about your response.

Are you saying that when you make modifications according to the second point, your program throws an error?

If so, the most likely reason is that your version of Kohya_ss is not up-to-date. In one version of Kohya_ss, when the aforementioned two lines of code are removed, the GPU’s RAM usage becomes huge, leading to an error. The latest version of Kohya_ss has already fixed this.

If it’s not a program error, but you still can’t reproduce lora after the modification, then this is beyond what I can explain.

XT-404 commented 8 months ago

@AIEXAAA I am not on the latest version of kohya_ss; I use version v21.8.10, and indeed if I delete the lines:

if torch.__version__ >= "2.0.0":
    vae.set_use_memory_efficient_attention_xformers(args.xformers)

launching the training crashes automatically, unless of course I restore them as they were originally.

AIEXAAA commented 8 months ago

> launching the training crashes automatically, unless of course I restore them as they were originally.

I dare not make a definitive statement here, but as you can see from the code, if the PyTorch version is too low, it will not load. Therefore, even if you remove this section of code, there should be no problem. Because after removal, it’s as if your PyTorch version is too low.

So, I’m puzzled by your results.

Additionally, changing

  if torch.__version__ >= "2.0.0":

to

  if torch.__version__ <= "2.0.0":

actually has the same effect. This way, you don’t need to reinstall it, and if it can’t run, you can directly change it back.
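To make the suggestion concrete, here is a small self-contained sketch. It does not touch kohya_ss itself; FakeVAE and the hard-coded version strings are stand-ins for the real objects in sd-scripts' train_network.py. It only demonstrates why flipping the comparison has the same effect on Torch 2.x as deleting the guarded call.

```python
# Self-contained illustration of the version-guard trick discussed above.
# FakeVAE and the hard-coded strings are stand-ins for the real objects
# in sd-scripts' train_network.py.

class FakeVAE:
    def set_use_memory_efficient_attention_xformers(self, enabled: bool) -> None:
        print(f"VAE xformers attention set to: {enabled}")

torch_version = "2.0.1"   # stand-in for torch.__version__
use_xformers = True       # stand-in for args.xformers
vae = FakeVAE()

# Original guard: taken on Torch 2.x, so the VAE is switched to xformers attention.
if torch_version >= "2.0.0":
    vae.set_use_memory_efficient_attention_xformers(use_xformers)

# Suggested edit: with "<=" the branch is skipped on Torch 2.x, matching the
# effect of removing the two lines while remaining trivially reversible.
if torch_version <= "2.0.0":
    vae.set_use_memory_efficient_attention_xformers(use_xformers)
```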

OriginLive commented 8 months ago

Spent 3 days trying to train a 1.5 checkpoint, only to find out it doesn't work on 1.5

3 days wasted, thanks obmaltisama

XT-404 commented 8 months ago

> Spent 3 days trying to train a 1.5 checkpoint, only to find out it doesn't work on 1.5
>
> 3 days wasted, thanks obmaltisama

Greetings @OriginLive. What version of Kohya_ss are you running? For checkpoint training, the version I indicated works; it just works differently than it did before all the modifications and the SDXL implementation. Mine works if you take the time to do things well, with an efficient and correct configuration. Of course, that requires tests, analysis, and a clean dataset.

OriginLive commented 8 months ago

I was running the latest. What version are you suggesting I use? What needs to be done? I've tried a 27 release from before SDXL was mentioned, but I can't use the UI there with the current Python version.

XT-404 commented 8 months ago

> I was running the latest. What version are you suggesting I use? What needs to be done? I've tried a 27 release from before SDXL was mentioned, but I can't use the UI there with the current Python version.

@OriginLive,

For old versions predating the SDXL insertion, there are modifications to be made in a Python requirements file, as @bmaltais stated earlier in the topic. The version I currently use to create LoRAs & checkpoints is 21.8.10, which is about 95% functional.

The missing 5% relates to direct realism, which does not work with any type of training, configuration, or checkpoint.

OriginLive commented 8 months ago

What changes? There are drivers mentioned and all sorts of stuff, like a different branch. Could you help out a bit more? I'm trying to get 1.5 working.

XT-404 commented 8 months ago

> What changes? There are drivers mentioned and all sorts of stuff, like a different branch. Could you help out a bit more? I'm trying to get 1.5 working.

To make this professional and detailed: @OriginLive, the version I am currently using is 21.8.10, available at the following link: https://github.com/bmaltais/kohya_ss/releases/tag/v21.8.10 (download link in Zip format here: https://github.com/bmaltais/kohya_ss/archive/refs/tags/v21.8.10.zip).

Simply install that version without applying any updates, and once the installation is done, proceed as usual. On older versions of Kohya_ss, training generally ran at 1000 steps to get a clean, correct result. Now, with the same number of images, training needs to run for a minimum of about 2800 steps to get something correct and clean. To gain a significant training improvement, you can use regularization images directly related to what you want to train; of course, there's no need to caption the regularization images. Regarding batch size, always use batch 1; batches 2, 3, and 4 are completely out of service. Here is an example of the training method I use to obtain viable and functional LoRAs or checkpoints:

In the settings section, I proceed as follows:

Training batch size: 1
Epochs: 10
Max train: none
Max train steps: none
Save every N epochs: 1
Caption extension: .txt
Mixed precision: BF16 (I have a 4090)
Save precision: BF16
number of CPU: 2
seed: 1111
cache latents: active
cache latents to disk: disabled
LR scheduler: Constant
Optimizer: AdamW
LR scheduler extra arguments: None
Optimizer extra arguments: None
Learning rate: 0.0001
LR warmup: 0
LR number of cycles: None
LR power: None
Max resolution: 512x512
Stop text encoder training: 0
Enable buckets: active
Minimum bucket resolution: 256
Maximum bucket resolution: 2048
Text encoder learning rate: 0.00001
Unet learning rate: 0.0001
Network rank (dimension): 256
Network Alpha: 256
You can also set it to 128 / 128; both work very well. Precision is more refined at 256, but the file size is larger (for a LoRA).

Those are the settings I use on my side, and they work perfectly, except, as mentioned, for realism, where training fails 99.8% of the time.
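For readers who prefer launching sd-scripts directly rather than through the GUI, the settings above map roughly onto train_network.py arguments. The sketch below is only an illustration: the flag names come from kohya-ss/sd-scripts and should be verified against the version bundled with 21.8.10, every path is a placeholder, and GUI fields without an obvious flag (such as "number of CPU" and "stop text encoder training") are intentionally left out.

```python
# Rough, illustrative mapping of the GUI settings above onto an sd-scripts
# train_network.py invocation. Verify flag names against your local
# sd-scripts version; all paths below are placeholders.
import subprocess

cmd = [
    "accelerate", "launch", "train_network.py",
    "--pretrained_model_name_or_path", "/path/to/sd15_checkpoint.safetensors",
    "--train_data_dir", "/path/to/img",      # e.g. contains a "5_subject" folder
    "--reg_data_dir", "/path/to/reg",        # regularization images (optional)
    "--output_dir", "/path/to/output",
    "--output_name", "my_sd15_lora",
    "--network_module", "networks.lora",
    "--network_dim", "256",
    "--network_alpha", "256",
    "--resolution", "512,512",
    "--enable_bucket",
    "--min_bucket_reso", "256",
    "--max_bucket_reso", "2048",
    "--train_batch_size", "1",
    "--max_train_epochs", "10",
    "--save_every_n_epochs", "1",
    "--caption_extension", ".txt",
    "--mixed_precision", "bf16",
    "--save_precision", "bf16",
    "--seed", "1111",
    "--cache_latents",
    "--optimizer_type", "AdamW",
    "--lr_scheduler", "constant",
    "--learning_rate", "1e-4",
    "--text_encoder_lr", "1e-5",
    "--unet_lr", "1e-4",
    "--lr_warmup_steps", "0",
]

subprocess.run(cmd, check=True)
```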

OriginLive commented 8 months ago

Done this, but I am still getting Picasso-style results on checkpoints: AnimeVN_1 5_db_v1_20240107195206_000250_00_415254648

XT-404 commented 8 months ago

@OriginLive Can you provide me with the following information: what version of CUDA are you using, and what driver do you use for your graphics card?

XT-404 commented 8 months ago

Go download and install: https://developer.nvidia.com/cuda-11-6-0-download-archive and install Nvidia driver 531.61: https://www.nvidia.fr/download/driverResults.aspx/204335/fr

OriginLive commented 8 months ago

    =============================================================
    Modules installed outside the virtual environment were found.
    This can cause issues. Please review the installed modules.

    You can uninstall all local modules with:
    deactivate
    pip freeze > uninstall.txt
    pip uninstall -y -r uninstall.txt
    =============================================================

    20:25:07-691269 INFO Version: v21.8.10
    20:25:07-694779 INFO nVidia toolkit detected
    20:25:08-780964 INFO Torch 2.0.1+cu118
    20:25:08-794772 INFO Torch backend: nVidia CUDA 11.8 cuDNN 8700
    20:25:08-795774 INFO Torch detected GPU: NVIDIA GeForce RTX 3090 VRAM 24576 Arch (8, 6) Cores 82
    20:25:08-796772 INFO Verifying modules instalation status from requirements_windows_torch2.txt...
    20:25:08-802773 INFO Verifying modules instalation status from requirements.txt...
    no language
    20:25:11-156841 INFO headless: False
    20:25:11-162841 INFO Load CSS...
    Running on local URL: http://127.0.0.1:7860

image

Let me try with an old driver.

OriginLive commented 8 months ago

I installed the old drivers, but it still says 11.8 for CUDA :/ even though 11.6 was installed and the old drivers were installed as well.

OriginLive commented 8 months ago

AnimeVN_1 5_db_v1_20240107204946_000250_00_415254648

still

XT-404 commented 8 months ago

@OriginLive version 11.8 must be removed

OriginLive commented 8 months ago

> @OriginLive version 11.8 must be removed

I cannot; I do not see it in the list of available programs. Maybe it's part of PyTorch?

edit: https://discord.gg/ySHHDKkhat I'm available on the training SD Discord.