bmaltais / kohya_ss


LoRA was trained, but does not work in Stable Diffusion on Linux #311

Closed · AlexSgt closed 1 year ago

AlexSgt commented 1 year ago

Hi, I had two problems; one I solved, but the other I can't fix. When I run LoRA training on Linux, I get this error (RuntimeError: No such operator xformers::efficient_attention_forward_cutlass):

Console Log:

```
Folder 100_sizovadina: 2500 steps
max_train_steps = 2500
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --pretrained_model_name_or_path="XpucT/Deliberate" --train_data_dir="Lora/input" --resolution=512,512 --output_dir="Lora/output" --logging_dir="Lora/log" --network_alpha="128" --training_comment="trigger: sizovadina" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=0.0001 --network_dim=128 --output_name="sizovadina" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="2500" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1337" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale
2023-03-05 18:57:18.754125: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-05 18:57:18.897280: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-03-05 18:57:19.454393: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-03-05 18:57:19.454456: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-03-05 18:57:19.454467: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-03-05 18:57:21.239902: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-05 18:57:21.404428: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-03-05 18:57:22.002651: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2023-03-05 18:57:22.002728: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2023-03-05 18:57:22.002740: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /workspace/kohya_ss/venv/lib/python3.10/site-packages/xformers/_C.so)
WARNING:root:WARNING: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /workspace/kohya_ss/venv/lib/python3.10/site-packages/xformers/_C.so)
Need to compile C++ extensions to get sparse attention suport. Please run python setup.py build develop
prepare tokenizer
Use DreamBooth method.
prepare train images.
found directory 100_sizovadina contains 25 image files
2500 train images with repeating.
loading image sizes.
100%|██████████| 25/25 [00:00<00:00, 3942.46it/s]
prepare dataset
prepare accelerator
Using accelerator 0.15.0 or above.
load Diffusers pretrained models
Downloading (…)ain/model_index.json: 100%|██████████| 584/584 [00:00<00:00, 569kB/s]
text_encoder/model.safetensors not found
Downloading (…)_checker/config.json: 100%|██████████| 4.91k/4.91k [00:00<00:00, 4.26MB/s]
Downloading (…)_encoder/config.json: 100%|██████████| 617/617 [00:00<00:00, 500kB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 472/472 [00:00<00:00, 400kB/s]
Downloading (…)rocessor_config.json: 100%|██████████| 520/520 [00:00<00:00, 420kB/s]
Downloading (…)cheduler_config.json: 100%|██████████| 346/346 [00:00<00:00, 395kB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 806/806 [00:00<00:00, 797kB/s]
Downloading (…)aed/unet/config.json: 100%|██████████| 1.02k/1.02k [00:00<00:00, 1.01MB/s]
Downloading (…)5aed/vae/config.json: 100%|██████████| 582/582 [00:00<00:00, 444kB/s]
Downloading (…)tokenizer/merges.txt: 100%|██████████| 525k/525k [00:03<00:00, 172kB/s]
Downloading (…)tokenizer/vocab.json: 100%|██████████| 1.06M/1.06M [00:09<00:00, 106kB/s]
Downloading (…)_pytorch_model.bin";: 100%|██████████| 335M/335M [00:17<00:00, 19.5MB/s]
Downloading (…)"pytorch_model.bin";: 100%|██████████| 492M/492M [00:23<00:00, 21.3MB/s]
Downloading (…)"pytorch_model.bin";: 100%|██████████| 1.22G/1.22G [00:56<00:00, 21.5MB/s]
Downloading (…)_pytorch_model.bin";: 100%|██████████| 3.44G/3.44G [02:12<00:00, 25.9MB/s]
Fetching 15 files: 100%|██████████| 15/15 [02:14<00:00, 8.94s/it]
/workspace/kohya_ss/venv/lib/python3.10/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
  warnings.warn(
The config attributes {'class_embed_type': None, 'mid_block_type': 'UNetMidBlock2DCrossAttn', 'resnet_time_scale_shift': 'default'} were passed to UNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
The config attributes {'scaling_factor': 0.18215} were passed to AutoencoderKL, but are not expected and will be ignored. Please verify your config.json configuration file.
You have disabled the safety checker for <class '...'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Replace CrossAttention.forward to use xformers
caching latents.
100%|██████████| 25/25 [00:02<00:00, 9.31it/s]
import network module: networks.lora
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
use AdamW optimizer | {}
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 2500
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 2500
  num epochs / epoch数: 1
  batch size per device / バッチサイズ: 1
  total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 1
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 2500
steps:   0%|          | 0/2500 [00:00<?, ?it/s]
Traceback (most recent call last):
    train(args)
  File "/workspace/kohya_ss/train_network.py", line 373, in train
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 490, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 12, in decorate_autocast
    return func(*args, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 381, in forward
    sample, res_samples = downsample_block(
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 612, in forward
    hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/diffusers/models/attention.py", line 216, in forward
    hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/diffusers/models/attention.py", line 484, in forward
    hidden_states = self.attn1(norm_hidden_states) + hidden_states
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/kohya_ss/library/train_util.py", line 1784, in forward_xformers
    out = xformers.ops.memory_efficient_attention(
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/xformers/ops.py", line 865, in memory_efficient_attention
    return op.apply(query, key, value, attn_bias, p).reshape(output_shape)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/xformers/ops.py", line 319, in forward
    out, lse = cls.FORWARD_OPERATOR(
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/xformers/ops.py", line 46, in no_such_operator
    raise RuntimeError(
RuntimeError: No such operator xformers::efficient_attention_forward_cutlass - did you forget to build xformers with `python setup.py develop`?
steps:   0%|          | 0/2500 [00:00<?, ?it/s]
Traceback (most recent call last):
    sys.exit(main())
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/kohya_ss/venv/bin/python3', 'train_network.py', '--pretrained_model_name_or_path=XpucT/Deliberate', '--train_data_dir=Lora/input', '--resolution=512,512', '--output_dir=Lora/output', '--logging_dir=Lora/log', '--network_alpha=128', '--training_comment=trigger: sizovadina', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=sizovadina', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=2500', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
```
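Two things stand out in the log: the `GLIBC_2.32` warning against xformers' `_C.so`, and the missing `efficient_attention_forward_cutlass` operator. A minimal sketch of how to check both from the pod's shell (assuming the kohya_ss venv from the log is active; both commands are standard, nothing kohya-specific):

```
# Check the system glibc version; the prebuilt xformers binary above wants >= 2.32.
ldd --version | head -n 1

# Check that the installed xformers wheel at least imports and reports its version.
python -c "import xformers; print(xformers.__version__)"
```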

SOLVED: I removed xformers==0.0.14 and installed xformers==0.0.16; after that, training started successfully.
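For reference, a minimal sketch of that version swap inside the training venv (venv path taken from the log above; the exact pip invocation is my own, not from the repo's docs):

```
# Activate the training venv, then replace the broken 0.0.14 wheel with 0.0.16.
source /workspace/kohya_ss/venv/bin/activate
pip uninstall -y xformers
pip install xformers==0.0.16
```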

The second problem is: when I try to generate images, nothing changes. LoRA has no effect at all on generation.

Generation without LoRA

Generation with LoRA

I am using the RunPod service to train the model on Linux. Can you suggest what the problem might be?
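One way to narrow this down is to check whether the trained file actually contains non-zero LoRA weights: the `lora_up` matrices start out at zero, so if they are still all (near-)zero after training, the LoRA cannot change generation. A hypothetical check, assuming the output path from the training command above and the `safetensors` package from the venv:

```
python - <<'EOF'
from safetensors.torch import load_file

# Path assumed from --output_dir / --output_name in the training command above.
sd = load_file("Lora/output/sizovadina.safetensors")
print(f"{len(sd)} tensors in file")

# kohya key names contain "lora_up" / "lora_down"; if every lora_up weight
# still has ~0 norm, training effectively produced a no-op LoRA.
for name, tensor in sd.items():
    if "lora_up" in name:
        print(name, float(tensor.float().norm()))
EOF
```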

devNegative-asm commented 1 year ago

Install the sd-webui-additional-networks extension and place the LoRA in extensions/sd-webui-additional-networks/models/lora/, as sketched below.
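A sketch of that layout (the webui root and the LoRA file path are assumptions; the extension lives at https://github.com/kohya-ss/sd-webui-additional-networks):

```
# From the stable-diffusion-webui root (path assumed):
git clone https://github.com/kohya-ss/sd-webui-additional-networks extensions/sd-webui-additional-networks
mkdir -p extensions/sd-webui-additional-networks/models/lora
cp /workspace/kohya_ss/Lora/output/sizovadina.safetensors extensions/sd-webui-additional-networks/models/lora/
```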

tenghui98 commented 1 year ago

Me too; the two images (with/without LoRA) are not different.

trustmiao commented 1 year ago

My situation is similar. I upgraded to a 3090 recently and then installed a new NVIDIA driver, CUDA, and cuDNN. My kohya setup, which worked perfectly with my old 1080, now trains without errors but the result has no effect in SD. I tried reinstalling everything, upgrading, and downgrading, with no luck. SD itself should be fine, as it works well with a model I trained before. I am totally out of options; what should I try?

trist4n commented 1 year ago

might have got this fixed, see: https://github.com/bmaltais/kohya_ss/issues/318#issuecomment-1458016981

rexroth0619 commented 1 year ago

Can confirm and can replicate; I went through exactly the same process as OP. First xformers complains, then after swapping in 0.0.16 training runs but has no effect on the results (it doesn't learn the concept at all).

I am also running a 3090 instance on Linux on RunPod.

In my experience, xformers seems to be one of the most problem-ridden components for SD generation and fine-tuning...