Closed: mykeehu closed this issue 1 year ago
Same exact issue; it started after the latest update, I think.
Replace CrossAttention.forward to use xformers
caching latents.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 498/498 [00:29<00:00, 17.12it/s]
import network module: networks.lora
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
Traceback (most recent call last):
File "S:\kohya\kohya_ss\train_network.py", line 507, in <module>
train(args)
File "S:\kohya\kohya_ss\train_network.py", line 150, in train
optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
File "S:\kohya\kohya_ss\library\train_util.py", line 1536, in get_optimizer
assert optimizer_type is None or optimizer_type == "", "both option use_8bit_adam and optimizer_type are specified / use_8bit_adamとoptimizer_typeの両方のオプションが 指定されています"
AssertionError: both option use_8bit_adam and optimizer_type are specified / use_8bit_adamとoptimizer_typeの両方のオプションが指定されています
Traceback (most recent call last):
File "C:\Users\Yvggeniy\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Yvggeniy\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "S:\kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "S:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "S:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "S:\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['S:\\kohya\\kohya_ss\\venv\\Scripts\\python.exe', 'train_network.py', '--pretrained_model_name_or_path=S:/stable-diffusion/stable-diffusion-webui/models/Stable-diffusion/My_mixes/My_anyhentai_abyss_reva1_grape32_pov.safetensors', '--train_data_dir=V:/ImagesForSDTrining/Lora training/all the way through/image', '--resolution=512,512', '--output_dir=V:/ImagesForSDTrining/Lora training/all the way through/model', '--logging_dir=V:/ImagesForSDTrining/Lora training/all the way through/log', '--network_alpha=128', '--training_comment=Trained on My_anyhentai_abyss_reva1_grape32_pov', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=all the way through_v3', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=24900', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--use_8bit_adam', '--bucket_no_upscale']' returned non-zero exit status 1.
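(Aside: a minimal, runnable sketch of why the launch above fails. This is not the actual kohya_ss code, just an illustration of the assertion quoted in the traceback: the command passes both --use_8bit_adam and --optimizer_type=AdamW, and the trainer accepts only one of the two.)

```python
# Simplified illustration (not the real get_optimizer): the legacy
# use_8bit_adam flag is only accepted when no explicit optimizer_type is given.
def resolve_optimizer_type(use_8bit_adam: bool, optimizer_type: str | None) -> str:
    if use_8bit_adam:
        assert optimizer_type is None or optimizer_type == "", (
            "both option use_8bit_adam and optimizer_type are specified"
        )
        return "AdamW8bit"
    return optimizer_type or "AdamW"

print(resolve_optimizer_type(False, "AdamW"))  # OK: 'AdamW'
print(resolve_optimizer_type(True, ""))        # OK: 'AdamW8bit'
print(resolve_optimizer_type(True, "AdamW"))   # raises AssertionError, as in the log above
```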
Exact same error here. First time using LoRA and I'm getting this error. Is there a way I can install an older version?
Try unchecking the "Use 8bit adam" button in Advanced Configuration.
It's working now! Thank you very much, Guyray! How could I have been so blind not to see that option? Thanks, man.
Yeah... it is a config change issue introduced by the latest kohya_ss trainer update... I am not sure how best to fix this... I thought of removing the old 8bit checkbox, but that would break the older config files... Perhaps I could implement some logic to disable the 8bit checkbox when a user selects anything but the AdamW8bit option...
AFTER unchecking the "Use 8bit adam" button in Advanced Configuration, I found it still doesn't work. I want to know how to fix it.
loading text encoder: <All keys matched successfully>
Replace CrossAttention.forward to use xformers
caching latents.
100%|████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:03<00:00, 2.29it/s]
import network module: networks.lora
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
use AdamW optimizer | {}
Traceback (most recent call last):
File "D:\AI Drawing\Lora\kohya_ss\train_network.py", line 507, in <module>
train(args)
File "D:\AI Drawing\Lora\kohya_ss\train_network.py", line 176, in train
unet, text_encoder, network, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 876, in prepare
result = tuple(
File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 877, in <genexpr>
self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 741, in _prepare_one
return self.prepare_model(obj, device_placement=device_placement)
File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 912, in prepare_model
model = model.to(self.device)
File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\transformers\modeling_utils.py", line 1749, in to
return super().to(*args, **kwargs)
File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 927, in to
return self._apply(convert)
File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
module._apply(fn)
File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
module._apply(fn)
File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
module._apply(fn)
[Previous line repeated 3 more times]
File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 602, in _apply
param_applied = fn(param)
File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 925, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 3.42 GiB already allocated; 0 bytes free; 3.48 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
File "C:\Users\25424\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\25424\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\AI Drawing\Lora\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\AI Drawing\Lora\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\AI Drawing\\Lora\\kohya_ss\\venv\\Scripts\\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=D:/AI Drawing/stable-diffusion-webui/models/Stable-diffusion/chilloutmix_NiPrunedFp16Fix.safetensors', '--train_data_dir=D:/AI Drawing/Lora/Lora_database/shiya/image', '--resolution=512,512', '--output_dir=D:/AI Drawing/Lora/Lora_database/shiya/model', '--logging_dir=', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=8', '--output_name=last', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=90', '--train_batch_size=1', '--max_train_steps=900', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--cache_latents', '--optimizer_type=AdamW', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
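(Aside: the OOM message above suggests trying max_split_size_mb. Below is a hedged sketch of how that allocator option could be passed to the trainer through the environment so the child process picks it up before CUDA is initialized. The launch command and its arguments are placeholders, and this only mitigates fragmentation; it cannot free VRAM the model itself needs.)

```python
import os
import subprocess

# Set the CUDA caching-allocator option mentioned in the error, then launch
# the trainer so it inherits the environment. Command/args are placeholders.
env = dict(os.environ, PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:128")
subprocess.run(["accelerate", "launch", "train_network.py"], env=env, check=True)
```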
Yes, I have the same error. I've tried with a low configuration (I have a 3080 10GB), but with 'Use 8bit adam' enabled I get the first error, and if I don't have it enabled I get this error: 'CUDA out of memory. Tried to allocate...'. It's strange because last night it was working and this morning it's not.
The issue is you need to use 8bit adam given the amount of VRAM you have. Make sure to only use AdamW8bit as the optimizer.
Thanks, it's working now. Just disabling 'Use 8bit adam' and selecting AdamW8bit in the 'Optimizer' now works. And in case it helps someone clueless like me: I was doing it in the 'Dreambooth' tab instead of 'Dreambooth LoRA'.
Wow, this breaks all trainings and yet the dev didn't notice? Thanks for the fix.
You are my hero. Thank you so much! This is for you: 🏆
You're welcome, brother. I'm glad I could help
Since the GUI (from what I understand) always passes an optimizer, wouldn't it be a better idea to remove that checkbox completely? And "migrate" existing configs to this new state? E.g. something like:
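(A minimal sketch of what that migration could look like; the keys use_8bit_adam and optimizer below are assumed names, not necessarily the GUI's actual config fields.)

```python
def migrate_config(config: dict) -> dict:
    """Map the legacy 8bit checkbox onto the newer optimizer field."""
    migrated = dict(config)
    if migrated.pop("use_8bit_adam", False):
        # Only fill in the optimizer if the old config didn't already pick one.
        migrated.setdefault("optimizer", "AdamW8bit")
    return migrated

old = {"use_8bit_adam": True, "learning_rate": 1e-4}
print(migrate_config(old))  # {'learning_rate': 0.0001, 'optimizer': 'AdamW8bit'}
```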
And then in any case, remove the checkbox from the GUI and config. Would that work?
Could anyone run this with 8 GB of VRAM after the update? I selected the AdamW8bit optimizer, but now it gives the out-of-memory error:
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 7.20 GiB already allocated; 0 bytes free; 7.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Are you by chance using the "Dreambooth" tab instead of the "Dreambooth LoRA" tab? That seems to constantly happen to a lot of people (including myself 😄).
Thanks, @mxharms, you were right. By the way, there is another error; do you have any idea?
Traceback (most recent call last):
File "J:\sd\kohya_ss\train_network.py", line 507, in <module>
train(args)
File "J:\sd\kohya_ss\train_network.py", line 135, in train
network.load_weights(args.network_weights)
File "J:\sd\kohya_ss\networks\lora.py", line 139, in load_weights
self.weights_sd = torch.load(file, map_location='cpu')
File "J:\sd\kohya_ss\venv\lib\site-packages\torch\serialization.py", line 713, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "J:\sd\kohya_ss\venv\lib\site-packages\torch\serialization.py", line 920, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '{'.
Traceback (most recent call last):
File "C:\Users\Luis\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Luis\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "J:\sd\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "J:\sd\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "J:\sd\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "J:\sd\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
Sorry, don't really know much about Python or the internals of this. ☹️
No problems :)
This thread saved my sanity; the 8-bit switch was throwing me off. Thank you for this post! :)
Same here. Thanks!!!
I still cannot train with 8bit Adam... I need to manually edit it out and run it in the terminal!
So, since the last update, what is the optimal config to train people with LoRA? Which optimizer, and with or without 8bit Adam? Could anyone help me? That would be great; I am so confused at the moment.
If your hardware permits it, train without 8bit, so AdamW. Otherwise use AdamW8bit.
Version 21.0.1 should now address this.
I am trying with 50 photos, batch size 2, 6 epochs, fp16 (both), LR 0.0001 / constant / LR warmup 0 / optimizer AdamW, text encoder LR 5e-5 and U-Net LR 0.0001, with nothing more in Advanced except xformers.
Since your last commit the process is working well, but not the results as far as I can tell. 50 photos (100_namefolder), so 50 × 100 = 5000, and (5000 / 2) × 6 gives me 15,000 steps across 4 safetensors, and none of them did a good inference of the model. Any idea why? PS: and now my results are 9,326 KB instead of the usual 147,572 KB.
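(The step arithmetic from that comment, spelled out; the numbers are the ones given above.)

```python
# 50 photos in a 100_<name> folder, batch size 2, 6 epochs (values from the comment above)
images, repeats, batch_size, epochs = 50, 100, 2, 6
steps_per_epoch = images * repeats // batch_size  # 5000 / 2 = 2500
total_steps = steps_per_epoch * epochs            # 2500 * 6 = 15000
print(total_steps)  # 15000
```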
The network values are low; set both network_dim and network_alpha to 128.
Since the last update it works like a charm! I'm so happy, and thank you for the answer! ^^
With the last update, 8bit_adam cannot be used: AssertionError: both option use_8bit_adam and optimizer_type are specified / use_8bit_adamとoptimizer_typeの両方のオプションが指定されています
Yeah I'm getting the same error even after updating.
It's still not working even after disabling adam8bit.
Folder 100_test: 1100 steps
max_train_steps = 1100
stop_text_encoder_training = 0
lr_warmup_steps = 110
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --enable_bucket --pretrained_model_name_or_path="D:/stable-diffusion/stable-diffusion-webui/models/Stable-diffusion/perfectWorld_perfectWorldBakedVAE.safetensors" --train_data_dir="D:/stable-diffusion/output/image" --resolution=512,512 --output_dir="D:/stable-diffusion/output/model" --logging_dir="D:/stable-diffusion/output/log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=0.0001 --network_dim=8 --output_name="last" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="cosine" --lr_warmup_steps="110" --train_batch_size="1" --max_train_steps="1100" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --cache_latents --optimizer_type="DAdaptation" --bucket_reso_steps=64 --bucket_no_upscale
prepare tokenizer
Use DreamBooth method.
Traceback (most recent call last):
File "D:\stable-diffusion\kohya\kohya_ss\train_network.py", line 507, in <module>
train(args)
File "D:\stable-diffusion\kohya\kohya_ss\train_network.py", line 61, in train
train_dataset = DreamBoothDataset(args.train_batch_size, args.train_data_dir, args.reg_data_dir,
TypeError: DreamBoothDataset.__init__() takes 13 positional arguments but 21 were given
Traceback (most recent call last):
File "D:\python\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\python\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\stable-diffusion\kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\stable-diffusion\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\stable-diffusion\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\stable-diffusion\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\stable-diffusion\\kohya\\kohya_ss\\venv\\Scripts\\python.exe', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=D:/stable-diffusion/stable-diffusion-webui/models/Stable-diffusion/perfectWorld_perfectWorldBakedVAE.safetensors', '--train_data_dir=D:/stable-diffusion/output/image', '--resolution=512,512', '--output_dir=D:/stable-diffusion/output/model', '--logging_dir=D:/stable-diffusion/output/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=8', '--output_name=last', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=110', '--train_batch_size=1', '--max_train_steps=1100', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--optimizer_type=DAdaptation', '--bucket_reso_steps=64', '--bucket_no_upscale']' returned non-zero exit status 1.
I have gotten it to work after making sure the lower checkbox "Use 8bit adam" in advanced configuration is not selected even though "AdamW8bit" is selected under the "Optimizer" selection box.
I give up. I have been trying to fix this issue for 3 weeks now and it still doesn't work.
I'd recommend just using the Lion optimizer instead; it works about the same but doesn't suffer from the same overtraining issues. Just make sure to divide your learning rate by 10.
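(A worked example of "divide your learning rate by 10", using the learning rate that appears in the commands earlier in this thread.)

```python
adamw_lr = 1e-4          # the --learning_rate used with AdamW elsewhere in this thread
lion_lr = adamw_lr / 10  # 1e-5 for Lion, per the advice above
print(lion_lr)           # 1e-05
```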
I cannot find the button to uncheck 8bit Adam. Did they remove it?
Hey, what do you mean by dividing the learning rate by 10? So if I have 25 images, what value do I put in, and where?
I started the latest version with the usual parameters, but now I get an error. It seems there is something wrong with the optimizer? I tried with AdamW and AdamW8bit, without success.
If I turn off the "Use 8bit adam" option and select AdamW, it starts. So either this option should be disabled, or, in the case of AdamW, it should be ignored when building the command.
What is the difference between AdamW and AdamW8bit? If I choose the former, will it cause burn-in (as if I had overtrained it)?