bmaltais / kohya_ss

Apache License 2.0
9.54k stars, 1.23k forks

TypeError: Accelerator.__init__() got an unexpected keyword argument 'project_dir' #1198

Closed: magicwang1111 closed this issue 8 months ago

magicwang1111 commented 1 year ago

18:36:25-585000 INFO Start training LoRA Kohya DyLoRA ...
18:36:25-586000 INFO Valid image folder names found in: /home/wangwang/ai/stable-diffusion-webui/train/0618
18:36:25-593201 INFO Folder 5_aliproduct: 6905 images found
18:36:25-593957 INFO Folder 5_aliproduct: 34525 steps
18:36:25-594630 INFO Total steps: 34525
18:36:25-595247 INFO Train batch size: 32
18:36:25-595843 INFO Gradient accumulation steps: 1.0
18:36:25-596458 INFO Epoch: 10
18:36:25-597038 INFO Regulatization factor: 1
18:36:25-597649 INFO max_train_steps (34525 / 32 / 1.0 * 10 * 1) = 10790
18:36:25-598415 INFO stop_text_encoder_training = 0
18:36:25-599029 INFO lr_warmup_steps = 1079
18:36:25-599787 INFO Saving training config to /home/wangwang/ai/stable-diffusion-webui/stable-diffusion-webui/models/LyCORIS/aliproduct0714_20230714-183625.json...
18:36:25-600760 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket --pretrained_model_name_or_path="/home/wangwang/ai/stable-diffusion-webui/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned.ckpt" --train_data_dir="/home/wangwang/ai/stable-diffusion-webui/train/0618" --resolution="640,960" --output_dir="/home/wangwang/ai/stable-diffusion-webui/stable-diffusion-webui/models/LyCORIS" --network_alpha="256" --save_model_as=safetensors --network_module=networks.dylora --network_args conv_dim="32" conv_alpha="32" unit="8" rank_dropout="0.1" --text_encoder_lr=2e-06 --unet_lr=2e-05 --network_dim=256 --output_name="aliproduct0714" --lr_scheduler_num_cycles="10" --network_dropout="0.1" --learning_rate="2e-05" --lr_scheduler="constant_with_warmup" --lr_warmup_steps="1079" --train_batch_size="32" --max_train_steps="10790" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234" --cache_latents --cache_latents_to_disk --optimizer_type="Prodigy" --max_data_loader_n_workers="0" --max_token_length=225 --resume="/home/wangwang/test/kohya_ss/last_staus" --keep_tokens="1" --bucket_reso_steps=64 --save_state --shuffle_caption --gradient_checkpointing --full_fp16 --xformers --bucket_no_upscale --noise_offset=0.1 --wandb_api_key="22ddbffd5936bbb30f5c8404cf885890885514cf" --sample_sampler=euler_a --sample_prompts="/home/wangwang/ai/stable-diffusion-webui/stable-diffusion-webui/models/LyCORIS/sample/prompt.txt" --sample_every_n_epochs="1" --sample_every_n_steps="40"
2023-07-14 18:36:26.807469: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-07-14 18:36:26.845291: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-07-14 18:36:27.377273: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[18:36:28] WARNING The following values were not passed to accelerate launch and had defaults used instead:   launch.py:1088
             --num_processes was set to a value of 1
             --num_machines was set to a value of 1
             --mixed_precision was set to a value of 'no'
             --dynamo_backend was set to a value of 'no'
           To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
2023-07-14 18:36:30.041045: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
prepare tokenizer
update token length: 225
Using DreamBooth method.
prepare images.
found directory /home/wangwang/ai/stable-diffusion-webui/train/0618/5_aliproduct contains 6905 image files
No caption file found for 6905 images. Training will continue without captions for these images. If class token exists, it will be used.
/home/wangwang/ai/stable-diffusion-webui/train/0618/5_aliproduct/0.png
/home/wangwang/ai/stable-diffusion-webui/train/0618/5_aliproduct/1.png
/home/wangwang/ai/stable-diffusion-webui/train/0618/5_aliproduct/10.png
/home/wangwang/ai/stable-diffusion-webui/train/0618/5_aliproduct/100.png
/home/wangwang/ai/stable-diffusion-webui/train/0618/5_aliproduct/1000.png
/home/wangwang/ai/stable-diffusion-webui/train/0618/5_aliproduct/1001.png... and 6900 more
34525 train images with repeating.
0 reg images.
no regularization images
[Dataset 0]
  batch_size: 32
  resolution: (640, 960)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1024
  bucket_reso_steps: 64
  bucket_no_upscale: True

  [Subset 0 of Dataset 0]
    image_dir: "/home/wangwang/ai/stable-diffusion-webui/train/0618/5_aliproduct"
    image_count: 6905
    num_repeats: 5
    shuffle_caption: True
    keep_tokens: 1
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: aliproduct
    caption_extension: .caption
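The step counts in this log can be reproduced from the dataset settings (image_count 6905, num_repeats 5) and the formula the GUI printed for max_train_steps. A minimal sketch, assuming the GUI rounds the result up and that the 1079 warmup steps come from a 10% warmup setting; neither assumption is stated in the log itself.

```python
import math

# Values copied from the log above.
images = 6905          # image_count
repeats = 5            # num_repeats (the "5_" prefix of the folder name)
train_batch_size = 32
grad_accum_steps = 1.0
epochs = 10
reg_factor = 1         # no regularization images were found

total_steps = images * repeats  # "Total steps: 34525"

# Mirrors the logged formula (34525 / 32 / 1.0 * 10 * 1); rounding up is an assumption.
max_train_steps = math.ceil(
    total_steps / train_batch_size / grad_accum_steps * epochs * reg_factor
)

# Assumption: the GUI derives lr_warmup_steps from a 10% warmup percentage.
lr_warmup_percent = 10
lr_warmup_steps = math.ceil(max_train_steps * lr_warmup_percent / 100)

print(total_steps, max_train_steps, lr_warmup_steps)  # 34525 10790 1079
```

The exact quotient is 10789.0625, so rounding up gives the logged 10790, and 10% of that is exactly 1079.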

[Dataset 0]
loading image sizes.
100%|██████████████████████████████| 6905/6905 [00:00<00:00, 35407.84it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically
number of images (including repeats)
bucket 0: resolution (512, 512), count: 135
bucket 1: resolution (576, 896), count: 5
bucket 2: resolution (640, 960), count: 32180
bucket 3: resolution (768, 768), count: 2205
mean ar error (without repeats): 3.5003452816631544e-06
preparing accelerator
Traceback (most recent call last):
  /home/wangwang/test/kohya_ss/./train_network.py:974 in <module>
       971     args = train_util.read_config_from_file(args, parser)
       972
       973     trainer = NetworkTrainer()
     ❱ 974     trainer.train(args)
       975
  /home/wangwang/test/kohya_ss/./train_network.py:205 in train
       202
       203         # prepare the accelerator
       204         print("preparing accelerator")
     ❱ 205         accelerator = train_util.prepare_accelerator(args)
       206         is_main_process = accelerator.is_main_process
       207
       208         # prepare a dtype matching the mixed precision setting and cast to it as needed
  /home/wangwang/test/kohya_ss/library/train_util.py:3266 in prepare_accelerator
      3263             if args.wandb_api_key is not None:
      3264                 wandb.login(key=args.wandb_api_key)
      3265
    ❱ 3266     accelerator = Accelerator(
      3267         gradient_accumulation_steps=args.gradient_accumulation_steps,
      3268         mixed_precision=args.mixed_precision,
      3269         log_with=log_with,
TypeError: Accelerator.__init__() got an unexpected keyword argument 'project_dir'
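The TypeError means the Accelerator constructor in the installed accelerate package does not accept the project_dir keyword that train_util.prepare_accelerator passes, which usually points to a version mismatch. A quick check, as a sketch: print the installed version with importlib.metadata and inspect the constructor signature. The helper accepts_kwarg is hypothetical, and this thread does not say which accelerate release introduced project_dir, so compare the result against the version kohya_ss pins in its own requirements.

```python
import importlib.metadata
import inspect
from typing import Callable


def accepts_kwarg(func: Callable, name: str) -> bool:
    """Return True if calling func with keyword argument `name` would not raise TypeError."""
    for param in inspect.signature(func).parameters.values():
        # A **kwargs catch-all accepts any keyword name.
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True
        # Only these kinds can be passed by keyword.
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


if __name__ == "__main__":
    try:
        from accelerate import Accelerator
    except ImportError:
        print("accelerate is not installed in this environment")
    else:
        print("accelerate version:", importlib.metadata.version("accelerate"))
        print("Accelerator accepts project_dir:",
              accepts_kwarg(Accelerator.__init__, "project_dir"))
```

Run it with the venv's interpreter (for example /home/wangwang/test/kohya_ss/venv/bin/python) so it inspects the same accelerate install that train_network.py uses.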
Traceback (most recent call last):
  /home/wangwang/test/kohya_ss/venv/bin/accelerate:8 in <module>
       5 from accelerate.commands.accelerate_cli import main
       6 if __name__ == '__main__':
       7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
     ❱ 8     sys.exit(main())
       9
  /home/wangwang/test/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py:45 in main
       42         exit(1)
       43
       44     # Run
     ❱ 45     args.func(args)
       46
       47
       48 if __name__ == "__main__":
  /home/wangwang/test/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py:1104 in launch_command
      1101     elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA
      1102         sagemaker_launcher(defaults, args)
      1103     else:
    ❱ 1104         simple_launcher(args)
      1105
      1106
      1107 def main():
  /home/wangwang/test/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py:567 in simple_launcher
       564     process = subprocess.Popen(cmd, env=current_env)
       565     process.wait()
       566     if process.returncode != 0:
     ❱ 567         raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
       568
       569
       570 def multi_gpu_launcher(args):
CalledProcessError: Command '['/home/wangwang/test/kohya_ss/venv/bin/python', './train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=/home/wangwang/ai/stable-diffusion-webui/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned.ckpt', '--train_data_dir=/home/wangwang/ai/stable-diffusion-webui/train/0618', '--resolution=640,960', '--output_dir=/home/wangwang/ai/stable-diffusion-webui/stable-diffusion-webui/models/LyCORIS', '--network_alpha=256', '--save_model_as=safetensors', '--network_module=networks.dylora', '--network_args', 'conv_dim=32', 'conv_alpha=32', 'unit=8', 'rank_dropout=0.1', '--text_encoder_lr=2e-06', '--unet_lr=2e-05', '--network_dim=256', '--output_name=aliproduct0714', '--lr_scheduler_num_cycles=10', '--network_dropout=0.1', '--learning_rate=2e-05', '--lr_scheduler=constant_with_warmup', '--lr_warmup_steps=1079', '--train_batch_size=32', '--max_train_steps=10790', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=Prodigy', '--max_data_loader_n_workers=0', '--max_token_length=225', '--resume=/home/wangwang/test/kohya_ss/last_staus', '--keep_tokens=1', '--bucket_reso_steps=64', '--save_state', '--shuffle_caption', '--gradient_checkpointing', '--full_fp16', '--xformers', '--bucket_no_upscale', '--noise_offset=0.1', '--wandb_api_key=22ddbffd5936bbb30f5c8404cf885890885514cf', '--sample_sampler=euler_a', '--sample_prompts=/home/wangwang/ai/stable-diffusion-webui/stable-diffusion-webui/models/LyCORIS/sample/prompt.txt', '--sample_every_n_epochs=1', '--sample_every_n_steps=40']' returned non-zero exit status 1.
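The second traceback is not a separate bug. As its simple_launcher frame shows, accelerate launch starts train_network.py with subprocess.Popen, waits for it, and raises CalledProcessError whenever the child exits with a non-zero status, so the actionable error is the TypeError printed before it. A minimal sketch of that pattern; the function name launch is hypothetical and not part of accelerate's API.

```python
import subprocess
import sys


def launch(cmd: list[str]) -> None:
    """Run cmd as a child process and surface a non-zero exit as CalledProcessError,
    in the same way as the simple_launcher frame in the traceback."""
    process = subprocess.Popen(cmd)
    process.wait()
    if process.returncode != 0:
        raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)


if __name__ == "__main__":
    # The child dies with an uncaught TypeError: its traceback goes to stderr,
    # and the parent only sees exit status 1.
    child = [sys.executable, "-c",
             "raise TypeError(\"unexpected keyword argument 'project_dir'\")"]
    try:
        launch(child)
    except subprocess.CalledProcessError as err:
        print("child exited with status", err.returncode)
```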

magicwang1111 commented 1 year ago

System: Linux

Fri Jul 14 18:38:50 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:00:07.0 Off |                    0 |
| N/A   30C    P0    52W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

markojak commented 1 year ago

Same problem for me. It appears to be related to the deprecation of logging_dir (see https://github.com/huggingface/accelerate/issues/1619). Have you tried setting logging_dir instead?
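Following the logging_dir suggestion, one version-tolerant approach is to pass whichever directory keyword the installed Accelerator accepts. This is a hypothetical sketch, not kohya_ss code: the helper dir_kwargs and the two stand-in classes are illustrative only, and upgrading accelerate to the version kohya_ss pins is likely the simpler fix.

```python
import inspect
from typing import Any


def dir_kwargs(accelerator_cls: type, directory: str) -> dict[str, Any]:
    """Pick the directory keyword that accelerator_cls.__init__ accepts.

    Assumption based on this thread: newer accelerate releases take project_dir
    (with logging_dir deprecated), older ones only know logging_dir.
    Returns an empty dict if neither keyword exists.
    """
    params = inspect.signature(accelerator_cls.__init__).parameters
    if "project_dir" in params:
        return {"project_dir": directory}
    if "logging_dir" in params:
        return {"logging_dir": directory}
    return {}


# Hypothetical stand-ins for a newer and an older Accelerator signature.
class NewAccelerator:
    def __init__(self, project_dir=None, logging_dir=None):
        self.project_dir = project_dir


class OldAccelerator:
    def __init__(self, logging_dir=None):
        self.logging_dir = logging_dir


if __name__ == "__main__":
    print(dir_kwargs(NewAccelerator, "logs"))  # {'project_dir': 'logs'}
    print(dir_kwargs(OldAccelerator, "logs"))  # {'logging_dir': 'logs'}
```

In a patched prepare_accelerator, the Accelerator(...) call would spread **dir_kwargs(Accelerator, some_dir) in place of a hard-coded directory keyword; some_dir here is a placeholder, not a kohya_ss variable.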