Linaqruf / kohya-trainer

Adapted from https://note.com/kohya_ss/n/nbf7ce8d80f29 for easier cloning
Apache License 2.0

CalledProcessError in SDXL training step #276

Open alectprasad opened 1 year ago

alectprasad commented 1 year ago

```
Loading settings from /content/LoRA/config/config_file.toml...
/content/LoRA/config/config_file
prepare tokenizers
update token length: 225
Training with captions.
loading existing metadata: /content/LoRA/meta_lat.json
metadata has bucket info, enable bucketing / メタデータにbucket情報があるためbucketを有効にします
using bucket info in metadata / メタデータ内のbucket情報を使います
[Dataset 0]
  batch_size: 2
  resolution: (1024, 1024)
  enable_bucket: True
  min_bucket_reso: None
  max_bucket_reso: None
  bucket_reso_steps: None
  bucket_no_upscale: None

  [Subset 0 of Dataset 0]
    image_dir: "/content/LoRA/train_data"
    image_count: 30
    num_repeats: 1
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    metadata_file: /content/LoRA/meta_lat.json

[Dataset 0]
loading image sizes.
100% 30/30 [00:00<00:00, 511500.49it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (832, 1024), count: 18
bucket 1: resolution (1024, 1024), count: 12
mean ar error (without repeats): 0.0
noise_offset is set to 0.0357 / noise_offsetが0.0357に設定されました
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: /content/pretrained_model/sd_xl_base_1.0_0.9vae.safetensors
building U-Net
loading U-Net from checkpoint
U-Net:
building text encoders
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /usr/local/bin/accelerate:8 in
│
│    5 from accelerate.commands.accelerate_cli import main
│    6 if __name__ == '__main__':
│    7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
│ ❱  8 │   sys.exit(main())
│    9
│
│ /usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py:45 in main
│
│   42 │   │   exit(1)
│   43 │
│   44 │   # Run
│ ❱ 45 │   args.func(args)
│   46
│   47
│   48 if __name__ == "__main__":
│
│ /usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:918 in launch_command
│
│   915 │   elif defaults is not None and defaults.compute_environment == Comp
│   916 │   │   sagemaker_launcher(defaults, args)
│   917 │   else:
│ ❱ 918 │   │   simple_launcher(args)
│   919
│   920
│   921 def main():
│
│ /usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:580 in simple_launcher
│
│   577 │   process.wait()
│   578 │   if process.returncode != 0:
│   579 │   │   if not args.quiet:
│ ❱ 580 │   │   │   raise subprocess.CalledProcessError(returncode=process.ret
│   581 │   │   else:
│   582 │   │   │   sys.exit(1)
│   583
╰──────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['/usr/bin/python3', 'sdxl_train_network.py', '--sample_prompts=/content/LoRA/config/sample_prompt.toml', '--config_file=/content/LoRA/config/config_file.toml']' died with <Signals.SIGKILL: 9>.
```

I have Colab Pro, and the only thing I changed was switching the SDXL model from 0.9 to 1.0.
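A note on reading the last line of the traceback: `accelerate launch` runs `sdxl_train_network.py` as a child process and, as the `launch.py:580` frame shows, raises `subprocess.CalledProcessError` when the child's return code is nonzero. On POSIX a negative return code -N means the child was terminated by signal N, and Python formats that case as `died with <Signals.SIGKILL: 9>`. So the training script did not fail with a Python exception of its own; it was killed from outside. On Colab, one frequent source of SIGKILL is the Linux out-of-memory killer when system RAM runs out, but that is an assumption; the log itself does not say why the signal was sent. The sketch below shows how the return code maps to a signal. The helpers `describe_failure` and `run_child` are hypothetical names for illustration, not part of kohya-trainer or accelerate.

```python
import signal
import subprocess
import sys


def describe_failure(err: subprocess.CalledProcessError) -> str:
    """Explain a child-process failure from its return code (POSIX semantics)."""
    if err.returncode < 0:
        # A negative return code -N means the child was killed by signal N.
        sig = signal.Signals(-err.returncode)
        return f"killed by {sig.name} (signal {sig.value})"
    # A non-negative code is an ordinary exit status chosen by the child itself.
    return f"exited with status {err.returncode}"


def run_child(code: str) -> str:
    """Run a Python snippet in a child process and describe how it ended."""
    try:
        subprocess.run([sys.executable, "-c", code], check=True)
        return "succeeded"
    except subprocess.CalledProcessError as err:
        return describe_failure(err)


if __name__ == "__main__":
    # Build the same kind of error accelerate raised, with return code -9.
    err = subprocess.CalledProcessError(-9, ["/usr/bin/python3", "sdxl_train_network.py"])
    print(err)
    print(describe_failure(err))

    # A child that SIGKILLs itself mimics an external kill such as the OOM killer.
    print(run_child("import os, signal; os.kill(os.getpid(), signal.SIGKILL)"))
    # A child that exits normally with a nonzero status, for contrast.
    print(run_child("raise SystemExit(3)"))
```

If the message instead read `returned non-zero exit status 1`, the child would have exited on its own (for example after an uncaught Python exception), and its own traceback would appear earlier in the output.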

nickthelegend commented 1 year ago

I'm facing the same issue; I also changed from 0.9 to 1.0.

KarelAI commented 12 months ago

I have the same problem; I have tried several times already.
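In the original log, the last line printed before the traceback is `building text encoders`, so the kill arrives while the SDXL checkpoint is still being loaded. If the out-of-memory guess is right, checking free system RAM before launching training should show how close the runtime is to its limit. A minimal sketch, assuming a Linux runtime such as Colab where `/proc/meminfo` exists; `mem_available_gib` is a hypothetical helper, not part of kohya-trainer or accelerate.

```python
def mem_available_gib(meminfo_path: str = "/proc/meminfo") -> float:
    """Return the kernel's MemAvailable estimate from /proc/meminfo, in GiB."""
    with open(meminfo_path) as fh:
        for line in fh:
            # Lines look like: "MemAvailable:   12345678 kB" (the unit is KiB).
            if line.startswith("MemAvailable:"):
                kib = int(line.split()[1])
                return kib / (1024 * 1024)
    raise RuntimeError("MemAvailable not found; is this a Linux /proc/meminfo file?")


if __name__ == "__main__":
    # Run this in a notebook cell just before starting training.
    print(f"MemAvailable: {mem_available_gib():.1f} GiB")
```

`MemAvailable` is the kernel's estimate of memory that can be allocated without swapping, which is more useful here than `MemFree` because it counts reclaimable page cache.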