Linaqruf / kohya-trainer

Adapted from https://note.com/kohya_ss/n/nbf7ce8d80f29 for easier cloning
Apache License 2.0

CalledProcessError: Command '['/usr/bin/python3', 'sdxl_train_network.py', '--sample_prompts=/content/LoRA/config/sample_prompt.toml', '--config_file=/content/LoRA/config/config_file.toml']' died with <Signals.SIGKILL: 9>. #265

Open loboere opened 1 year ago

loboere commented 1 year ago

Colab free tier, T4 GPU

I want to train a LoRA on SDXL 1.0, but training fails with an error.

My images are of different sizes, but I don't know whether that has anything to do with it.
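Mixed image sizes do not appear to be the cause here. The log below shows `enable_bucket: True`, so each image is assigned to a resolution bucket with a similar aspect ratio (for example `(1024, 576)` for landscape shots) instead of being forced to 1024×1024, and the crash happens later, while the text encoders are being built. The sketch below illustrates aspect-ratio bucketing under simplifying assumptions: 64-pixel steps, 256 to 1024 px per side, a 1024×1024 area budget, and nearest-aspect-ratio selection. `make_buckets` and `select_bucket` are hypothetical names, and this is not kohya's exact implementation.

```python
import math

def make_buckets(max_reso=(1024, 1024), min_size=256, max_size=1024, step=64):
    """Enumerate (width, height) buckets whose area fits within max_reso's area (assumed scheme)."""
    max_area = max_reso[0] * max_reso[1]
    buckets = set()
    side = int(math.sqrt(max_area) // step) * step  # largest square bucket
    buckets.add((side, side))
    width = min_size
    while width <= max_size:
        # Tallest multiple of `step` that keeps width * height within max_area, capped at max_size.
        height = min(max_size, (max_area // width) // step * step)
        if height >= min_size:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored portrait/landscape bucket
        width += step
    return sorted(buckets)

def select_bucket(img_w, img_h, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's aspect ratio."""
    ar = img_w / img_h
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ar))

buckets = make_buckets()
print(select_bucket(1920, 1080, buckets))  # -> (1024, 576)
print(select_bucket(3000, 4000, buckets))  # -> (768, 1024)
```

Under these assumptions every bucket has 1024 on one side, which matches the bucket list printed in the log.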


Loading settings from /content/LoRA/config/config_file.toml...
/content/LoRA/config/config_file
prepare tokenizers
update token length: 225
Training with captions.
loading existing metadata: /content/LoRA/meta_lat.json
metadata has bucket info, enable bucketing / メタデータにbucket情報があるためbucketを有効にします
using bucket info in metadata / メタデータ内のbucket情報を使います
[Dataset 0]
  batch_size: 4
  resolution: (1024, 1024)
  enable_bucket: True
  min_bucket_reso: None
  max_bucket_reso: None
  bucket_reso_steps: None
  bucket_no_upscale: None

  [Subset 0 of Dataset 0]
    image_dir: "/content/LoRA/train_data"
    image_count: 23
    num_repeats: 1
    shuffle_caption: False
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    metadata_file: /content/LoRA/meta_lat.json

[Dataset 0]
loading image sizes.
100% 23/23 [00:00<00:00, 393750.99it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (704, 1024), count: 3
bucket 1: resolution (768, 1024), count: 1
bucket 2: resolution (832, 1024), count: 2
bucket 3: resolution (896, 1024), count: 1
bucket 4: resolution (1024, 576), count: 9
bucket 5: resolution (1024, 704), count: 2
bucket 6: resolution (1024, 960), count: 2
bucket 7: resolution (1024, 1024), count: 3
mean ar error (without repeats): 0.0
noise_offset is set to 0.0357 / noise_offsetが0.0357に設定されました
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: /content/pretrained_model/sd_xl_base_1.0.safetensors
building U-Net
loading U-Net from checkpoint
U-Net:  <All keys matched successfully>
building text encoders
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /usr/local/bin/accelerate:8 in <module>                                      │
│                                                                              │
│   5 from accelerate.commands.accelerate_cli import main                      │
│   6 if __name__ == '__main__':                                               │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])     │
│ ❱ 8 │   sys.exit(main())                                                     │
│   9                                                                          │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.p │
│ y:45 in main                                                                 │
│                                                                              │
│   42 │   │   exit(1)                                                         │
│   43 │                                                                       │
│   44 │   # Run                                                               │
│ ❱ 45 │   args.func(args)                                                     │
│   46                                                                         │
│   47                                                                         │
│   48 if __name__ == "__main__":                                              │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:918 in │
│ launch_command                                                               │
│                                                                              │
│   915 │   elif defaults is not None and defaults.compute_environment == Comp │
│   916 │   │   sagemaker_launcher(defaults, args)                             │
│   917 │   else:                                                              │
│ ❱ 918 │   │   simple_launcher(args)                                          │
│   919                                                                        │
│   920                                                                        │
│   921 def main():                                                            │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:580 in │
│ simple_launcher                                                              │
│                                                                              │
│   577 │   process.wait()                                                     │
│   578 │   if process.returncode != 0:                                        │
│   579 │   │   if not args.quiet:                                             │
│ ❱ 580 │   │   │   raise subprocess.CalledProcessError(returncode=process.ret │
│   581 │   │   else:                                                          │
│   582 │   │   │   sys.exit(1)                                                │
│   583                                                                        │
╰──────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['/usr/bin/python3', 'sdxl_train_network.py', 
'--sample_prompts=/content/LoRA/config/sample_prompt.toml', 
'--config_file=/content/LoRA/config/config_file.toml']' died with 
<Signals.SIGKILL: 9>.
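The last line means the child process `sdxl_train_network.py` did not exit on its own: `accelerate` received a negative return code, which on POSIX means the process was terminated by a signal, here SIGKILL (9). SIGKILL cannot be caught, so the training script never gets to print its own traceback; the traceback above comes from the `accelerate` launcher (`simple_launcher`). On Linux this is commonly the kernel's out-of-memory killer, although the log alone does not prove it. The standard-library sketch below shows how such a return code maps to that message; `describe_returncode` and the `train.py` command are illustrative stand-ins.

```python
import signal
import subprocess

def describe_returncode(returncode):
    """Explain a subprocess return code; on POSIX, -N means 'terminated by signal N'."""
    if returncode < 0:
        sig = signal.Signals(-returncode)
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {returncode}"

# Reproduce the shape of the error in this issue with a stand-in command.
err = subprocess.CalledProcessError(returncode=-signal.SIGKILL, cmd=["python3", "train.py"])
print(err)  # Command '['python3', 'train.py']' died with <Signals.SIGKILL: 9>.
print(describe_returncode(err.returncode))  # killed by SIGKILL (signal 9)
```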
Linaqruf commented 1 year ago

[screenshot] This is not a VAE; it's a checkpoint.

Also, you need Colab Pro.
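The maintainer's first point appears to be that a full SDXL checkpoint was supplied where a VAE file was expected. One rough way to tell the two apart is by tensor-name prefixes. The sketch below assumes the usual original-format (non-Diffusers) Stable Diffusion layout, where a full checkpoint stores the U-Net under `model.diffusion_model.` and a standalone VAE stores bare `encoder.` / `decoder.` tensors; this is a heuristic, not an official format check. `classify_sd_file` and `keys_in_safetensors` are hypothetical helpers; `safe_open` comes from the `safetensors` package, and `framework="pt"` needs PyTorch, which Colab provides.

```python
def classify_sd_file(keys):
    """Guess whether a weights file is a full checkpoint or a standalone VAE (heuristic)."""
    keys = list(keys)
    if any(k.startswith("model.diffusion_model.") for k in keys):
        return "checkpoint"  # contains U-Net weights, so it is more than a VAE
    if any(k.startswith(("encoder.", "decoder.")) for k in keys):
        return "vae"  # bare autoencoder tensors only
    return "unknown"

def keys_in_safetensors(path):
    """List tensor names without loading the weights (requires the safetensors package)."""
    from safetensors import safe_open  # imported lazily so classify_sd_file works without it
    with safe_open(path, framework="pt") as f:
        return list(f.keys())
```

For example, `classify_sd_file(keys_in_safetensors("/content/pretrained_model/sd_xl_base_1.0.safetensors"))` could be run on the file from the log before pointing the VAE setting at it.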

loboere commented 1 year ago

:( Is there any hope of getting this to work on free Colab? Could someone make it work?
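Whether free Colab can work likely comes down to memory. Assuming the SIGKILL came from exhausting system RAM (the log does not prove this, but it stops at `building text encoders`, right after the SDXL U-Net finished loading), a first step is to check how much memory the runtime actually offers before launching training. The Linux-only sketch below reads `/proc/meminfo`; `parse_meminfo` and `memory_gib` are illustrative names.

```python
import os

def parse_meminfo(text):
    """Parse /proc/meminfo lines such as 'MemAvailable:  8388608 kB' into {field: value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])  # most fields are reported in kB (KiB)
    return info

def memory_gib(meminfo_path="/proc/meminfo"):
    """Return (MemTotal, MemAvailable) in GiB. Linux only."""
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    to_gib = 1024 ** 2  # KiB -> GiB
    return info["MemTotal"] / to_gib, info["MemAvailable"] / to_gib

if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    total, available = memory_gib()
    print(f"System RAM: {total:.1f} GiB total, {available:.1f} GiB available")
```

Running this in a notebook cell on a free runtime and on a higher-memory one is a simple way to see the difference behind the suggestion that Colab Pro is needed.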