thibaudart closed this 1 year ago
The call is:
accelerate "launch" "--mixed_precision=fp16" "scripts/trainer.py" "--attention=xformers" "--model_variant=base" "--normalize_masked_area_loss" "--unmasked_probability=0.0" "--max_denoising_strength=1.0" "--disable_cudnn_benchmark" "--sample_step_interval=500" "--pretrained_model_name_or_path=D:/sd/models/SD-2-1-512" "--pretrained_vae_name_or_path=" "--output_dir=models/nestyle2" "--seed=3434554" "--resolution=512" "--train_batch_size=20" "--num_train_epochs=1000" "--mixed_precision=fp16" "--use_bucketing" "--aspect_mode=dynamic" "--aspect_mode_action_preference=add" "--use_8bit_adam" "--gradient_checkpointing" "--gradient_accumulation_steps=1" "--learning_rate=3e-6" "--lr_warmup_steps=0" "--lr_scheduler=cosine" "--train_text_encoder" "--concepts_list=stabletune_concept_list.json" "--num_class_images=200" "--save_every_n_epoch=50" "--n_save_sample=1" "--sample_height=512" "--sample_width=512" "--dataset_repeats=1" "--sample_on_training_start"
Number of buckets: 1
Bucket (512, 512) found 20, nice!
Number of image-caption pairs: 20
OK, the solution is to regenerate the latent cache each time I launch a training run.
Hi,
About 80% of the time I get this error, and I don't know why:
Traceback (most recent call last):
  File "D:\stabletuner\scripts\trainer.py", line 1577, in <module>
    main()
  File "D:\stabletuner\scripts\trainer.py", line 897, in main
    args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)
ZeroDivisionError: division by zero
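For anyone hitting the same traceback: the failing line divides by `num_update_steps_per_epoch`, so that value must have been 0. Assuming trainer.py derives it the way the diffusers example scripts do, as `ceil(number_of_batches / gradient_accumulation_steps)`, it can only be 0 when the dataloader yields no batches. With 20 image-caption pairs and `--train_batch_size=20` one batch per epoch would be expected, so an empty dataloader would be consistent with a stale latent cache. Below is a minimal sketch of a guard that fails with a clearer message. The function name `epochs_from_steps` and its parameters are hypothetical and not part of StableTuner.

```python
import math


def epochs_from_steps(max_train_steps: int,
                      num_batches: int,
                      gradient_accumulation_steps: int = 1) -> int:
    """Hypothetical helper: convert a step budget into a number of epochs.

    Assumption: steps per epoch follow the diffusers-style formula
    ceil(num_batches / gradient_accumulation_steps). The real trainer.py
    may compute this differently.
    """
    if gradient_accumulation_steps < 1:
        raise ValueError("gradient_accumulation_steps must be >= 1")

    num_update_steps_per_epoch = math.ceil(num_batches / gradient_accumulation_steps)

    # Fail early with an actionable message instead of a ZeroDivisionError.
    if num_update_steps_per_epoch == 0:
        raise ValueError(
            "The dataloader produced 0 batches; check the dataset path "
            "or regenerate the latent cache before training."
        )

    return math.ceil(max_train_steps / num_update_steps_per_epoch)
```

With one batch per epoch, `epochs_from_steps(1000, 1)` returns 1000, while `epochs_from_steps(1000, 0)` raises a `ValueError` that names the likely cause.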