bmaltais / kohya_ss


subprocess.CalledProcessError: Command '['E:\\kohya_ss\\venv\\Scripts\\python.exe', 'E:/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', './test/output/config_lora-20240822-182849.toml']' returned non-zero exit status 3221225477. #2730

Open · CNEA-lw opened this issue 3 weeks ago

CNEA-lw commented 3 weeks ago

Complete error content: 1d6c362c-c0b0-4a53-98a0-841d9762393c (attachment). Related toml files: f1641c49-d4df-4fbb-805f-f51608b82af2 (attachment), Flux.1-dev-test-v3_20240822-182849.json. My personal skills are too low; I hope you can help me solve it, thank you.
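For reference, exit status 3221225477 is the Windows NTSTATUS code 0xC0000005 (STATUS_ACCESS_VIOLATION), i.e. the child python.exe process crashed at the native level rather than raising a Python exception. A one-liner to confirm the conversion:

```python
# 3221225477 decimal == 0xC0000005, the Windows STATUS_ACCESS_VIOLATION code,
# meaning the training subprocess died in native code rather than in Python.
print(hex(3221225477))  # -> 0xc0000005
```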

eftSharptooth commented 3 weeks ago

It was my mistake earlier, no worries. I missed it before, but the log says the accelerator device is cpu. Could you open a terminal or command prompt, run "nvidia-smi", and paste the results? It may be that accelerate is not configured to use your GPU.
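Besides nvidia-smi, a minimal sketch (run with the venv's own interpreter; the script name is hypothetical) to confirm that the PyTorch inside the kohya_ss venv can actually see the GPU:

```python
# Run with the venv interpreter, e.g.:
#   E:\kohya_ss\venv\Scripts\python.exe check_gpu.py   (check_gpu.py is a hypothetical name)
import torch

print("torch version :", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    print("PyTorch cannot see a GPU, so accelerate will fall back to the CPU.")
```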

kohya-ss commented 3 weeks ago

The log says accelerator device: cpu. The accelerate config setting may be incorrect.
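A quick way to see which device accelerate itself resolves to, using the same cached config the training script picks up (a minimal sketch, run inside the venv):

```python
# Instantiating Accelerator() loads the saved accelerate config; its .device
# should report cuda:0 (or similar), not cpu, for GPU training.
from accelerate import Accelerator

accelerator = Accelerator()
print("accelerator device:", accelerator.device)
```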

eftSharptooth commented 3 weeks ago

OK, so you will need to run accelerate config in your venv. It would also be good to get your GPU info (single or multiple) and the device IDs, in case those have to be set via an environment variable; you can get them with nvidia-smi in a command prompt window.
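For reference, the GPU IDs reported by nvidia-smi are what the CUDA_VISIBLE_DEVICES environment variable refers to; a small sketch of pinning a process to one GPU:

```python
# CUDA_VISIBLE_DEVICES must be set before CUDA is initialised; the IDs match
# the GPU numbers printed by nvidia-smi.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # e.g. restrict the process to the first GPU

import torch
print("GPUs visible to this process:", torch.cuda.device_count())
```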

CNEA-lw commented 3 weeks ago

OK, so you will need to run accelerate config in your venv. It would also be good to get your GPU info (single or multiple) and the device IDs, in case those have to be set via an environment variable; you can get them with nvidia-smi in a command prompt window.

Thank you for your help. After I reset the accelerator device settings on the setup page, I can indeed train now.

CNEA-lw commented 3 weeks ago

The log says accelerator device: cpu. The accelerate config setting may be incorrect.

Thank you for your help. After I reset the accelerator device settings on the setup page, I can indeed train now.

CNEA-lw commented 3 weeks ago

If you encounter related problems: after starting the setup, choose "5. (Optional) Manually configure Accelerate" and perform a reset. (screenshot attachment: e76497fb-7f6d-4149-b4ee-a446c5c70832)
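For anyone who prefers to reset outside the setup menu: the "reset" essentially discards the cached accelerate config so that the next accelerate config run starts clean. A hedged sketch, assuming accelerate's usual default config location:

```python
# Assumption: accelerate stores its default config under ~/.cache/huggingface/accelerate/.
# Removing it forces `accelerate config` (or the setup menu) to start from scratch.
from pathlib import Path

cfg = Path.home() / ".cache" / "huggingface" / "accelerate" / "default_config.yaml"
if cfg.exists():
    cfg.unlink()
    print(f"Removed cached accelerate config: {cfg}")
else:
    print("No cached accelerate config found; run `accelerate config` to create one.")
```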

eftSharptooth commented 3 weeks ago

I'm glad it's working! Happy training and good luck!

8bit-boom commented 3 weeks ago

I have a similar problem. I already tried "Manually configure Accelerate", but it didn't help. (screenshot attached)

Config file:

{
  "LoRA_type": "Flux1", "LyCORIS_preset": "full", "adaptive_noise_scale": 0, "additional_parameters": "", "ae": "D:/SD/stable-diffusion-webui/models/VAE/ae.safetensors", "apply_t5_attn_mask": false, "async_upload": false,
  "block_alphas": "", "block_dims": "", "block_lr_zero_threshold": "", "bucket_no_upscale": false, "bucket_reso_steps": 64, "bypass_mode": false, "cache_latents": true, "cache_latents_to_disk": true,
  "caption_dropout_every_n_epochs": 0, "caption_dropout_rate": 0, "caption_extension": ".txt", "clip_l": "D:/SD/stable-diffusion-webui/models/Text Encoder/clip_l.safetensors", "clip_skip": 1, "color_aug": false, "constrain": 0,
  "conv_alpha": 1, "conv_block_alphas": "", "conv_block_dims": "", "conv_dim": 1, "dataset_config": "", "debiased_estimation_loss": false, "decompose_both": false, "dim_from_weights": false, "discrete_flow_shift": 1, "dora_wd": false, "down_lr_weight": "",
  "dynamo_backend": "no", "dynamo_mode": "default", "dynamo_use_dynamic": false, "dynamo_use_fullgraph": false, "enable_bucket": true, "epoch": 1, "extra_accelerate_launch_args": "", "factor": -1, "flip_aug": false,
  "flux1_cache_text_encoder_outputs": true, "flux1_cache_text_encoder_outputs_to_disk": true, "flux1_checkbox": true, "fp8_base": true, "full_bf16": false, "full_fp16": false, "gpu_ids": "", "gradient_accumulation_steps": 1, "gradient_checkpointing": true, "guidance_scale": 1, "highvram": true,
  "huber_c": 0.1, "huber_schedule": "snr", "huggingface_path_in_repo": "", "huggingface_repo_id": "", "huggingface_repo_type": "", "huggingface_repo_visibility": "", "huggingface_token": "",
  "ip_noise_gamma": 0, "ip_noise_gamma_random_strength": false, "keep_tokens": 0, "learning_rate": 0.0002, "log_config": false, "log_tracker_config": "", "log_tracker_name": "", "log_with": "",
  "logging_dir": "D:\SD\stable-diffusion-webui\Traning\drow\opt3\opt2\Chadra Ironwall\log",
  "loraplus_lr_ratio": 0, "loraplus_text_encoder_lr_ratio": 0, "loraplus_unet_lr_ratio": 0, "loss_type": "l2", "lowvram": false,
  "lr_scheduler": "linear", "lr_scheduler_args": "", "lr_scheduler_num_cycles": 1, "lr_scheduler_power": 1, "lr_scheduler_type": "", "lr_warmup": 0, "main_process_port": 0, "masked_loss": false,
  "max_bucket_reso": 2048, "max_data_loader_n_workers": 0, "max_grad_norm": 1, "max_resolution": "512,512", "max_timestep": 1000, "max_token_length": 75, "max_train_epochs": 0, "max_train_steps": 1000, "mem_eff_attn": false, "mem_eff_save": false,
  "metadata_author": "", "metadata_description": "", "metadata_license": "", "metadata_tags": "", "metadata_title": "", "mid_lr_weight": "",
  "min_bucket_reso": 256, "min_snr_gamma": 10, "min_timestep": 0, "mixed_precision": "bf16", "model_list": "custom", "model_prediction_type": "raw", "module_dropout": 0, "multi_gpu": false, "multires_noise_discount": 0.2, "multires_noise_iterations": 8,
  "network_alpha": 16, "network_dim": 16, "network_dropout": 0, "network_weights": "", "noise_offset": 0, "noise_offset_random_strength": false, "noise_offset_type": "Original",
  "num_cpu_threads_per_process": 2, "num_machines": 1, "num_processes": 1, "optimizer": "AdamW8bit", "optimizer_args": "",
  "output_dir": "D:\SD\stable-diffusion-webui\Traning\drow\opt3\opt2\Chadra Ironwall\model", "output_name": "Flux.sidney-sweeney-v1", "persistent_data_loader_workers": false,
  "pretrained_model_name_or_path": "D:/SD/stable-diffusion-webui/models/Stable-diffusion/flux_dev.safetensors", "prior_loss_weight": 1, "random_crop": false, "rank_dropout": 0, "rank_dropout_scale": false, "reg_data_dir": "", "rescaled": false,
  "resume": "", "resume_from_huggingface": "", "sample_every_n_epochs": 0, "sample_every_n_steps": 250, "sample_prompts": "a painting of a man wearing a funny hat, by darius kawasaki --w 832 --h 1260 --s 20 --l 3", "sample_sampler": "euler",
  "save_as_bool": false, "save_every_n_epochs": 1, "save_every_n_steps": 0, "save_last_n_steps": 0, "save_last_n_steps_state": 0, "save_model_as": "safetensors", "save_precision": "bf16", "save_state": false, "save_state_on_train_end": false, "save_state_to_huggingface": false,
  "scale_v_pred_loss_like_noise_pred": false, "scale_weight_norms": 0, "sdxl": false, "sdxl_cache_text_encoder_outputs": true, "sdxl_no_half_vae": true, "seed": 42, "shuffle_caption": false, "split_mode": false, "split_qkv": false, "stop_text_encoder_training": 0,
  "t5xxl": "D:/SD/stable-diffusion-webui/models/Text Encoder/t5xxl_fp16.safetensors", "t5xxl_max_token_length": 512, "text_encoder_lr": 0, "timestep_sampling": "sigmoid",
  "train_batch_size": 1, "train_blocks": "all", "train_data_dir": "D:/SD/stable-diffusion-webui/Traning/drow/opt3/opt2/Chadra Ironwall/image114", "train_norm": false, "train_on_input": true, "training_comment": "",
  "unet_lr": 0.0001, "unit": 1, "up_lr_weight": "", "use_cp": false, "use_scalar": false, "use_tucker": false, "v2": false, "v_parameterization": false, "v_pred_like_loss": 0,
  "vae": "", "vae_batch_size": 0, "wandb_api_key": "", "wandb_run_name": "", "weighted_captions": false, "xformers": "sdpa"
}

kohya-ss commented 3 weeks ago

"pretrained_model_name_or_path": "D:/SD/stable-diffusion-webui/models/Stable-diffusion/flux_dev.safetensors",

Are you using the fp16 checkpoint? The fp8 checkpoint gives an error.
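One way to check which variant a local flux checkpoint is, without loading any weights, is to read the safetensors header and look at the stored dtypes (a hedged sketch; the path is taken from the config above, and fp8 exports show up as F8_* dtypes):

```python
# Peek at the safetensors header: the first 8 bytes give the JSON header length,
# and each tensor entry records its dtype ("F16"/"BF16" vs "F8_E4M3"/"F8_E5M2").
import json
import struct

path = "D:/SD/stable-diffusion-webui/models/Stable-diffusion/flux_dev.safetensors"
with open(path, "rb") as fh:
    header_len = struct.unpack("<Q", fh.read(8))[0]
    header = json.loads(fh.read(header_len))

dtypes = {meta["dtype"] for name, meta in header.items() if name != "__metadata__"}
print("tensor dtypes in checkpoint:", dtypes)  # any F8_* dtype indicates an fp8 export
```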

8bit-boom commented 3 weeks ago

"pretrained_model_name_or_path": "D:/SD/stable-diffusion-webui/models/Stable-diffusion/flux_dev.safetensors",

Are you using the fp16 checkpoint? The fp8 checkpoint gives an error.

I am using fp8, downloaded from Civitai; they had fp32 and fp8. I will try with fp16.

8bit-boom commented 3 weeks ago

It's working now, thank you so much!

AlexiosDyral commented 3 days ago

I have the same problem, but I have already set up my GPU and I am using the fp16 checkpoint. What should I do?

AlexiosDyral commented 2 days ago
(screenshot attached: 微信图片_20240912213731)