LarryJane491 / Lora-Training-in-Comfy

This custom node lets you train LoRA directly in ComfyUI!

Lora training fails - "returned non-zero exit status 2", seems to not recognise the checkpoint #55

Open ArmouryGaming opened 3 months ago

ArmouryGaming commented 3 months ago

Hi,

I have used the captioning nodes and they worked fine, but when I try to run the LoRA training node I get the issue below. It doesn't seem to recognise the checkpoint. From what I can see, it goes wrong here:

```
train_network.py: error: unrecognized arguments: UI\models\checkpoints\perfectWorld_v6Baked.safetensors
```
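For what it's worth, the split seems to come from the space in `D:\Python\ComfyUI\Comfy UI`: when the command line is parsed shell-style, everything after the space becomes a separate argument, which is exactly what the error shows. A minimal sketch (using `shlex` as a stand-in for whatever parses the flat command string):

```python
# Sketch: shell-style splitting of an unquoted Windows path containing a space.
# The path mirrors the one in the log; shlex is only an illustration of the
# splitting behaviour, not the node's actual code.
import shlex

arg = r"--pretrained_model_name_or_path=D:\Python\ComfyUI\Comfy UI\models\checkpoints\perfectWorld_v6Baked.safetensors"

# posix=False keeps backslashes literal, as on Windows.
parts = shlex.split(arg, posix=False)
print(parts[0])  # --pretrained_model_name_or_path=D:\Python\ComfyUI\Comfy
print(parts[1])  # UI\models\checkpoints\perfectWorld_v6Baked.safetensors
```

The second fragment is precisely the "unrecognized arguments" string argparse complains about.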

I have tried it with a few different checkpoints; same issue each time.

The full console log is below, as is a screenshot of the settings.

[Screenshot 2024-06-23 233128 — node settings]


```
got prompt
D:\Python\ComfyUI\Comfy UI\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py
The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 1
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
usage: train_network.py [-h] [--v2] [--v_parameterization] [--pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH] [--tokenizer_cache_dir TOKENIZER_CACHE_DIR] [--train_data_dir TRAIN_DATA_DIR] [--shuffle_caption] [--caption_separator CAPTION_SEPARATOR] [--caption_extension CAPTION_EXTENSION] [--caption_extention CAPTION_EXTENTION]
    [--keep_tokens KEEP_TOKENS] [--caption_prefix CAPTION_PREFIX] [--caption_suffix CAPTION_SUFFIX] [--color_aug] [--flip_aug] [--face_crop_aug_range FACE_CROP_AUG_RANGE] [--random_crop] [--debug_dataset] [--resolution RESOLUTION] [--cache_latents] [--vae_batch_size VAE_BATCH_SIZE] [--cache_latents_to_disk] [--enable_bucket]
    [--min_bucket_reso MIN_BUCKET_RESO] [--max_bucket_reso MAX_BUCKET_RESO] [--bucket_reso_steps BUCKET_RESO_STEPS] [--bucket_no_upscale] [--token_warmup_min TOKEN_WARMUP_MIN] [--token_warmup_step TOKEN_WARMUP_STEP] [--dataset_class DATASET_CLASS] [--caption_dropout_rate CAPTION_DROPOUT_RATE] [--caption_dropout_every_n_epochs CAPTION_DROPOUT_EVERY_N_EPOCHS] [--caption_tag_dropout_rate CAPTION_TAG_DROPOUT_RATE]
    [--reg_data_dir REG_DATA_DIR] [--in_json IN_JSON] [--dataset_repeats DATASET_REPEATS] [--output_dir OUTPUT_DIR] [--output_name OUTPUT_NAME] [--huggingface_repo_id HUGGINGFACE_REPO_ID] [--huggingface_repo_type HUGGINGFACE_REPO_TYPE] [--huggingface_path_in_repo HUGGINGFACE_PATH_IN_REPO] [--huggingface_token HUGGINGFACE_TOKEN] [--huggingface_repo_visibility HUGGINGFACE_REPO_VISIBILITY] [--save_state_to_huggingface] [--resume_from_huggingface] [--async_upload]
    [--save_precision {None,float,fp16,bf16}] [--save_every_n_epochs SAVE_EVERY_N_EPOCHS] [--save_every_n_steps SAVE_EVERY_N_STEPS] [--save_n_epoch_ratio SAVE_N_EPOCH_RATIO] [--save_last_n_epochs SAVE_LAST_N_EPOCHS] [--save_last_n_epochs_state SAVE_LAST_N_EPOCHS_STATE] [--save_last_n_steps SAVE_LAST_N_STEPS] [--save_last_n_steps_state SAVE_LAST_N_STEPS_STATE] [--save_state] [--resume RESUME]
    [--train_batch_size TRAIN_BATCH_SIZE] [--max_token_length {None,150,225}] [--mem_eff_attn] [--xformers] [--sdpa] [--vae VAE] [--max_train_steps MAX_TRAIN_STEPS] [--max_train_epochs MAX_TRAIN_EPOCHS] [--max_data_loader_n_workers MAX_DATA_LOADER_N_WORKERS] [--persistent_data_loader_workers] [--seed SEED] [--gradient_checkpointing] [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS] [--mixed_precision {no,fp16,bf16}] [--full_fp16] [--full_bf16] [--ddp_timeout DDP_TIMEOUT] [--clip_skip CLIP_SKIP]
    [--logging_dir LOGGING_DIR] [--log_with {tensorboard,wandb,all}] [--log_prefix LOG_PREFIX] [--log_tracker_name LOG_TRACKER_NAME] [--log_tracker_config LOG_TRACKER_CONFIG] [--wandb_api_key WANDB_API_KEY] [--noise_offset NOISE_OFFSET] [--multires_noise_iterations MULTIRES_NOISE_ITERATIONS] [--ip_noise_gamma IP_NOISE_GAMMA] [--multires_noise_discount MULTIRES_NOISE_DISCOUNT] [--adaptive_noise_scale ADAPTIVE_NOISE_SCALE] [--zero_terminal_snr] [--min_timestep MIN_TIMESTEP] [--max_timestep MAX_TIMESTEP] [--lowram]
    [--sample_every_n_steps SAMPLE_EVERY_N_STEPS] [--sample_every_n_epochs SAMPLE_EVERY_N_EPOCHS] [--sample_prompts SAMPLE_PROMPTS] [--sample_sampler {ddim,pndm,lms,euler,euler_a,heun,dpm_2,dpm_2_a,dpmsolver,dpmsolver++,dpmsingle,k_lms,k_euler,k_euler_a,k_dpm_2,k_dpm_2_a}] [--config_file CONFIG_FILE] [--output_config] [--metadata_title METADATA_TITLE] [--metadata_author METADATA_AUTHOR] [--metadata_description METADATA_DESCRIPTION] [--metadata_license METADATA_LICENSE] [--metadata_tags METADATA_TAGS] [--prior_loss_weight PRIOR_LOSS_WEIGHT] [--optimizer_type OPTIMIZER_TYPE]
    [--use_8bit_adam] [--use_lion_optimizer] [--learning_rate LEARNING_RATE] [--max_grad_norm MAX_GRAD_NORM] [--optimizer_args [OPTIMIZER_ARGS ...]] [--lr_scheduler_type LR_SCHEDULER_TYPE] [--lr_scheduler_args [LR_SCHEDULER_ARGS ...]] [--lr_scheduler LR_SCHEDULER] [--lr_warmup_steps LR_WARMUP_STEPS] [--lr_scheduler_num_cycles LR_SCHEDULER_NUM_CYCLES] [--lr_scheduler_power LR_SCHEDULER_POWER] [--dataset_config DATASET_CONFIG] [--min_snr_gamma MIN_SNR_GAMMA] [--scale_v_pred_loss_like_noise_pred] [--v_pred_like_loss V_PRED_LIKE_LOSS] [--debiased_estimation_loss] [--weighted_captions] [--no_metadata]
    [--save_model_as {None,ckpt,pt,safetensors}] [--unet_lr UNET_LR] [--text_encoder_lr TEXT_ENCODER_LR] [--network_weights NETWORK_WEIGHTS] [--network_module NETWORK_MODULE] [--network_dim NETWORK_DIM] [--network_alpha NETWORK_ALPHA] [--network_dropout NETWORK_DROPOUT] [--network_args [NETWORK_ARGS ...]] [--network_train_unet_only] [--network_train_text_encoder_only] [--training_comment TRAINING_COMMENT] [--dim_from_weights] [--scale_weight_norms SCALE_WEIGHT_NORMS] [--base_weights [BASE_WEIGHTS ...]] [--base_weights_multiplier [BASE_WEIGHTS_MULTIPLIER ...]] [--no_half_vae]
train_network.py: error: unrecognized arguments: UI\models\checkpoints\perfectWorld_v6Baked.safetensors
Traceback (most recent call last):
  File "C:\Users\Shout\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Shout\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\Python\ComfyUI\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "D:\Python\ComfyUI\lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "D:\Python\ComfyUI\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "D:\Python\ComfyUI\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\Python\ComfyUI\Scripts\python.exe', 'custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=D:\Python\ComfyUI\Comfy', 'UI\models\checkpoints\perfectWorld_v6Baked.safetensors', '--train_data_dir=D:/database', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=Demo_v1', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1.e-4', '--text_encoder_lr=1.e-4', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=Demo_v1', '--train_batch_size=1', '--save_every_n_epochs=10', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=2', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--v2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 2.
Train finished
Prompt executed in 6.73 seconds
```
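The traceback shows the checkpoint path arriving as two separate elements in the command list (`...=D:\Python\ComfyUI\Comfy` and `UI\models\checkpoints\...`), so the split happens before `accelerate` ever runs the script. A hedged sketch, assuming the fix is to keep each argument as its own list element (or simply to install ComfyUI under a path without spaces): when `subprocess` receives a list, a spaced path passes through as one argument and is never re-split.

```python
import subprocess
import sys

# A spaced path like the one in the log (illustrative only).
spaced = r"--pretrained_model_name_or_path=D:\Python\ComfyUI\Comfy UI\models\checkpoints\perfectWorld_v6Baked.safetensors"

# The child process just reports how many argv entries it received.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)", spaced],
    capture_output=True, text=True,
)
print(result.stdout.strip())  # 1 -> the spaced path arrived as a single argument
```

Contrast this with the log, where the same path reached `train_network.py` as two arguments, triggering argparse's "unrecognized arguments" error.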