bmaltais / kohya_ss


BUG: FluX error train stops since today update #2789

Closed · kalle07 closed 1 month ago

kalle07 commented 1 month ago

100%|██████████████████████████████████████████████████████████████████████████████████| 19/19 [00:02<00:00, 9.27it/s]
2024-09-06 10:53:11 INFO     move vae and unet to cpu to save memory                     flux_train_network.py:208
                    INFO     move text encoders to gpu                                   flux_train_network.py:216
2024-09-06 10:53:15 INFO     [Dataset 0]                                                 train_util.py:2347
                    INFO     caching Text Encoder outputs with caching strategy.         train_util.py:1107
                    INFO     checking cache validity...                                  train_util.py:1113
100%|██████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<?, ?it/s]
                    INFO     caching Text Encoder outputs...                             train_util.py:1139
100%|██████████████████████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 14.31it/s]
2024-09-06 10:53:16 INFO     cache Text Encoder outputs for sample prompt:               flux_train_network.py:232
                             D:/jeri/model\sample/prompt.txt
                    INFO     cache Text Encoder outputs for prompt: JeriR07, portrait    flux_train_network.py:243
                             photo of a 40yo woman smirk at viewer
                    INFO     cache Text Encoder outputs for prompt:                      flux_train_network.py:243
                    INFO     move t5XXL back to cpu                                      flux_train_network.py:256
2024-09-06 10:53:18 INFO     move vae and unet back to original device                   flux_train_network.py:261
                    INFO     create LoRA network. base dim (rank): 64, alpha: 64         lora.py:935
                    INFO     neuron dropout: p=None, rank dropout: p=None, module        lora.py:936
                             dropout: p=None
                    INFO     create LoRA for Text Encoder 1:                             lora.py:1027
                    INFO     create LoRA for Text Encoder 2:                             lora.py:1027
                    INFO     create LoRA for Text Encoder: 24 modules.                   lora.py:1035
                    INFO     create LoRA for U-Net: 0 modules.                           lora.py:1043
Traceback (most recent call last):
  File "e:\kohya_ss\sd-scripts\flux_train_network.py", line 519, in <module>
    trainer.train(args)
  File "e:\kohya_ss\sd-scripts\train_network.py", line 441, in train
    self.post_process_network(args, accelerator, network, text_encoders, unet)
  File "e:\kohya_ss\sd-scripts\flux_train_network.py", line 170, in post_process_network
    self.train_t5xxl = network.train_t5xxl
  File "e:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1729, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'LoRANetwork' object has no attribute 'train_t5xxl'
Traceback (most recent call last):
  File "C:\Users\kallemst\.pyenv\pyenv-win\versions\3.10.11\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\kallemst\.pyenv\pyenv-win\versions\3.10.11\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "e:\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "e:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "e:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "e:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['e:\kohya_ss\venv\Scripts\python.exe', 'e:/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'D:/jeri/model/config_lora-20240906-105258.toml']' returned non-zero exit status 1.
10:53:21-318087 INFO     Training has ended.

bmaltais commented 1 month ago

I can't reproduce... no issue training on my side.

kalle07 commented 1 month ago

Hmmm, any idea?

Apart from future warnings, the only thing I get when I start the training is the warning below, and at the end the error above.

e:\kohya_ss\sd-scripts\library\flux_models.py:79: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  h = nn.functional.scaled_dot_product_attention(q, k, v)

... Windows 10. I deleted the venv and created a new one. I am on branch sd3-flux; I activated it and ran gui.bat, and everything installed.

Start is OK:

(venv) e:\kohya_ss>gui.bat
15:18:43-469740 INFO     Kohya_ss GUI version: v24.2.0
15:18:45-189821 INFO     Submodule initialized and updated.
15:18:45-199792 INFO     nVidia toolkit detected
15:18:46-646699 INFO     Torch 2.4.0+cu124
15:18:46-669550 INFO     Torch backend: nVidia CUDA 12.4 cuDNN 90100
15:18:46-669550 INFO     Torch detected GPU: NVIDIA GeForce RTX 4060 Ti VRAM 16380 Arch (8, 9) Cores 34
15:18:46-669550 INFO     Python version is 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
15:18:46-679512 INFO     Verifying modules installation status from requirements_pytorch_windows.txt...
15:18:46-679512 INFO     Verifying modules installation status from requirements_windows.txt...
15:18:46-679512 INFO     Verifying modules installation status from requirements.txt...
15:18:53-769617 INFO     headless: False
15:18:53-814661 INFO     Using shell=True when running external commands...
Running on local URL: http://127.0.0.1:7860

yggdrasil75 commented 1 month ago

venv/scripts/activate
pip install -r requirements.txt

Then check that it's not something wrong with your accelerate config by using Python directly instead of accelerate:

python sd-scripts/flux_train_network.py --config_file <your toml path>

If that works, then you have an issue with accelerate; make sure everything in it is correct. You can't use two different GPUs that don't support the same current CUDA version (an M40 and a 3090, for instance), and you can't use TensorRT with a non-RT GPU even if there is an RT GPU installed (a P40 and a 3090 will not work if using TensorRT). There are a lot of other restrictions for the other options as well.
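As a concrete illustration, a side-by-side check could look like this (the config path is a placeholder, not one from this thread; accelerate env and accelerate launch are standard Accelerate CLI subcommands):

venv\Scripts\activate
python sd-scripts/flux_train_network.py --config_file D:/path/to/config.toml
accelerate env
accelerate launch sd-scripts/flux_train_network.py --config_file D:/path/to/config.toml

If the plain python run trains but accelerate launch fails, accelerate env prints the saved launch configuration (mixed precision, number of processes, selected GPUs), which usually points at the mismatch.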

kalle07 commented 1 month ago

Hmm, but why did everything work fine yesterday? :...(

Yes, I have installed the requirements with the venv activated; no warnings.

And what does that mean?

(venv) e:\kohya_ss>python sd-scripts/flux_train_network.py --config_file
e:\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
2024-09-06 16:53:20.393471: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-09-06 16:53:21.024685: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
e:\kohya_ss\venv\lib\site-packages\xformers\ops\fmha\flash.py:211: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
e:\kohya_ss\venv\lib\site-packages\xformers\ops\fmha\flash.py:344: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
2024-09-06 16:53:22 WARNING  A matching Triton is not available, some optimizations will not be enabled   __init__.py:61
Traceback (most recent call last):
  File "e:\kohya_ss\venv\lib\site-packages\xformers\__init__.py", line 57, in _is_triton_available
    import triton  # noqa
ModuleNotFoundError: No module named 'triton'
e:\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
usage: flux_train_network.py [-h] [--console_log_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                             [--console_log_file CONSOLE_LOG_FILE] [--console_log_simple] [--v2]
                             [... full argparse option list trimmed ...]
                             [--discrete_flow_shift DISCRETE_FLOW_SHIFT] [--split_mode]
flux_train_network.py: error: argument --config_file: expected one argument

(venv) e:\kohya_ss>

yggdrasil75 commented 1 month ago

Gotta have your config file in there. For instance, my full command is:

python sd-scripts/flux_train_network.py --config_file /dataset/kohya/model/config_lora-20240904-083735.toml

Didn't realize that GitHub Markdown decided that angle brackets have to be hidden because that's HTML tagging. Fixed the previous post.

kalle07 commented 1 month ago

.. why did it work yesterday? ;,,,(

(venv) e:\kohya_ss>python sd-scripts/flux_train_network.py --config_file d:\jeri\model\config_lora-20240906-153629.toml
e:\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
2024-09-06 19:03:28.583670: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-09-06 19:03:29.210350: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
e:\kohya_ss\venv\lib\site-packages\xformers\ops\fmha\flash.py:211: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
e:\kohya_ss\venv\lib\site-packages\xformers\ops\fmha\flash.py:344: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
2024-09-06 19:03:30 WARNING  A matching Triton is not available, some optimizations will not be enabled   __init__.py:61
Traceback (most recent call last):
  File "e:\kohya_ss\venv\lib\site-packages\xformers\__init__.py", line 57, in _is_triton_available
    import triton  # noqa
ModuleNotFoundError: No module named 'triton'
e:\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
2024-09-06 19:03:31 INFO     Loading settings from                                       train_util.py:4189
                             d:\jeri\model\config_lora-20240906-153629.toml...
                    INFO     d:\jeri\model\config_lora-20240906-153629                   train_util.py:4208
2024-09-06 19:03:31 INFO     t5xxl_max_token_length: 150                                 flux_train_network.py:155
e:\kohya_ss\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
                    INFO     Using DreamBooth method.                                    train_network.py:291
                    INFO     prepare images.                                             train_util.py:1803
                    INFO     get image size from name of cache files                     train_util.py:1741
100%|████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 4135.54it/s]
                    INFO     set image size from cache files: 19/19                      train_util.py:1748
                    INFO     found directory D:\jeri\images\001_photo of a woman         train_util.py:1750
                             contains 19 image files
                    WARNING  No caption file found for 3 images. Training will continue  train_util.py:1781
                             without captions for these images. If class token exists,
                             it will be used. / 3枚の画像にキャプションファイルが見つかりませんでした。
                             これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
                    WARNING  D:\jeri\images\001_photo of a                               train_util.py:1788
                             woman\image(483)-topaz-faceai-enhance-2x.png
                    WARNING  D:\jeri\images\001_photo of a                               train_util.py:1788
                             woman\image(484)-topaz-faceai-enhance-2x.png
                    WARNING  D:\jeri\images\001_photo of a                               train_util.py:1788
                             woman\image(487)-topaz-faceai-enhance-2x.png
                    INFO     19 train images with repeating.                             train_util.py:1844
                    INFO     0 reg images.                                               train_util.py:1847
                    WARNING  no regularization images / 正則化画像が見つかりませんでした        train_util.py:1852
                    INFO     [Dataset 0]                                                 config_util.py:570
                               batch_size: 1
                               resolution: (512, 512)
                               enable_bucket: False
                               network_multiplier: 1.0

                           [Subset 0 of Dataset 0]
                             image_dir: "D:\jeri\images\001_photo of a woman"
                             image_count: 19
                             num_repeats: 1
                             shuffle_caption: False
                             keep_tokens: 0
                             keep_tokens_separator:
                             caption_separator: ,
                             secondary_separator: None
                             enable_wildcard: False
                             caption_dropout_rate: 0.0
                             caption_dropout_every_n_epoches: 0
                             caption_tag_dropout_rate: 0.0
                             caption_prefix: None
                             caption_suffix: None
                             color_aug: False
                             flip_aug: False
                             face_crop_aug_range: None
                             random_crop: False
                             token_warmup_min: 1,
                             token_warmup_step: 0,
                             alpha_mask: False,
                             is_reg: False
                             class_tokens: photo of a woman
                             caption_extension: .txt

                INFO     [Dataset 0]                                                              config_util.py:576
                INFO     loading image sizes.                                                      train_util.py:876

100%|██████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<?, ?it/s]
                    INFO     prepare dataset                                             train_util.py:884
                    INFO     preparing accelerator                                       train_network.py:345
accelerator device: cuda
                    INFO     Building Flux model dev                                     flux_utils.py:45
2024-09-06 19:03:32 INFO     Loading state dict from                                     flux_utils.py:52
                             E:/WebUI_Forge/webui/models/Stable-diffusion/flux1-dev.safetensors
                    INFO     Loaded Flux:                                                flux_utils.py:55
                    INFO     Building CLIP                                               flux_utils.py:74
                    INFO     Loading state dict from                                     flux_utils.py:167
                             E:/WebUI_Forge/webui/models/VAE/clip_l.safetensors
                    INFO     Loaded CLIP:                                                flux_utils.py:170
                    INFO     Loading state dict from                                     flux_utils.py:215
                             E:/WebUI_Forge/webui/models/VAE/t5xxl_fp16.safetensors
                    INFO     Loaded T5xxl:                                               flux_utils.py:218
                    INFO     Building AutoEncoder                                        flux_utils.py:62
                    INFO     Loading state dict from                                     flux_utils.py:66
                             E:/WebUI_Forge/webui/models/VAE/ae.safetensors
                    INFO     Loaded AE:                                                  flux_utils.py:69
import network module: networks.lora
                    INFO     [Dataset 0]                                                 train_util.py:2326
                    INFO     caching latents with caching strategy.                      train_util.py:984
                    INFO     checking cache validity...                                  train_util.py:994
100%|██████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<?, ?it/s]
                    INFO     no latents to cache                                         train_util.py:1034
                    INFO     move vae and unet to cpu to save memory                     flux_train_network.py:208
                    INFO     move text encoders to gpu                                   flux_train_network.py:216
2024-09-06 19:03:36 INFO     [Dataset 0]                                                 train_util.py:2347
                    INFO     caching Text Encoder outputs with caching strategy.         train_util.py:1107
                    INFO     checking cache validity...                                  train_util.py:1113
100%|██████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<?, ?it/s]
                    INFO     no Text Encoder outputs to cache                            train_util.py:1135
                    INFO     cache Text Encoder outputs for sample prompt:               flux_train_network.py:232
                             D:/jeri/model\sample/prompt.txt
                    INFO     cache Text Encoder outputs for prompt: JeriR07, portrait    flux_train_network.py:243
                             photo of a 40yo woman smirk at viewer
e:\kohya_ss\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:480: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
                    INFO     cache Text Encoder outputs for prompt:                      flux_train_network.py:243
                    INFO     move t5XXL back to cpu                                      flux_train_network.py:256
2024-09-06 19:03:38 INFO     move vae and unet back to original device                   flux_train_network.py:261
                    INFO     create LoRA network. base dim (rank): 64, alpha: 64         lora.py:935
                    INFO     neuron dropout: p=None, rank dropout: p=None, module        lora.py:936
                             dropout: p=None
                    INFO     create LoRA for Text Encoder 1:                             lora.py:1027
                    INFO     create LoRA for Text Encoder 2:                             lora.py:1027
                    INFO     create LoRA for Text Encoder: 24 modules.                   lora.py:1035
                    INFO     create LoRA for U-Net: 0 modules.                           lora.py:1043
Traceback (most recent call last):
  File "e:\kohya_ss\sd-scripts\flux_train_network.py", line 519, in <module>
    trainer.train(args)
  File "e:\kohya_ss\sd-scripts\train_network.py", line 441, in train
    self.post_process_network(args, accelerator, network, text_encoders, unet)
  File "e:\kohya_ss\sd-scripts\flux_train_network.py", line 170, in post_process_network
    self.train_t5xxl = network.train_t5xxl
  File "e:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1729, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'LoRANetwork' object has no attribute 'train_t5xxl'

(venv) e:\kohya_ss>

bmaltais commented 1 month ago

Kohya has updated the sd-scripts and added support for train_t5xxl... it is almost as if you do not have the latest sd-scripts code... Try this in the kohya_ss gui folder:

cd sd-scripts
git checkout sd3
git pull
cd ..

This should force the pull of the latest code...
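You can verify the checkout afterwards with git log -1 --oneline inside sd-scripts.

For context, the traceback fits such a version mismatch: the updated flux_train_network.py reads network.train_t5xxl, an attribute that only networks created by the matching, Flux-aware LoRA code define. A minimal Python sketch of the failure mode, using simplified stand-in classes rather than the actual sd-scripts classes:

class OldLoRANetwork:
    # Stand-in for a network built by outdated network code:
    # it defines no train_t5xxl attribute.
    pass

class NewLoRANetwork:
    # Stand-in for a network built by the updated code.
    def __init__(self):
        self.train_t5xxl = False

def post_process_network(network):
    # The updated trainer effectively does
    #     self.train_t5xxl = network.train_t5xxl
    # which raises AttributeError on the old class. A defensive
    # variant (an illustration, not what sd-scripts actually does):
    return getattr(network, "train_t5xxl", False)

print(post_process_network(NewLoRANetwork()))  # False
print(post_process_network(OldLoRANetwork()))  # False instead of AttributeError

Updating sd-scripts so that the trainer and the network module come from the same commit removes the mismatch without any such workaround.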

kalle07 commented 1 month ago

But I did not choose any option for t5xxl.

[screenshot: grafik]

Again:

                    INFO     caching Text Encoder outputs...                             train_util.py:1139
100%|██████████████████████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.05it/s]
2024-09-07 20:01:15 INFO     cache Text Encoder outputs for sample prompt:               flux_train_network.py:232
                             D:/jeri/model\sample/prompt.txt
                    INFO     cache Text Encoder outputs for prompt: JeriR07, portrait    flux_train_network.py:243
                             photo of a 40yo woman smirk at viewer
                    INFO     cache Text Encoder outputs for prompt:                      flux_train_network.py:243
2024-09-07 20:01:16 INFO     move t5XXL back to cpu                                      flux_train_network.py:256
2024-09-07 20:01:18 INFO     move vae and unet back to original device                   flux_train_network.py:261
                    INFO     create LoRA network. base dim (rank): 64, alpha: 64         lora.py:935
                    INFO     neuron dropout: p=None, rank dropout: p=None, module        lora.py:936
                             dropout: p=None
                    INFO     create LoRA for Text Encoder 1:                             lora.py:1027
                    INFO     create LoRA for Text Encoder 2:                             lora.py:1027
                    INFO     create LoRA for Text Encoder: 24 modules.                   lora.py:1035
                    INFO     create LoRA for U-Net: 0 modules.                           lora.py:1043
Traceback (most recent call last):
  File "e:\kohya_ss\sd-scripts\flux_train_network.py", line 519, in <module>
    trainer.train(args)
  File "e:\kohya_ss\sd-scripts\train_network.py", line 441, in train
    self.post_process_network(args, accelerator, network, text_encoders, unet)
  File "e:\kohya_ss\sd-scripts\flux_train_network.py", line 170, in post_process_network
    self.train_t5xxl = network.train_t5xxl
  File "e:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1729, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'LoRANetwork' object has no attribute 'train_t5xxl'
Traceback (most recent call last):
  File "C:\Users\kallemst\.pyenv\pyenv-win\versions\3.10.11\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\kallemst\.pyenv\pyenv-win\versions\3.10.11\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "e:\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "e:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "e:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "e:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['e:\kohya_ss\venv\Scripts\python.exe', 'e:/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'D:/jeri/model/config_lora-20240907-200048.toml']' returned non-zero exit status 1.
20:01:20-118484 INFO     Training has ended.

OMG, it is really in alpha state. Should I wait one month? ;)

pouletmou commented 1 month ago

Same problem for me, with the same config. I tried on Windows and Linux. I did update sd-scripts (sd3 branch) and it tells me that I'm already up to date.
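If git pull keeps reporting "Already up to date", one way to double-check what is actually checked out in the submodule (generic git commands, assuming the default sd-scripts layout):

git -C sd-scripts fetch origin
git -C sd-scripts status
git -C sd-scripts log -1 --oneline

The last command prints the commit the trainer will actually run, which you can compare against the tip of the sd3 branch on GitHub.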

kalle07 commented 1 month ago

Seems to start now (after today's update). Now I must check if it really trains...

pouletmou commented 1 month ago

Yup, it works again!

kalle07 commented 1 month ago

It trains... on an RTX 4060 (16 GB) at 4 s/it... (but I reduced the training resolution down to 512). Will see if it's still okay.