cocktailpeanut / fluxgym

Dead simple FLUX LoRA training UI with LOW VRAM support
MIT License
1.4k stars, 121 forks

[ERROR] Command exited with code 1 #36

Open op7418 opened 2 months ago

op7418 commented 2 months ago

[2024-09-09 20:59:06] [INFO] Running F:\fluxgym\fluxgym\train.bat
[2024-09-09 20:59:06] [INFO]
[2024-09-09 20:59:06] [INFO] F:\fluxgym\fluxgym>accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/flux_train_network.py --pretrained_model_name_or_path "F:\fluxgym\fluxgym\models\unet\flux1-dev.sft" --clip_l "F:\fluxgym\fluxgym\models\clip\clip_l.safetensors" --t5xxl "F:\fluxgym\fluxgym\models\clip\t5xxl_fp16.safetensors" --ae "F:\fluxgym\fluxgym\models\vae\ae.sft" --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 4 --optimizer_type adamw8bit --learning_rate 8e-4 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 16 --save_every_n_epochs 4 --dataset_config "F:\fluxgym\fluxgym\dataset.toml" --output_dir "F:\fluxgym\fluxgym\outputs" --output_name dark-fantasy --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1 --loss_type l2
[2024-09-09 20:59:06] [INFO] 'accelerate' is not recognized as an internal or external command,
[2024-09-09 20:59:06] [INFO] operable program or batch file.
[2024-09-09 20:59:06] [ERROR] Command exited with code 1
[2024-09-09 20:59:06] [INFO] Runner:
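In this first log, Windows could not find the `accelerate` command at all, so training never started. The prompt here reads `F:\fluxgym\fluxgym>`, whereas a later log in this thread shows `(env) F:\fluxgym>`, which suggests the virtual environment was not active. A minimal pre-flight check, assuming it is run with the Python interpreter of the environment that should do the training (the helper names are illustrative, not part of fluxgym):

```python
import importlib.util
import shutil
import sys
from typing import Optional


def find_command(name: str) -> Optional[str]:
    """Full path of the executable the shell would run for `name`, or None if it is not on PATH."""
    return shutil.which(name)


def module_installed(name: str) -> bool:
    """True if `name` can be imported by the current interpreter (sys.executable)."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("python:", sys.executable)
    print("accelerate on PATH:", find_command("accelerate"))
    print("accelerate importable:", module_installed("accelerate"))
```

If `accelerate` is importable but not on PATH, activating the environment first (for example `env\Scripts\activate` on Windows) before running `train.bat` is the likely fix.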

gohan2091 commented 2 months ago

I am also getting the same error. I thought it was because the config said ae.sft while my file was ae.safetensors, so I changed that and it got further (with a different error), but now changing the ae file name no longer helps. I am getting "exited with code 1".

Nde917 commented 2 months ago

Same error, but it worked two days ago, and I didn't change anything.

iwoolf commented 2 months ago

I have the same error on Ubuntu, with a 3060:

[2024-09-12 10:20:59] [INFO] subprocess.CalledProcessError: Command '['/media/iwoolf/tenT/fluxgym/env/bin/python', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', '/media/iwoolf/tenT/fluxgym/models/unet/flux1-dev.sft', '--clip_l', '/media/iwoolf/tenT/fluxgym/models/clip/clip_l.safetensors', '--t5xxl', '/media/iwoolf/tenT/fluxgym/models/clip/t5xxl_fp16.safetensors', '--ae', '/media/iwoolf/tenT/fluxgym/models/vae/ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '4', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--split_mode', '--network_args', 'train_blocks=single', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '16', '--save_every_n_epochs', '4', '--dataset_config', '/media/iwoolf/tenT/fluxgym/dataset.toml', '--output_dir', '/media/iwoolf/tenT/fluxgym/outputs', '--output_name', 'flux-venusflytrap', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2']' returned non-zero exit status 1.
[2024-09-12 10:21:00] [ERROR] Command exited with code 1

XT-404 commented 2 months ago

@cocktailpeanut Same error message since the update.

In addition, this keeps appearing in the console, for no apparent reason:

[2024-09-12 08:43:05] [INFO] Running F:\fluxgym\train.bat
[2024-09-12 08:43:05] [INFO]
[2024-09-12 08:43:05] [INFO] (env) F:\fluxgym>accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/flux_train_network.py --pretrained_model_name_or_path "F:\fluxgym\models\unet\flux1-dev.sft" --clip_l "F:\fluxgym\models\clip\clip_l.safetensors" --t5xxl "F:\fluxgym\models\clip\t5xxl_fp16.safetensors" --ae "F:\fluxgym\models\vae\ae.sft" --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 16 --optimizer_type adamw8bit --sample_prompts="F:\fluxgym\sample_prompts.txt" --sample_every_n_steps="150" --learning_rate 8e-4 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 15 --save_every_n_epochs 1 --dataset_config "F:\fluxgym\dataset.toml" --output_dir "F:\fluxgym\outputs" --output_name mika1 --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1 --loss_type l2
[2024-09-12 08:43:08] [INFO] The following values were not passed to accelerate launch and had defaults used instead:
[2024-09-12 08:43:08] [INFO] --num_processes was set to a value of 1
[2024-09-12 08:43:08] [INFO] --num_machines was set to a value of 1
[2024-09-12 08:43:08] [INFO] --dynamo_backend was set to a value of 'no'
[2024-09-12 08:43:08] [INFO] To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
[2024-09-12 08:43:12] [INFO] highvram is enabled / highvramが有効です
[2024-09-12 08:43:12] [INFO] 2024-09-12 08:43:12 WARNING cache_latents_to_disk is train_util.py:3896
[2024-09-12 08:43:12] [INFO] enabled, so cache_latents is
[2024-09-12 08:43:12] [INFO] also enabled /
[2024-09-12 08:43:12] [INFO] cache_latents_to_diskが有効なた
[2024-09-12 08:43:12] [INFO] め、cache_latentsを有効にします
[2024-09-12 08:43:12] [INFO] 2024-09-12 08:43:12 INFO t5xxl_max_token_length: flux_train_network.py:155
[2024-09-12 08:43:12] [INFO] 512
[2024-09-12 08:43:13] [INFO] F:\fluxgym\env\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
[2024-09-12 08:43:13] [INFO] warnings.warn(
[2024-09-12 08:43:13] [INFO] You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
[2024-09-12 08:43:13] [INFO] 2024-09-12 08:43:13 INFO Loading dataset config from train_network.py:280
[2024-09-12 08:43:13] [INFO] F:\fluxgym\dataset.toml
[2024-09-12 08:43:13] [INFO] INFO prepare images. train_util.py:1803
[2024-09-12 08:43:13] [INFO] INFO get image size from name of train_util.py:1741
[2024-09-12 08:43:13] [INFO] cache files
[2024-09-12 08:43:13] [INFO] 0%| | 0/70 [00:00<?, ?it/s] 100%|██████████| 70/70 [00:00<00:00, 5772.62it/s]
[2024-09-12 08:43:13] [INFO] INFO set image size from cache train_util.py:1748
[2024-09-12 08:43:13] [INFO] files: 0/70
[2024-09-12 08:43:13] [INFO] INFO found directory train_util.py:1750
[2024-09-12 08:43:13] [INFO] F:\fluxgym\datasets\mika1
[2024-09-12 08:43:13] [INFO] contains 70 image files
[2024-09-12 08:43:13] [INFO] ERROR illegal char in file (not train_util.py:1698
[2024-09-12 08:43:13] [INFO] UTF-8) /
[2024-09-12 08:43:13] [INFO] ファイルにUTF-8以外の文字があり
[2024-09-12 08:43:13] [INFO] ます:
[2024-09-12 08:43:13] [INFO] F:\fluxgym\datasets\mika1\Mika
[2024-09-12 08:43:13] [INFO] 7.txt
[2024-09-12 08:43:13] [INFO] Traceback (most recent call last):
[2024-09-12 08:43:13] [INFO] File "F:\fluxgym\sd-scripts\flux_train_network.py", line 519, in <module>
[2024-09-12 08:43:13] [INFO] trainer.train(args)
[2024-09-12 08:43:13] [INFO] File "F:\fluxgym\sd-scripts\train_network.py", line 317, in train
[2024-09-12 08:43:13] [INFO] train_dataset_group = config_util.generate_dataset_group_by_blueprint(blueprint.dataset_group)
[2024-09-12 08:43:13] [INFO] File "F:\fluxgym\sd-scripts\library\config_util.py", line 485, in generate_dataset_group_by_blueprint
[2024-09-12 08:43:13] [INFO] dataset = dataset_klass(subsets=subsets, **asdict(dataset_blueprint.params))
[2024-09-12 08:43:13] [INFO] File "F:\fluxgym\sd-scripts\library\train_util.py", line 1820, in __init__
[2024-09-12 08:43:13] [INFO] img_paths, captions, sizes = load_dreambooth_dir(subset)
[2024-09-12 08:43:13] [INFO] File "F:\fluxgym\sd-scripts\library\train_util.py", line 1760, in load_dreambooth_dir
[2024-09-12 08:43:13] [INFO] cap_for_img = read_caption(img_path, subset.caption_extension, subset.enable_wildcard)
[2024-09-12 08:43:13] [INFO] File "F:\fluxgym\sd-scripts\library\train_util.py", line 1699, in read_caption
[2024-09-12 08:43:13] [INFO] raise e
[2024-09-12 08:43:13] [INFO] File "F:\fluxgym\sd-scripts\library\train_util.py", line 1696, in read_caption
[2024-09-12 08:43:13] [INFO] lines = f.readlines()
[2024-09-12 08:43:13] [INFO] File "C:\Users\XT404\AppData\Local\Programs\Python\Python310\lib\codecs.py", line 322, in decode
[2024-09-12 08:43:13] [INFO] (result, consumed) = self._buffer_decode(data, self.errors, final)
[2024-09-12 08:43:13] [INFO] UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 225: invalid continuation byte
[2024-09-12 08:43:13] [INFO] Traceback (most recent call last):
[2024-09-12 08:43:13] [INFO] File "C:\Users\XT404\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
[2024-09-12 08:43:13] [INFO] return _run_code(code, main_globals, None,
[2024-09-12 08:43:13] [INFO] File "C:\Users\XT404\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
[2024-09-12 08:43:13] [INFO] exec(code, run_globals)
[2024-09-12 08:43:13] [INFO] File "F:\fluxgym\env\Scripts\accelerate.exe\__main__.py", line 7, in <module>
[2024-09-12 08:43:13] [INFO] File "F:\fluxgym\env\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
[2024-09-12 08:43:13] [INFO] args.func(args)
[2024-09-12 08:43:13] [INFO] File "F:\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
[2024-09-12 08:43:13] [INFO] simple_launcher(args)
[2024-09-12 08:43:13] [INFO] File "F:\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
[2024-09-12 08:43:13] [INFO] raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
[2024-09-12 08:43:13] [INFO] subprocess.CalledProcessError: Command '['F:\fluxgym\env\Scripts\python.exe', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', 'F:\fluxgym\models\unet\flux1-dev.sft', '--clip_l', 'F:\fluxgym\models\clip\clip_l.safetensors', '--t5xxl', 'F:\fluxgym\models\clip\t5xxl_fp16.safetensors', '--ae', 'F:\fluxgym\models\vae\ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '16', '--optimizer_type', 'adamw8bit', '--sample_prompts=F:\fluxgym\sample_prompts.txt', '--sample_every_n_steps=150', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '15', '--save_every_n_epochs', '1', '--dataset_config', 'F:\fluxgym\dataset.toml', '--output_dir', 'F:\fluxgym\outputs', '--output_name', 'mika1', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2']' returned non-zero exit status 1.
[2024-09-12 08:43:14] [ERROR] Command exited with code 1
[2024-09-12 08:43:14] [INFO] Runner:

[screenshot] File=[] ????
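The failure in this log is the `UnicodeDecodeError`: sd-scripts reads captions as UTF-8 and reports that `Mika 7.txt` contains a non-UTF-8 byte. Byte `0xf1` is `ñ` in Windows-1252, so a likely cause is a caption saved in that legacy encoding. A minimal sketch that rewrites such files as UTF-8, assuming the original encoding really is cp1252 (check one file before converting a whole folder); the function names are illustrative:

```python
from pathlib import Path
from typing import List


def reencode_to_utf8(path: str, source_encoding: str = "cp1252") -> bool:
    """Rewrite a caption file as UTF-8 if it is not already valid UTF-8.

    Returns True if the file was converted, False if it was already UTF-8.
    Raises UnicodeDecodeError if the bytes are not valid in source_encoding either.
    """
    p = Path(path)
    raw = p.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8; leave the file untouched
    except UnicodeDecodeError:
        pass
    text = raw.decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))  # write bytes to avoid newline translation
    return True


def reencode_folder(folder: str, ext: str = ".txt") -> List[str]:
    """Convert every caption file in `folder`; returns the paths that were rewritten."""
    return [str(p) for p in sorted(Path(folder).glob(f"*{ext}")) if reencode_to_utf8(str(p))]
```

For example, `reencode_folder(r"F:\fluxgym\datasets\mika1")` would list the captions it converted.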

cerarslan commented 2 months ago

Same.

XT-404 commented 2 months ago

I figured out where the code 1 error comes from. If the dataset is not normalized to a single resolution (512, 768 or 1024), the code crashes; it seems that it does not support mixed resolutions.

I normalized all the images and the error disappeared.

It would be good to add support for mixed resolutions to the code.

cerarslan commented 2 months ago

What do you mean by 'normalized', sir?

XT-404 commented 2 months ago

@cerarslan By normalization I mean that all of the images must have exactly the same resolution:

512x512, 768x768 or 1024x1024

If an image does not have one of these resolutions, the code crashes and reports error code 1.
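The normalization described here can be done with any image tool. A minimal sketch in Python, assuming Pillow is installed, center-crops each image to a square and resizes it to one fixed size, copying any matching `.txt` caption alongside. The function names, the extension set and the center-crop choice are illustrative assumptions, not part of fluxgym:

```python
import shutil
from pathlib import Path
from typing import Tuple

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}  # assumed set of training-image extensions


def center_crop_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Largest centered square inside a width x height image, as (left, top, right, bottom)."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def normalize_folder(src_dir: str, dst_dir: str, size: int = 768) -> int:
    """Center-crop and resize every image in src_dir to size x size PNGs; returns the count."""
    from PIL import Image  # Pillow; imported here so center_crop_box works without it

    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_EXTS:
            continue
        with Image.open(path) as img:
            square = img.convert("RGB").crop(center_crop_box(*img.size))
            square.resize((size, size), Image.LANCZOS).save(out / f"{path.stem}.png")
        caption = path.with_suffix(".txt")
        if caption.exists():  # keep the caption paired with its image
            shutil.copy2(caption, out / caption.name)
        count += 1
    return count
```

Center-cropping discards the edges of non-square images; padding instead would keep the whole frame at the cost of borders.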

cerarslan commented 2 months ago

I just tried both 512x512 and 1024x1024 and nothing changed; I still get the same error. :/

XT-404 commented 2 months ago

@cerarslan Did you resize your entire image set to 512 or 1024? You must not mix resolutions. Also, you must set the resolution you used in the Advanced options below:

[screenshot]

cerarslan commented 2 months ago

No, I tried either 512x512 or 1024x1024; I did not mix the two resolutions. I tried all kinds of combinations, but the result is the same. I think there is a problem in 'dataset.toml'. Could you share the parameters in your file? Mine looks like this:

[general]
shuffle_caption = false
caption_extension = '.txt'
keep_tokens = 1

[[datasets]]
resolution = 512
batch_size = 1
keep_tokens = 1

[[datasets.subsets]]
image_dir = 'Z:\pinokio\api\fluxgym.git\datasets\ohwx-man'
class_tokens = 'ohwx man'
num_repeats = 10

I guess not all settings get written to this file.
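To see what the training script will read from a `dataset.toml` like the one above, one can parse it and report each subset's resolution and whether its `image_dir` exists. A sketch assuming Python 3.11+ for the standard-library `tomllib` (or the `tomli` package on older versions); the key names come from the config in this comment, and treating a missing `num_repeats` as 1 is an assumption:

```python
import os
from typing import Dict, List

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # older Pythons: `pip install tomli` provides the same API
    import tomli as tomllib


def summarize_dataset_config(text: str) -> List[Dict]:
    """Parse a dataset.toml and list each subset with its resolution and image_dir status."""
    config = tomllib.loads(text)
    rows = []
    for dataset in config.get("datasets", []):
        for subset in dataset.get("subsets", []):
            image_dir = subset.get("image_dir", "")
            rows.append({
                "resolution": dataset.get("resolution"),
                "image_dir": image_dir,
                "num_repeats": subset.get("num_repeats", 1),  # assumed default
                "dir_exists": os.path.isdir(image_dir),
            })
    return rows
```

Single-quoted TOML strings are literal strings, so Windows backslashes in `image_dir` need no escaping.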

XT-404 commented 2 months ago

@cerarslan

accelerate launch ^
  --mixed_precision bf16 ^
  --num_cpu_threads_per_process 1 ^
  sd-scripts/flux_train_network.py ^
  --pretrained_model_name_or_path "F:\fluxgym\models\unet\flux1-dev.sft" ^
  --clip_l "F:\fluxgym\models\clip\clip_l.safetensors" ^
  --t5xxl "F:\fluxgym\models\clip\t5xxl_fp16.safetensors" ^
  --ae "F:\fluxgym\models\vae\ae.sft" ^
  --cache_latents_to_disk ^
  --save_model_as safetensors ^
  --sdpa --persistent_data_loader_workers ^
  --max_data_loader_n_workers 2 ^
  --seed 42 ^
  --gradient_checkpointing ^
  --mixed_precision bf16 ^
  --save_precision bf16 ^
  --network_module networks.lora_flux ^
  --network_dim 4 ^
  --optimizer_type adamw8bit ^
  --sample_prompts="F:\fluxgym\sample_prompts.txt" --sample_every_n_steps="250" ^
  --learning_rate 8e-4 ^
  --cache_text_encoder_outputs ^
  --cache_text_encoder_outputs_to_disk ^
  --fp8_base ^
  --highvram ^
  --max_train_epochs 10 ^
  --save_every_n_epochs 1 ^
  --dataset_config "F:\fluxgym\dataset.toml" ^
  --output_dir "F:\fluxgym\outputs" ^
  --output_name mp ^
  --timestep_sampling shift ^
  --discrete_flow_shift 3.1582 ^
  --model_prediction_type raw ^
  --guidance_scale 1 ^
  --loss_type l2 ^

[general]
shuffle_caption = false
caption_extension = '.txt'
keep_tokens = 1

[[datasets]]
resolution = 768
batch_size = 1
keep_tokens = 1

[[datasets.subsets]]
image_dir = 'F:\fluxgym\datasets\mp'
class_tokens = 'Mika'
num_repeats = 4

cerarslan commented 2 months ago

Yes, this is the problem: when I edit the .toml file this way and try to run it, it reverts to its old state (the parameters at the top are deleted).

Johnny5-input commented 2 months ago

Renaming ae.safetensors to ae.sft within the vae directory resolved the issue and the system began functioning as expected.
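Since several reports here hinge on exact file names under `models/`, a small sketch that lists which of the files passed to the trainer are missing. The relative paths are copied from the training commands quoted in this thread; a different setup may use other names:

```python
import os
from typing import List, Sequence

# Relative paths as they appear in the training commands quoted in this thread;
# adjust if your fluxgym setup uses different file names.
EXPECTED_MODEL_FILES = (
    "models/unet/flux1-dev.sft",
    "models/clip/clip_l.safetensors",
    "models/clip/t5xxl_fp16.safetensors",
    "models/vae/ae.sft",
)


def missing_model_files(root: str, expected: Sequence[str] = EXPECTED_MODEL_FILES) -> List[str]:
    """Return the entries of `expected` that are not regular files under `root`."""
    return [rel for rel in expected if not os.path.isfile(os.path.join(root, *rel.split("/")))]


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for rel in missing_model_files(root):
        print("missing:", rel)
```

Run from the fluxgym folder (or pass its path) to see, for instance, whether `models/vae/ae.sft` exists under that exact name.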

cerarslan commented 2 months ago

Mine is already ae.sft :/

cerarslan commented 2 months ago

Here is the last part of my log:

[2024-09-12 14:43:34] [INFO] subprocess.CalledProcessError: Command '['Z:\pinokio\api\fluxgym.git\env\Scripts\python.exe', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', 'Z:\pinokio\api\fluxgym.git\models\unet\flux1-dev.sft', '--clip_l', 'Z:\pinokio\api\fluxgym.git\models\clip\clip_l.safetensors', '--t5xxl', 'Z:\pinokio\api\fluxgym.git\models\clip\t5xxl_fp16.safetensors', '--ae', 'Z:\pinokio\api\fluxgym.git\models\vae\ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '4', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--split_mode', '--network_args', 'train_blocks=single', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--sample_prompts=Z:\pinokio\api\fluxgym.git\sample_prompts.txt', '--sample_every_n_steps=250', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '16', '--save_every_n_epochs', '4', '--dataset_config', 'Z:\pinokio\api\fluxgym.git\dataset.toml', '--output_dir', 'Z:\pinokio\api\fluxgym.git\outputs', '--output_name', 'ohwx', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2']' returned non-zero exit status 1.
[2024-09-12 14:43:34] [ERROR] Command exited with code 1
[2024-09-12 14:43:34] [INFO] Runner:
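The `CalledProcessError` at the end of these logs is only a wrapper: the full tracebacks in this thread show `accelerate`'s `simple_launcher` raising it after the training script has already failed, so the informative exception appears earlier in the log. A sketch that pulls exception lines out of a saved log and skips the wrapper; the regex is a heuristic assumption, not something fluxgym provides:

```python
import re
from typing import List

# Heuristic: an exception line looks like "SomeError: message" or "pkg.SomeException: message".
_EXCEPTION_LINE = re.compile(r"\b([A-Za-z_][\w.]*(?:Error|Exception)): (.*)")


def root_errors(log_text: str) -> List[str]:
    """Return exception lines from a training log, skipping accelerate's CalledProcessError wrapper."""
    found = []
    for match in _EXCEPTION_LINE.finditer(log_text):
        name, message = match.group(1), match.group(2).strip()
        if name.endswith("CalledProcessError"):
            continue  # raised by the launcher only after the training script already failed
        found.append(f"{name}: {message}")
    return found
```

It expects one log entry per line; on a log pasted as a single line, the captured message runs on to the end of that line.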

iwoolf commented 2 months ago

I just used birme.net to resize all my photos to 768. I updated and tried again, and got the same error message:

[2024-09-12 22:28:05] [INFO] Running bash "/media/iwoolf/tenT/fluxgym/train.sh"
[2024-09-12 22:28:11] [INFO] Traceback (most recent call last):
[2024-09-12 22:28:11] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/flux_train_network.py", line 13, in <module>
[2024-09-12 22:28:11] [INFO] from library import flux_models, flux_train_utils, flux_utils, sd3_train_utils, strategy_base, strategy_flux, train_util
[2024-09-12 22:28:11] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/library/flux_models.py", line 360, in <module>
[2024-09-12 22:28:11] [INFO] class ModelSpec:
[2024-09-12 22:28:11] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/library/flux_models.py", line 363, in ModelSpec
[2024-09-12 22:28:11] [INFO] ckpt_path: str | None
[2024-09-12 22:28:11] [INFO] TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
[2024-09-12 22:28:11] [INFO] Traceback (most recent call last):
[2024-09-12 22:28:11] [INFO] File "/media/iwoolf/tenT/fluxgym/env/bin/accelerate", line 8, in <module>
[2024-09-12 22:28:11] [INFO] sys.exit(main())
[2024-09-12 22:28:11] [INFO] File "/media/iwoolf/tenT/fluxgym/env/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
[2024-09-12 22:28:11] [INFO] args.func(args)
[2024-09-12 22:28:11] [INFO] File "/media/iwoolf/tenT/fluxgym/env/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
[2024-09-12 22:28:11] [INFO] simple_launcher(args)
[2024-09-12 22:28:11] [INFO] File "/media/iwoolf/tenT/fluxgym/env/lib/python3.9/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
[2024-09-12 22:28:11] [INFO] raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
[2024-09-12 22:28:11] [INFO] subprocess.CalledProcessError: Command '['/media/iwoolf/tenT/fluxgym/env/bin/python', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', '/media/iwoolf/tenT/fluxgym/models/unet/flux1-dev.sft', '--clip_l', '/media/iwoolf/tenT/fluxgym/models/clip/clip_l.safetensors', '--t5xxl', '/media/iwoolf/tenT/fluxgym/models/clip/t5xxl_fp16.safetensors', '--ae', '/media/iwoolf/tenT/fluxgym/models/vae/ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '4', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--split_mode', '--network_args', 'train_blocks=single', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '16', '--save_every_n_epochs', '4', '--dataset_config', '/media/iwoolf/tenT/fluxgym/dataset.toml', '--output_dir', '/media/iwoolf/tenT/fluxgym/outputs', '--output_name', 'flux-venusfltrap', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2']' returned non-zero exit status 1.
[2024-09-12 22:28:12] [ERROR] Command exited with code 1
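The real error in this log is the `TypeError` on `ckpt_path: str | None`. The `X | None` union syntax can only be evaluated at runtime from Python 3.10 onward (PEP 604), and the paths show this environment running Python 3.9 (`lib/python3.9/site-packages`). A class-body annotation like the one in `ModelSpec` is evaluated when the class is created, so on 3.9 the import fails before training starts. A quick interpreter check (the function name is illustrative):

```python
import sys


def pep604_unions_work() -> bool:
    """Return True if `str | None` can be evaluated at runtime (Python 3.10+, PEP 604)."""
    try:
        eval("str | None")
    except TypeError:
        # Python 3.9 and older: unsupported operand type(s) for |: 'type' and 'NoneType'
        return False
    return True


if __name__ == "__main__":
    verdict = "OK" if pep604_unions_work() else "too old for this syntax"
    print(sys.version.split()[0], "->", verdict)
```

Running it with the environment's own interpreter (for example `env/bin/python`) shows whether that environment needs to be rebuilt on a newer Python.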

iwoolf commented 2 months ago

I noticed the training config said 512, so I tried resizing all the images to 512, but got the same error:

[2024-09-12 22:47:40] [INFO] Running bash "/media/iwoolf/tenT/fluxgym/train.sh"
[2024-09-12 22:47:57] [INFO] Traceback (most recent call last):
[2024-09-12 22:47:57] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/flux_train_network.py", line 13, in <module>
[2024-09-12 22:47:57] [INFO] from library import flux_models, flux_train_utils, flux_utils, sd3_train_utils, strategy_base, strategy_flux, train_util
[2024-09-12 22:47:57] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/library/flux_models.py", line 360, in <module>
[2024-09-12 22:47:58] [INFO] class ModelSpec:
[2024-09-12 22:47:58] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/library/flux_models.py", line 363, in ModelSpec
[2024-09-12 22:47:58] [INFO] ckpt_path: str | None
[2024-09-12 22:47:58] [INFO] TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
[2024-09-12 22:47:58] [INFO] Traceback (most recent call last):
[2024-09-12 22:47:58] [INFO] File "/media/iwoolf/tenT/fluxgym/env/bin/accelerate", line 8, in <module>
[2024-09-12 22:47:58] [INFO] sys.exit(main())
[2024-09-12 22:47:58] [INFO] File "/media/iwoolf/tenT/fluxgym/env/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
[2024-09-12 22:47:59] [INFO] args.func(args)
[2024-09-12 22:47:59] [INFO] File "/media/iwoolf/tenT/fluxgym/env/lib/python3.9/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
[2024-09-12 22:47:59] [INFO] simple_launcher(args)
[2024-09-12 22:47:59] [INFO] File "/media/iwoolf/tenT/fluxgym/env/lib/python3.9/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
[2024-09-12 22:47:59] [INFO] raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
[2024-09-12 22:47:59] [INFO] subprocess.CalledProcessError: Command '['/media/iwoolf/tenT/fluxgym/env/bin/python', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', '/media/iwoolf/tenT/fluxgym/models/unet/flux1-dev.sft', '--clip_l', '/media/iwoolf/tenT/fluxgym/models/clip/clip_l.safetensors', '--t5xxl', '/media/iwoolf/tenT/fluxgym/models/clip/t5xxl_fp16.safetensors', '--ae', '/media/iwoolf/tenT/fluxgym/models/vae/ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '4', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--split_mode', '--network_args', 'train_blocks=single', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '16', '--save_every_n_epochs', '4', '--dataset_config', '/media/iwoolf/tenT/fluxgym/dataset.toml', '--output_dir', '/media/iwoolf/tenT/fluxgym/outputs', '--output_name', 'flux-venusflytrap', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2']' returned non-zero exit status 1.
[2024-09-12 22:48:00] [ERROR] Command exited with code 1
[2024-09-12 22:48:00] [INFO] Runner:

cerarslan commented 2 months ago

I found my solution: my main model file was corrupted. I downloaded it again and it's training right now.
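A corrupted base model is easy to miss because the file name looks right. A rough integrity check, assuming the `.sft` files are in the safetensors format (an 8-byte little-endian header length, a JSON header, then the tensor bytes), is to confirm that the header parses and that the data section is exactly as long as the largest tensor end offset in the header. It only reads the header, so it is fast even on large checkpoints, but it catches truncated downloads rather than every kind of corruption. The function name is illustrative:

```python
import json
import os
import struct


def safetensors_looks_complete(path: str) -> bool:
    """Heuristic check that a safetensors file is not truncated.

    Layout: 8-byte little-endian header length N, N bytes of JSON header, then
    the tensor data. Each tensor's data_offsets are relative to the start of the
    data section, and the data section must be fully covered by the tensors.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > size - 8:
            return False  # header claims more bytes than the file has
        try:
            header = json.loads(f.read(header_len))
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            return False
    if not isinstance(header, dict):
        return False
    data_len = size - 8 - header_len
    try:
        ends = [int(t["data_offsets"][1]) for name, t in header.items() if name != "__metadata__"]
    except (KeyError, IndexError, TypeError, ValueError):
        return False
    return max(ends, default=0) == data_len
```

For example, `safetensors_looks_complete(r"F:\fluxgym\models\unet\flux1-dev.sft")` returning False is a strong hint to re-download.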

Ctson88 commented 2 months ago

I have the same error. I have a 3060 with 12 GB VRAM and 32 GB RAM.

XT-404 commented 2 months ago

When will this error code 1 be resolved? It's a real pain in the ass. Frankly, I had no problems until the update came out; I installed it and now I get error code 1 in a loop, which is getting annoying.

dondiegorivera commented 2 months ago

Same issue for me as well. I tried 512 and 768 px; it did not help. 4090 24 GB VRAM / AMD Ryzen / 64 GB RAM.

dondiegorivera commented 2 months ago

The solution for me was to add short captions to every image, without special characters. It works now.

iwoolf commented 1 month ago

I updated to the latest version, resized all images to 512x512, and added short captions to every image, with no special characters. I still get the same error message. I'm using Ubuntu 24.04.1 LTS with a 3060 (12 GB VRAM).

[2024-10-04 12:49:56] [INFO] Running bash "/media/iwoolf/tenT/fluxgym/outputs/flux-venusflytrap/train.sh"
[2024-10-04 12:50:04] [INFO] highvram is enabled / highvramが有効です
[2024-10-04 12:50:04] [INFO] 2024-10-04 12:50:04 WARNING cache_latents_to_disk is train_util.py:4022
[2024-10-04 12:50:04] [INFO] enabled, so cache_latents is
[2024-10-04 12:50:04] [INFO] also enabled /
[2024-10-04 12:50:04] [INFO] cache_latents_to_diskが有効なた
[2024-10-04 12:50:04] [INFO] め、cache_latentsを有効にします
[2024-10-04 12:50:04] [INFO] 2024-10-04 12:50:04 INFO t5xxl_max_token_length: flux_train_network.py:155
[2024-10-04 12:50:04] [INFO] 512
[2024-10-04 12:50:05] [INFO] /media/iwoolf/BigDrive/anaconda3/envs/fluxgym/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1617: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be deprecated in transformers v4.45, and will be then set to False by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
[2024-10-04 12:50:05] [INFO] warnings.warn(
[2024-10-04 12:50:05] [INFO] You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
[2024-10-04 12:50:05] [INFO] 2024-10-04 12:50:05 INFO Loading dataset config from train_network.py:280
[2024-10-04 12:50:05] [INFO] /media/iwoolf/tenT/fluxgym/out
[2024-10-04 12:50:05] [INFO] puts/flux-venusflytrap/dataset
[2024-10-04 12:50:05] [INFO] .toml
[2024-10-04 12:50:05] [INFO] INFO prepare images. train_util.py:1872
[2024-10-04 12:50:05] [INFO] INFO get image size from name of train_util.py:1810
[2024-10-04 12:50:05] [INFO] cache files
[2024-10-04 12:50:05] [INFO] 0%| | 0/94 [00:00<?, ?it/s] 100%|██████████| 94/94 [00:00<00:00, 2775.71it/s]
[2024-10-04 12:50:05] [INFO] INFO set image size from cache files: train_util.py:1817
[2024-10-04 12:50:05] [INFO] 0/94
[2024-10-04 12:50:05] [INFO] INFO found directory train_util.py:1819
[2024-10-04 12:50:05] [INFO] /media/iwoolf/tenT/fluxgym/datas
[2024-10-04 12:50:05] [INFO] ets/flux-venusflytrap contains
[2024-10-04 12:50:05] [INFO] 94 image files
[2024-10-04 12:50:05] [INFO] Traceback (most recent call last):
[2024-10-04 12:50:05] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/flux_train_network.py", line 522, in <module>
[2024-10-04 12:50:05] [INFO] trainer.train(args)
[2024-10-04 12:50:05] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/train_network.py", line 317, in train
[2024-10-04 12:50:05] [INFO] train_dataset_group = config_util.generate_dataset_group_by_blueprint(blueprint.dataset_group)
[2024-10-04 12:50:05] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/library/config_util.py", line 485, in generate_dataset_group_by_blueprint
[2024-10-04 12:50:05] [INFO] dataset = dataset_klass(subsets=subsets, **asdict(dataset_blueprint.params))
[2024-10-04 12:50:05] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/library/train_util.py", line 1889, in __init__
[2024-10-04 12:50:05] [INFO] img_paths, captions, sizes = load_dreambooth_dir(subset)
[2024-10-04 12:50:05] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/library/train_util.py", line 1829, in load_dreambooth_dir
[2024-10-04 12:50:05] [INFO] cap_for_img = read_caption(img_path, subset.caption_extension, subset.enable_wildcard)
[2024-10-04 12:50:05] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/library/train_util.py", line 1769, in read_caption
[2024-10-04 12:50:05] [INFO] assert len(lines) > 0, f"caption file is empty / キャプションファイルが空です: {cap_path}"
[2024-10-04 12:50:05] [INFO] AssertionError: caption file is empty / キャプションファイルが空です: /media/iwoolf/tenT/fluxgym/datasets/flux-venusflytrap/00022-3826592931-photo of a nu___.txt
[2024-10-04 12:50:06] [INFO] Traceback (most recent call last):
[2024-10-04 12:50:06] [INFO] File "/media/iwoolf/BigDrive/anaconda3/envs/fluxgym/bin/accelerate", line 8, in <module>
[2024-10-04 12:50:06] [INFO] sys.exit(main())
[2024-10-04 12:50:06] [INFO] File "/media/iwoolf/BigDrive/anaconda3/envs/fluxgym/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
[2024-10-04 12:50:06] [INFO] args.func(args)
[2024-10-04 12:50:06] [INFO] File "/media/iwoolf/BigDrive/anaconda3/envs/fluxgym/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1174, in launch_command
[2024-10-04 12:50:06] [INFO] simple_launcher(args)
[2024-10-04 12:50:06] [INFO] File "/media/iwoolf/BigDrive/anaconda3/envs/fluxgym/lib/python3.10/site-packages/accelerate/commands/launch.py", line 769, in simple_launcher
[2024-10-04 12:50:06] [INFO] raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
[2024-10-04 12:50:06] [INFO] subprocess.CalledProcessError: Command '['/media/iwoolf/BigDrive/anaconda3/envs/fluxgym/bin/python', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', '/media/iwoolf/tenT/fluxgym/models/unet/flux1-dev.sft', '--clip_l', '/media/iwoolf/tenT/fluxgym/models/clip/clip_l.safetensors', '--t5xxl', '/media/iwoolf/tenT/fluxgym/models/clip/t5xxl_fp16.safetensors', '--ae', '/media/iwoolf/tenT/fluxgym/models/vae/ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '4', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--split_mode', '--network_args', 'train_blocks=single', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '16', '--save_every_n_epochs', '4', '--dataset_config', '/media/iwoolf/tenT/fluxgym/outputs/flux-venusflytrap/dataset.toml', '--output_dir', '/media/iwoolf/tenT/fluxgym/outputs/flux-venusflytrap', '--output_name', 'flux-venusflytrap', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2']' returned non-zero exit status 1.
[2024-10-04 12:50:06] [ERROR] Command exited with code 1
[2024-10-04 12:50:06] [INFO] Runner:
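This last traceback fails on an `AssertionError` because one caption file is empty, which fits dondiegorivera's fix of giving every image a short caption. A sketch that lists, before training starts, every image whose caption file is missing, empty, or not UTF-8. The image-extension set is an assumption, and a missing caption may be tolerated by sd-scripts when `class_tokens` is set, so treat that entry as a warning:

```python
from pathlib import Path
from typing import Dict

# Assumed set of image extensions; extend it if your dataset uses others.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def caption_problems(dataset_dir: str, caption_ext: str = ".txt") -> Dict[str, str]:
    """Map image file name -> problem for every image whose caption could break training."""
    problems: Dict[str, str] = {}
    for img in sorted(Path(dataset_dir).iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue
        caption = img.with_suffix(caption_ext)  # "photo.png" -> "photo.txt"
        if not caption.is_file():
            problems[img.name] = "missing caption"
            continue
        try:
            text = caption.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            problems[img.name] = "not UTF-8"
            continue
        if not text.strip():  # whitespace-only counts as empty here
            problems[img.name] = "empty caption"
    return problems
```

For example, `caption_problems("/media/iwoolf/tenT/fluxgym/datasets/flux-venusflytrap")` would flag the empty caption named in the traceback.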