Open op7418 opened 2 months ago
I am also getting the same error. I thought it was because the config said ae.sft where I had ae.safetensors, so I changed that and it progressed further (with a different error), but now changing the ae file name no longer helps. I am still exiting with code 1.
Same error here, but it worked two days ago and I didn't change anything.
I have the same error on Ubuntu. I have a 3060.
[2024-09-12 10:20:59] [INFO] subprocess.CalledProcessError: Command '['/media/iwoolf/tenT/fluxgym/env/bin/python', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', '/media/iwoolf/tenT/fluxgym/models/unet/flux1-dev.sft', '--clip_l', '/media/iwoolf/tenT/fluxgym/models/clip/clip_l.safetensors', '--t5xxl', '/media/iwoolf/tenT/fluxgym/models/clip/t5xxl_fp16.safetensors', '--ae', '/media/iwoolf/tenT/fluxgym/models/vae/ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '4', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--split_mode', '--network_args', 'train_blocks=single', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '16', '--save_every_n_epochs', '4', '--dataset_config', '/media/iwoolf/tenT/fluxgym/dataset.toml', '--output_dir', '/media/iwoolf/tenT/fluxgym/outputs', '--output_name', 'flux-venusflytrap', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2']' returned non-zero exit status 1.
[2024-09-12 10:21:00] [ERROR] Command exited with code 1
@cocktailpeanut Same error message since the update.
In addition, this appears constantly in the console for no apparent reason:
[2024-09-12 08:43:05] [INFO] Running F:\fluxgym\train.bat
[2024-09-12 08:43:05] [INFO]
[2024-09-12 08:43:05] [INFO] (env) F:\fluxgym>accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/flux_train_network.py --pretrained_model_name_or_path "F:\fluxgym\models\unet\flux1-dev.sft" --clip_l "F:\fluxgym\models\clip\clip_l.safetensors" --t5xxl "F:\fluxgym\models\clip\t5xxl_fp16.safetensors" --ae "F:\fluxgym\models\vae\ae.sft" --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 16 --optimizer_type adamw8bit --sample_prompts="F:\fluxgym\sample_prompts.txt" --sample_every_n_steps="150" --learning_rate 8e-4 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 15 --save_every_n_epochs 1 --dataset_config "F:\fluxgym\dataset.toml" --output_dir "F:\fluxgym\outputs" --output_name mika1 --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1 --loss_type l2
[2024-09-12 08:43:08] [INFO] The following values were not passed to `accelerate launch` and had defaults used instead:
[2024-09-12 08:43:08] [INFO] --num_processes was set to a value of 1
[2024-09-12 08:43:08] [INFO] --num_machines was set to a value of 1
[2024-09-12 08:43:08] [INFO] --dynamo_backend was set to a value of 'no'
[2024-09-12 08:43:08] [INFO] To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2024-09-12 08:43:12] [INFO] highvram is enabled / highvramが有効です
[2024-09-12 08:43:12] [INFO] 2024-09-12 08:43:12 WARNING cache_latents_to_disk is enabled, so cache_latents is also enabled / cache_latents_to_diskが有効なため、cache_latentsを有効にします train_util.py:3896
[2024-09-12 08:43:12] [INFO] 2024-09-12 08:43:12 INFO t5xxl_max_token_length: 512 flux_train_network.py:155
[2024-09-12 08:43:13] [INFO] F:\fluxgym\env\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
[2024-09-12 08:43:13] [INFO] warnings.warn(
[2024-09-12 08:43:13] [INFO] You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
[2024-09-12 08:43:13] [INFO] 2024-09-12 08:43:13 INFO Loading dataset config from F:\fluxgym\dataset.toml train_network.py:280
[2024-09-12 08:43:13] [INFO] INFO prepare images. train_util.py:1803
[2024-09-12 08:43:13] [INFO] INFO get image size from name of cache files train_util.py:1741
[2024-09-12 08:43:13] [INFO] 0%| | 0/70 [00:00<?, ?it/s]
100%|██████████| 70/70 [00:00<00:00, 5772.62it/s]
[2024-09-12 08:43:13] [INFO] INFO set image size from cache files: 0/70 train_util.py:1748
[2024-09-12 08:43:13] [INFO] INFO found directory F:\fluxgym\datasets\mika1 contains 70 image files train_util.py:1750
[2024-09-12 08:43:13] [INFO] ERROR illegal char in file (not UTF-8) / ファイルにUTF-8以外の文字があります: F:\fluxgym\datasets\mika1\Mika7.txt train_util.py:1698
[2024-09-12 08:43:13] [INFO] Traceback (most recent call last):
[2024-09-12 08:43:13] [INFO] File "F:\fluxgym\sd-scripts\flux_train_network.py", line 519, in
File=[] ????
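The `ERROR illegal char in file (not UTF-8)` line above is the actual failure: one of the caption .txt files is not valid UTF-8. A minimal sketch (my own, not part of fluxgym; the dataset path and fallback encoding are assumptions) that finds such caption files and rewrites them as UTF-8:

```python
# find_bad_captions.py -- minimal sketch, not part of fluxgym
# Flags .txt caption files that are not valid UTF-8 and rewrites them,
# assuming they were originally saved in a legacy Windows code page.
from pathlib import Path

DATASET_DIR = Path(r"F:\fluxgym\datasets\mika1")  # placeholder path
FALLBACK_ENCODING = "cp1252"                      # assumed original encoding

for txt in DATASET_DIR.glob("*.txt"):
    raw = txt.read_bytes()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        print(f"Not UTF-8: {txt.name}")
        # Re-decode with the assumed legacy encoding and rewrite as UTF-8.
        text = raw.decode(FALLBACK_ENCODING, errors="replace")
        txt.write_text(text, encoding="utf-8")
        print(f"Rewrote {txt.name} as UTF-8")
```

Re-saving the flagged files from a text editor with UTF-8 encoding achieves the same thing.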
same
So I figured out where the exit code 1 error comes from. If the dataset is not normalized to a single resolution (512 / 768 / 1024), the script crashes; it appears not to support mixed resolutions.
I normalized all the images and the error disappeared.
It would be good to add support for mixed-resolution datasets to the code.
What do you mean when you say 'normalized' sir?
@cerarslan By normalization I mean that all of the images must be exactly the same resolution:
512x512, 768x768, or 1024x1024.
If an image is not at one of these resolutions, the script crashes and reports exit code 1.
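For example, a minimal sketch of that kind of normalization (my own script, not part of fluxgym; the folder path and target size are placeholders), which center-crops every image to a square and resizes it to one resolution with Pillow:

```python
# normalize_resolution.py -- minimal sketch, not part of fluxgym
# Center-crops each image to a square and resizes it so the whole
# dataset ends up at a single resolution (512, 768, or 1024).
from pathlib import Path
from PIL import Image

DATASET_DIR = Path(r"F:\fluxgym\datasets\mp")  # placeholder path
TARGET = 512                                   # 512, 768, or 1024

for img_path in sorted(DATASET_DIR.iterdir()):
    if img_path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    img = Image.open(img_path).convert("RGB")
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((TARGET, TARGET), Image.LANCZOS)
    img.save(img_path)
    print(f"{img_path.name} -> {TARGET}x{TARGET}")
```

Whatever TARGET you pick should match the resolution selected in the UI (and in dataset.toml).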
I just tried both 512x512 and 1024x1024 and the result did not change, I still get the same error. :/
@cerarslan did you resize your entire image set to 512 or to 1024? You must not mix the two; moreover, you must indicate the resolution you used in the Advanced settings below.
No, I only tried 512x512 or 1024x1024; I did not mix the two resolutions. I tried all kinds of combinations but the result is the same. I think there is a problem in 'dataset.toml'. Is it possible to share the parameters in this file? Mine is like this:
[general]
shuffle_caption = false
caption_extension = '.txt'
keep_tokens = 1

[[datasets]]
resolution = 512
batch_size = 1
keep_tokens = 1

[[datasets.subsets]]
image_dir = 'Z:\pinokio\api\fluxgym.git\datasets\ohwx-man'
class_tokens = 'ohwx man'
num_repeats = 10
I guess not all settings get fully passed through to this file.
@cerarslan
accelerate launch ^ --mixed_precision bf16 ^ --num_cpu_threads_per_process 1 ^ sd-scripts/flux_train_network.py ^ --pretrained_model_name_or_path "F:\fluxgym\models\unet\flux1-dev.sft" ^ --clip_l "F:\fluxgym\models\clip\clip_l.safetensors" ^ --t5xxl "F:\fluxgym\models\clip\t5xxl_fp16.safetensors" ^ --ae "F:\fluxgym\models\vae\ae.sft" ^ --cache_latents_to_disk ^ --save_model_as safetensors ^ --sdpa --persistent_data_loader_workers ^ --max_data_loader_n_workers 2 ^ --seed 42 ^ --gradient_checkpointing ^ --mixed_precision bf16 ^ --save_precision bf16 ^ --network_module networks.lora_flux ^ --network_dim 4 ^ --optimizer_type adamw8bit ^--sample_prompts="F:\fluxgym\sample_prompts.txt" --sample_every_n_steps="250" ^ --learning_rate 8e-4 ^ --cache_text_encoder_outputs ^ --cache_text_encoder_outputs_to_disk ^ --fp8_base ^ --highvram ^ --max_train_epochs 10 ^ --save_every_n_epochs 1 ^ --dataset_config "F:\fluxgym\dataset.toml" ^ --output_dir "F:\fluxgym\outputs" ^ --output_name mp ^ --timestep_sampling shift ^ --discrete_flow_shift 3.1582 ^ --model_prediction_type raw ^ --guidance_scale 1 ^ --loss_type l2 ^
[general]
shuffle_caption = false
caption_extension = '.txt'
keep_tokens = 1

[[datasets]]
resolution = 768
batch_size = 1
keep_tokens = 1

[[datasets.subsets]]
image_dir = 'F:\fluxgym\datasets\mp'
class_tokens = 'Mika'
num_repeats = 4
Yes, this is the problem: when I edit the .toml file this way and try to run it, it reverts back to its old state (the parameters at the top are deleted).
Renaming ae.safetensors to ae.sft within the vae directory resolved the issue and the system began functioning as expected.
Mine is already ae.sft :/
Here is the last part of my log:
[2024-09-12 14:43:34] [INFO] subprocess.CalledProcessError: Command '['Z:\pinokio\api\fluxgym.git\env\Scripts\python.exe', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', 'Z:\pinokio\api\fluxgym.git\models\unet\flux1-dev.sft', '--clip_l', 'Z:\pinokio\api\fluxgym.git\models\clip\clip_l.safetensors', '--t5xxl', 'Z:\pinokio\api\fluxgym.git\models\clip\t5xxl_fp16.safetensors', '--ae', 'Z:\pinokio\api\fluxgym.git\models\vae\ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '4', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--split_mode', '--network_args', 'train_blocks=single', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--sample_prompts=Z:\pinokio\api\fluxgym.git\sample_prompts.txt', '--sample_every_n_steps=250', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '16', '--save_every_n_epochs', '4', '--dataset_config', 'Z:\pinokio\api\fluxgym.git\dataset.toml', '--output_dir', 'Z:\pinokio\api\fluxgym.git\outputs', '--output_name', 'ohwx', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2']' returned non-zero exit status 1.
[2024-09-12 14:43:34] [ERROR] Command exited with code 1
[2024-09-12 14:43:34] [INFO] Runner:
I just used birme.net to resize all my photos to 768. I updated, and tried again, with the same error message:
[2024-09-12 22:28:05] [INFO] Running bash "/media/iwoolf/tenT/fluxgym/train.sh"
[2024-09-12 22:28:11] [INFO] Traceback (most recent call last):
[2024-09-12 22:28:11] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/flux_train_network.py", line 13, in
I noticed the train config said 512, so I tried resizing all the images to 512, with the same error:
[2024-09-12 22:47:40] [INFO] Running bash "/media/iwoolf/tenT/fluxgym/train.sh"
[2024-09-12 22:47:57] [INFO] Traceback (most recent call last):
[2024-09-12 22:47:57] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/flux_train_network.py", line 13, in
I found my solution: my main model was corrupted. I downloaded it again and it is starting to train right now.
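For anyone suspecting the same thing, a minimal check (my own sketch; assumes the safetensors Python package is installed, and the path is a placeholder) that tries to open the base model and list its tensors, which fails quickly on a truncated or corrupted download:

```python
# check_model.py -- minimal sketch, assumes the safetensors package
# Opens the base model and enumerates tensor names; a corrupted or
# truncated download usually raises an error at this point.
from safetensors import safe_open

MODEL_PATH = r"F:\fluxgym\models\unet\flux1-dev.sft"  # placeholder path

try:
    with safe_open(MODEL_PATH, framework="pt", device="cpu") as f:
        keys = list(f.keys())
    print(f"OK: {len(keys)} tensors found")
except Exception as exc:
    print(f"Model file looks corrupted or unreadable: {exc}")
```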
I have the same error. I have a 3060 with 12 GB VRAM and 32 GB RAM.
When will this error code 1 be resolved? It's a real pain. Frankly, I had no problems before; an update comes out, I install it, and I get error code 1 in a loop. It's becoming annoying.
The same issue for me as well. I tried 512 and 768 px; it did not help. 4090 24 GB VRAM / AMD Ryzen / 64 GB RAM.
The solution for me was to add short captions to every image, without special characters. It works now.
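In the same spirit, a minimal sketch (my own, not part of fluxgym; the dataset path is a placeholder) that strips non-ASCII characters from existing captions and rewrites them as plain UTF-8 text:

```python
# sanitize_captions.py -- minimal sketch, not part of fluxgym
# Drops non-ASCII characters from every .txt caption and rewrites the
# file as UTF-8, mirroring the "no special characters" workaround above.
from pathlib import Path

DATASET_DIR = Path("datasets/my-dataset")  # placeholder path

for txt in DATASET_DIR.glob("*.txt"):
    original = txt.read_text(encoding="utf-8", errors="replace")
    cleaned = original.encode("ascii", errors="ignore").decode("ascii").strip()
    if cleaned != original:
        txt.write_text(cleaned, encoding="utf-8")
        print(f"Sanitized {txt.name}")
```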
I updated to the latest version. All images resized to 512x512. I added short captions to every image, no special characters. I get the same error message. I'm using Ubuntu 24.04.1 LTS with 3060 12gb vram.
[2024-10-04 12:49:56] [INFO] Running bash "/media/iwoolf/tenT/fluxgym/outputs/flux-venusflytrap/train.sh"
[2024-10-04 12:50:04] [INFO] highvram is enabled / highvramが有効です
[2024-10-04 12:50:04] [INFO] 2024-10-04 12:50:04 WARNING cache_latents_to_disk is enabled, so cache_latents is also enabled / cache_latents_to_diskが有効なため、cache_latentsを有効にします train_util.py:4022
[2024-10-04 12:50:04] [INFO] 2024-10-04 12:50:04 INFO t5xxl_max_token_length: 512 flux_train_network.py:155
[2024-10-04 12:50:05] [INFO] /media/iwoolf/BigDrive/anaconda3/envs/fluxgym/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1617: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be deprecated in transformers v4.45, and will be then set to False by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
[2024-10-04 12:50:05] [INFO] warnings.warn(
[2024-10-04 12:50:05] [INFO] You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
[2024-10-04 12:50:05] [INFO] 2024-10-04 12:50:05 INFO Loading dataset config from /media/iwoolf/tenT/fluxgym/outputs/flux-venusflytrap/dataset.toml train_network.py:280
[2024-10-04 12:50:05] [INFO] INFO prepare images. train_util.py:1872
[2024-10-04 12:50:05] [INFO] INFO get image size from name of cache files train_util.py:1810
[2024-10-04 12:50:05] [INFO] 0%| | 0/94 [00:00<?, ?it/s]
100%|██████████| 94/94 [00:00<00:00, 2775.71it/s]
[2024-10-04 12:50:05] [INFO] INFO set image size from cache files: 0/94 train_util.py:1817
[2024-10-04 12:50:05] [INFO] INFO found directory /media/iwoolf/tenT/fluxgym/datasets/flux-venusflytrap contains 94 image files train_util.py:1819
[2024-10-04 12:50:05] [INFO] Traceback (most recent call last):
[2024-10-04 12:50:05] [INFO] File "/media/iwoolf/tenT/fluxgym/sd-scripts/flux_train_network.py", line 522, in
[2024-09-09 20:59:06] [INFO] Running F:\fluxgym\fluxgym\train.bat
[2024-09-09 20:59:06] [INFO]
[2024-09-09 20:59:06] [INFO] F:\fluxgym\fluxgym>accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/flux_train_network.py --pretrained_model_name_or_path "F:\fluxgym\fluxgym\models\unet\flux1-dev.sft" --clip_l "F:\fluxgym\fluxgym\models\clip\clip_l.safetensors" --t5xxl "F:\fluxgym\fluxgym\models\clip\t5xxl_fp16.safetensors" --ae "F:\fluxgym\fluxgym\models\vae\ae.sft" --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 4 --optimizer_type adamw8bit --learning_rate 8e-4 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 16 --save_every_n_epochs 4 --dataset_config "F:\fluxgym\fluxgym\dataset.toml" --output_dir "F:\fluxgym\fluxgym\outputs" --output_name dark-fantasy --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1 --loss_type l2
[2024-09-09 20:59:06] [INFO] 'accelerate' is not recognized as an internal or external command, operable program or batch file. (translated from the garbled Chinese console output)
[2024-09-09 20:59:06] [ERROR] Command exited with code 1
[2024-09-09 20:59:06] [INFO] Runner:
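That last log shows cmd failing to find the accelerate executable at all, so the virtual environment was probably not activated or the accelerate package was never installed in it. A minimal check (my own sketch, not part of fluxgym; run it with the env's python) before launching training:

```python
# check_accelerate.py -- minimal sketch, not part of fluxgym
# Verifies that the accelerate CLI and package are visible from the
# current environment; if not, reinstall with `pip install accelerate`
# inside that environment.
import importlib.util
import shutil

exe = shutil.which("accelerate")
print(f"accelerate executable: {exe or 'NOT FOUND on PATH'}")

spec = importlib.util.find_spec("accelerate")
print(f"accelerate package:    {'importable' if spec else 'NOT importable'}")
```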