Is the 4070TI 12G not available? I chose the 12G option.

wuliang19869312 commented 2 months ago

[2024-09-16 12:53:56] [INFO] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 80.00 MiB. GPU 0 has a total capacity of 11.99 GiB of which 6.73 GiB is free. Of the allocated memory 4.00 GiB is allocated by PyTorch, and 14.50 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[2024-09-16 12:53:57] [INFO] Traceback (most recent call last):
[2024-09-16 12:53:57] [INFO] File "", line 198, in _run_module_as_main
[2024-09-16 12:53:57] [INFO] File "", line 88, in _run_code
[2024-09-16 12:53:57] [INFO] File "D:\fluxgym\env\Scripts\accelerate.exe__main__.py", line 7, in
[2024-09-16 12:53:57] [INFO] File "D:\fluxgym\env\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
[2024-09-16 12:53:57] [INFO] args.func(args)
[2024-09-16 12:53:57] [INFO] File "D:\fluxgym\env\Lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
[2024-09-16 12:53:57] [INFO] simple_launcher(args)
[2024-09-16 12:53:57] [INFO] File "D:\fluxgym\env\Lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
[2024-09-16 12:53:57] [INFO] raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
[2024-09-16 12:53:57] [INFO] subprocess.CalledProcessError: Command '['D:\fluxgym\env\Scripts\python.exe', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', 'D:\fluxgym\models\unet\flux1-dev.sft', '--clip_l', 'D:\fluxgym\models\clip\clip_l.safetensors', '--t5xxl', 'D:\fluxgym\models\clip\t5xxl_fp16.safetensors', '--ae', 'D:\fluxgym\models\vae\ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '4', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--split_mode', '--network_args', 'train_blocks=single', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '8', '--save_every_n_epochs', '4', '--dataset_config', 'D:\fluxgym\dataset.toml', '--output_dir', 'D:\fluxgym\outputs', '--output_name', 'ds-lora', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2']' returned non-zero exit status 1.
[2024-09-16 12:53:58] [ERROR] Command exited with code 1
[2024-09-16 12:53:58] [INFO] Runner:

byteconcepts commented 2 months ago

Strange, because I always see the --highvram parameter in your script call. Is that correct if you chose the 12G option? Maybe it is, but I doubt it.

wuliang19869312 commented 2 months ago

Strange, because I always see the --highvram parameter in your script call. Is that correct if you chose the 12G option? Maybe it is, but I doubt it.

I did select the 12G option. I changed the parameters a little myself to make it work, but just now I found that the whole system no longer runs:
Traceback (most recent call last):
File "D:\fluxgym\app.py", line 17, in
import train_network
ModuleNotFoundError: No module named 'train_network'

chnisar515 commented 2 months ago

Strange, because I always see the --highvram parameter in your script call. Is that correct if you chose the 12G option? Maybe it is, but I doubt it.

I did select the 12G option. I changed the parameters a little myself to make it work, but just now I found that the whole system no longer runs:
Traceback (most recent call last):
File "D:\fluxgym\app.py", line 17, in
import train_network
ModuleNotFoundError: No module named 'train_network'

same error here.
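
A quick way to narrow down the ModuleNotFoundError is to check whether the file behind the import is present at all. The later log in this thread shows train_network.py living under D:\fluxgym\sd-scripts, so the sketch below only looks there. The helper name `locate_module_file` is made up for illustration, and how app.py actually puts that folder on the import path is not shown in this thread, so treat this as a diagnostic, not a fix.

```python
from pathlib import Path


def locate_module_file(app_dir, module_name="train_network", subdir="sd-scripts"):
    """Return the path to <subdir>/<module_name>.py under app_dir, or None if it is missing.

    Hypothetical helper: the folder name comes from the tracebacks in this thread,
    everything else is an assumption.
    """
    candidate = Path(app_dir) / subdir / f"{module_name}.py"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = locate_module_file(r"D:\fluxgym")
    if found is None:
        print("train_network.py not found under sd-scripts; the folder may be empty or incomplete")
    else:
        print(f"found {found}; the file exists, so the import path is the next thing to check")
```

If the file is missing, the sd-scripts folder was probably not fetched completely; if it is there, the problem is more likely in how the folder is added to the import path.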

wuliang19869312 commented 2 months ago

Strange, because I always see the --highvram parameter in your script call. Is that correct if you chose the 12G option? Maybe it is, but I doubt it.

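I did select the 12G option. I changed the parameters a little myself and got it working, but just now I found that the whole system no longer runs:
Traceback (most recent call last):
File "D:\fluxgym\app.py", line 17, in
import train_network
ModuleNotFoundError: No module named 'train_network'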

Same error here.

Did you get it fixed on your end? Yesterday it was fine, but today the problem is happening again:

[2024-09-19 00:07:03] [INFO] D:\fluxgym\datasets\fuji
[2024-09-19 00:07:03] [INFO] contains 15 image files
[2024-09-19 00:07:03] [INFO] Traceback (most recent call last):
[2024-09-19 00:07:03] [INFO] File "D:\fluxgym\sd-scripts\flux_train_network.py", line 519, in
[2024-09-19 00:07:03] [INFO] trainer.train(args)
[2024-09-19 00:07:03] [INFO] File "D:\fluxgym\sd-scripts\train_network.py", line 317, in train
[2024-09-19 00:07:03] [INFO] train_dataset_group = config_util.generate_dataset_group_by_blueprint(blueprint.dataset_group)
[2024-09-19 00:07:03] [INFO] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-09-19 00:07:03] [INFO] File "D:\fluxgym\sd-scripts\library\config_util.py", line 485, in generate_dataset_group_by_blueprint
[2024-09-19 00:07:03] [INFO] dataset = dataset_klass(subsets=subsets, **asdict(dataset_blueprint.params))
[2024-09-19 00:07:03] [INFO] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-09-19 00:07:03] [INFO] File "D:\fluxgym\sd-scripts\library\train_util.py", line 1825, in init
[2024-09-19 00:07:03] [INFO] img_paths, captions, sizes = load_dreambooth_dir(subset)
[2024-09-19 00:07:03] [INFO] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-09-19 00:07:03] [INFO] File "D:\fluxgym\sd-scripts\library\train_util.py", line 1765, in load_dreambooth_dir
[2024-09-19 00:07:03] [INFO] cap_for_img = read_caption(img_path, subset.caption_extension, subset.enable_wildcard)
[2024-09-19 00:07:03] [INFO] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-09-19 00:07:03] [INFO] File "D:\fluxgym\sd-scripts\library\train_util.py", line 1705, in read_caption
[2024-09-19 00:07:03] [INFO] assert len(lines) > 0, f"caption file is empty / キャプションファイルが空です: {cap_path}"
[2024-09-19 00:07:03] [INFO] ^^^^^^^^^^^^^^
[2024-09-19 00:07:03] [INFO] AssertionError: caption file is empty / キャプションファイルが空です: D:\fluxgym\datasets\fuji\18.txt
[2024-09-19 00:07:03] [INFO] Traceback (most recent call last):
[2024-09-19 00:07:03] [INFO] File "", line 198, in _run_module_as_main
[2024-09-19 00:07:03] [INFO] File "", line 88, in _run_code
[2024-09-19 00:07:03] [INFO] File "D:\fluxgym\env\Scripts\accelerate.exe__main__.py", line 7, in
[2024-09-19 00:07:03] [INFO] File "D:\fluxgym\env\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
[2024-09-19 00:07:03] [INFO] args.func(args)
[2024-09-19 00:07:03] [INFO] File "D:\fluxgym\env\Lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
[2024-09-19 00:07:03] [INFO] simple_launcher(args)
[2024-09-19 00:07:03] [INFO] File "D:\fluxgym\env\Lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
[2024-09-19 00:07:03] [INFO] raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
[2024-09-19 00:07:03] [INFO] subprocess.CalledProcessError: Command '['D:\fluxgym\env\Scripts\python.exe', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', 'D:\fluxgym\models\unet\flux1-dev.sft', '--clip_l', 'D:\fluxgym\models\clip\clip_l.safetensors', '--t5xxl', 'D:\fluxgym\models\clip\t5xxl_fp16.safetensors', '--ae', 'D:\fluxgym\models\vae\ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '4', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--split_mode', '--network_args', 'train_blocks=single', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '4', '--save_every_n_epochs', '4', '--dataset_config', 'D:\fluxgym\outputs\fuji\dataset.toml', '--output_dir', 'D:\fluxgym\outputs\fuji', '--output_name', 'fuji', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2']' returned non-zero exit status 1.
[2024-09-19 00:07:03] [ERROR] Command exited with code 1
[2024-09-19 00:07:03] [INFO] Runner:
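
Note that the failure in this second log is the AssertionError about an empty caption file (D:\fluxgym\datasets\fuji\18.txt), not the out-of-memory error. A small script along these lines can list such files before training starts. The helper name `find_empty_captions` is made up for illustration, and it treats whitespace-only files as empty, which may be stricter than the check in sd-scripts.

```python
from pathlib import Path


def find_empty_captions(dataset_dir, caption_extension=".txt"):
    """Return a sorted list of caption files in dataset_dir that have no visible text.

    Assumption: captions sit next to the images with the given extension,
    as the path in the error message (fuji\\18.txt) suggests.
    """
    empty = []
    for path in sorted(Path(dataset_dir).glob(f"*{caption_extension}")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if not text.strip():  # zero bytes or whitespace only
            empty.append(path)
    return empty


if __name__ == "__main__":
    for caption in find_empty_captions(r"D:\fluxgym\datasets\fuji"):
        print(f"empty caption file: {caption}")
```

Any file it reports needs either a caption written into it or removal together with its image, after which the run should get past this assertion.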

KnightOfMensab commented 1 month ago

@byteconcepts fluxgym passes the --highvram parameter no matter which VRAM size is selected. That doesn't make much sense, given that the flag explicitly disables the optimization for low-VRAM environments, according to https://github.com/kohya-ss/sd-scripts/releases :

An option --highvram to disable the optimization for environments with little VRAM is added to the training scripts. If you specify it when there is enough VRAM, the operation will be faster.
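
For anyone who wants to try a run without the flag, here is a minimal sketch of dropping it from an argument list like the one in the logs above. The helper `drop_flag` is hypothetical and is not part of fluxgym or sd-scripts; whether removing --highvram actually avoids the out-of-memory error on a 12G card has not been confirmed in this thread.

```python
def drop_flag(args, flag="--highvram"):
    """Return a copy of args without the given valueless flag (hypothetical helper)."""
    return [arg for arg in args if arg != flag]


if __name__ == "__main__":
    # Shortened version of the command from the log; the remaining arguments are omitted here.
    cmd = [
        "sd-scripts/flux_train_network.py",
        "--fp8_base",
        "--highvram",
        "--max_train_epochs", "8",
    ]
    print(drop_flag(cmd))
```

This only works for flags that take no value, which is how --highvram appears in the logged command.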

cocktailpeanut / fluxgym

Is the 4070TI 12G not available? I chose the 12G option. #90