LarryJane491 / Lora-Training-in-Comfy

This custom node lets you train LoRA directly in ComfyUI!

I follow your guide #12

Open kakachiex2 opened 7 months ago

kakachiex2 commented 7 months ago

Hi Larry, I installed a clean version of ComfyUI following your guide. I already have a little experience installing Python programs in a venv environment, but when I install your extension it uninstalls PyTorch and its dependencies and replaces them with the versions from your requirements. I get exactly the same behavior with the portable version.

kakachiex2 commented 7 months ago

The first node I installed was yours, and I get this. Is that OK, or can it affect other extensions? (Screenshot 2024-02-07 163101)

kakachiex2 commented 7 months ago

I installed everything as you said and get this error. I even created a venvLora folder specially for your node and still get the same error:

```
K:\ComfyUI\ComfyUI\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py
The following values were not passed to accelerate launch and had defaults used instead:
        --num_processes was set to a value of 1
        --num_machines was set to a value of 1
        --mixed_precision was set to a value of 'no'
        --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
Traceback (most recent call last):
  File "K:\ComfyUI\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 11, in <module>
    import toml
ModuleNotFoundError: No module named 'toml'
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "K:\ComfyUI\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 1033, in <module>
    main()
  File "K:\ComfyUI\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 1029, in main
    launch_command(args)
  File "K:\ComfyUI\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "K:\ComfyUI\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['K:\ComfyUI\ComfyUI\venv\Scripts\python.exe', 'custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=K:\Automatic111\stable-diffusion-webui\models/Stable-diffusion\rundiffusionFX25D_v10.safetensors', '--train_data_dir=C:/database', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=Kakachiex_Niji_LoRA.', '--resolution=768,768', '--network_module=networks.lora', '--max_train_epochs=20', '--learning_rate=1e-4', '--unet_lr=1.e-4', '--text_encoder_lr=1.e-4', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=Kakachiex_Niji_LoRA.', '--train_batch_size=1', '--save_every_n_epochs=5', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 1.
Train finished
Prompt executed in 6.39 seconds
```
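The first traceback above is just a missing dependency: `toml` isn't importable from the interpreter that `accelerate` launches. A minimal sketch for checking which dependencies a given interpreter can see (the module names other than `toml` are assumptions based on typical sd-scripts requirements, not taken from this node's requirements.txt):

```python
# Run this with the SAME python.exe that accelerate launches,
# e.g. K:\ComfyUI\ComfyUI\venv\Scripts\python.exe check_deps.py
import importlib.util

# 'toml' is the module the traceback reports missing; the others are
# assumed names of common sd-scripts dependencies.
required = ["toml", "accelerate", "safetensors", "transformers"]
missing = [name for name in required if importlib.util.find_spec(name) is None]
print("missing from this interpreter:", missing or "none")
```

Anything it reports missing can then be installed into that same venv with `python -m pip install <name>`.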

LarryJane491 commented 7 months ago

Hey there ^^. Don't make a separate venv for the node. All ComfyUI extensions must live in the same environment as the main program. Use a single venv, make sure to install all requirements into that same venv, and there's no reason it wouldn't work.
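One quick way to verify the single-venv advice above is to check which interpreter is actually running; a minimal, ComfyUI-agnostic sketch:

```python
import sys
import sysconfig

# For the setup described above, ComfyUI and every custom node should
# resolve to the same venv interpreter and site-packages directory.
print("interpreter:  ", sys.executable)
print("site-packages:", sysconfig.get_paths()["purelib"])
in_venv = sys.prefix != sys.base_prefix  # True when running inside a venv
print("inside a venv:", in_venv)
```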

As for the uninstalling: I don't think it should matter, since it reinstalls them. Did the node fail again after you made that clean install? What error did it give?

kakachiex2 commented 7 months ago

When I install the requirements, all my other extension nodes deactivate and won't load. I'll install a separate ComfyUI to see if it works, but from what I've read you can have multiple venvs and load them at the same time.

kakachiex2 commented 7 months ago

Now it works, yahoo... but I don't know where it saved the LoRA. It's supposed to be in the lora folder, but it isn't there.

kakachiex2 commented 7 months ago

I'm using the advanced training node

LarryJane491 commented 7 months ago

Good to know ^^. LoRAs are saved to models/loras; that's the default for Comfy. After training, you just have to refresh the ComfyUI page if you haven't closed it.
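Note that the launch command in the logs passes `--output_dir=models/loras`, a relative path, so the .safetensors file lands under whatever directory ComfyUI was started from. A small sketch to locate the trained files (assuming you run it from that same working directory):

```python
from pathlib import Path

# Same relative path as --output_dir in the launch command above.
out_dir = Path("models/loras")
out_dir.mkdir(parents=True, exist_ok=True)

# List LoRA files, oldest to newest, to spot the one just trained.
loras = sorted(out_dir.glob("*.safetensors"), key=lambda p: p.stat().st_mtime)
for p in loras:
    print(p)
print("found", len(loras), "LoRA file(s)")
```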

Did you find out why it made other nodes stop working? What errors do they give? There can be inconsistencies between the requirements of different nodes, but it shouldn't affect all of them...

kakachiex2 commented 7 months ago

It looks like it replaces some other nodes' requirements, and they stop working; it happens right after installing your requirements.

kakachiex2 commented 7 months ago

Please check whether something is wrong here, because I don't see the LoRA after refreshing Comfy, and it's not in the folder either:

```
K:\ComfyUI\ComfyUI_Ex\ComfyUI\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py
The following values were not passed to accelerate launch and had defaults used instead:
        --num_processes was set to a value of 1
        --num_machines was set to a value of 1
        --mixed_precision was set to a value of 'no'
        --dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
prepare tokenizer
update token length: 225
Using DreamBooth method.
prepare images.
found directory C:\database\30_Niji contains 30 image files
No caption file found for 30 images. Training will continue without captions for these images. If class token exists, it will be used.
C:\database\30_Niji\Niji_Style_30.jpg
C:\database\30_Niji\Niji_Style_31.png
C:\database\30_Niji\Niji_Style_32.jpg
C:\database\30_Niji\Niji_Style_33.jpg
C:\database\30_Niji\Niji_Style_34.png
C:\database\30_Niji\Niji_Style_35.jpg... and 25 more
found directory C:\database\8_Niji contains 8 image files
964 train images with repeating.
0 reg images.
no regularization images
[Dataset 0]
  batch_size: 1
  resolution: (768, 768)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1584
  bucket_reso_steps: 64
  bucket_no_upscale: False

  [Subset 0 of Dataset 0]
    image_dir: "C:\database\30_Niji"
    image_count: 30
    num_repeats: 30
    shuffle_caption: True
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1, token_warmup_step: 0
    is_reg: False
    class_tokens: Niji
    caption_extension: .txt

  [Subset 1 of Dataset 0]
    image_dir: "C:\database\8_Niji"
    image_count: 8
    num_repeats: 8
    shuffle_caption: True
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1, token_warmup_step: 0
    is_reg: False
    class_tokens: Niji
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|█████████████████████████████████████████████████████████████████████████████████| 38/38 [00:00<00:00, 101.25it/s]
make buckets
number of images (including repeats)
bucket 0: resolution (512, 1088), count: 30
bucket 1: resolution (576, 1024), count: 240
bucket 2: resolution (640, 896), count: 664
bucket 3: resolution (704, 832), count: 30
mean ar error (without repeats): 0.03162946469814704
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: K:\Automatic111\stable-diffusion-webui\models/Stable-diffusion\rundiffusionFX25D_v10.safetensors
UNet2DConditionModel: 64, [5, 10, 20, 20], 1024, False, False
Traceback (most recent call last):
  File "K:\ComfyUI\ComfyUI_Ex\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 1012, in <module>
    trainer.train(args)
  File "K:\ComfyUI\ComfyUI_Ex\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 228, in train
    model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
  File "K:\ComfyUI\ComfyUI_Ex\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 102, in load_target_model
    textencoder, vae, unet, = train_util.load_target_model(args, weight_dtype, accelerator)
  File "K:\ComfyUI\ComfyUI_Ex\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\library\train_util.py", line 3917, in load_target_model
    text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
  File "K:\ComfyUI\ComfyUI_Ex\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\library\train_util.py", line 3860, in _load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(
  File "K:\ComfyUI\ComfyUI_Ex\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\library\model_util.py", line 1007, in load_models_from_stable_diffusion_checkpoint
    info = unet.load_state_dict(converted_unet_checkpoint)
  File "K:\ComfyUI\ComfyUI_Ex\ComfyUI\venv\Lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
        size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([640, 1024]).
        size mismatch for up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([320, 1024]).
        size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
        size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([1280, 1024]).
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "K:\ComfyUI\ComfyUI_Ex\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "K:\ComfyUI\ComfyUI_Ex\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "K:\ComfyUI\ComfyUI_Ex\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "K:\ComfyUI\ComfyUI_Ex\ComfyUI\venv\Lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['K:\ComfyUI\ComfyUI_Ex\ComfyUI\venv\Scripts\python.exe', 'custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=K:\Automatic111\stable-diffusion-webui\models/Stable-diffusion\rundiffusionFX25D_v10.safetensors', '--train_data_dir=C:/database', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=KakachiexNiji', '--resolution=768,768', '--network_module=networks.lora', '--max_train_epochs=16', '--learning_rate=1e-4', '--unet_lr=1.e-4', '--text_encoder_lr=1.e-4', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=KakachiexNiji', '--train_batch_size=1', '--save_every_n_epochs=10', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=28', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--v2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 1.
Train finished
Prompt executed in 20.79 seconds
FETCH DATA from: K:\ComfyUI\ComfyUI_Ex\ComfyUI\custom_nodes\ComfyUI-Manager.cache\1514988643_custom-node-list.json
FETCH DATA from: K:\ComfyUI\ComfyUI_Ex\ComfyUI\custom_nodes\ComfyUI-Manager.cache\1742899825_extension-node-map.json
```

kakachiex2 commented 7 months ago

Here are my settings: (Screenshot 2024-02-10 170453)

LarryJane491 commented 7 months ago

Ah, it's a size mismatch. That usually happens when there is a model-type inconsistency (for example, trying to train an SDXL LoRA on an SD1.5 base model).

I see you have set v2 to Yes. That makes it an SD 2.0 LoRA. But rundiffusionFX25D_v10 is an SD1.5 model. Maybe that's the issue? Try with v2=No.
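The mismatched shapes in the log actually encode the base-model family: the last dimension of the cross-attention `attn2.to_k`/`to_v` weights is the text-encoder context width, which is 768 for SD1.x, 1024 for SD2.x, and 2048 for SDXL. A hedged sketch (the function name and the key-to-shape input format are my own, not part of the node) for reading that width off a checkpoint's shapes:

```python
def guess_sd_family(shapes):
    """shapes: mapping of state-dict key -> shape tuple (assumed input format)."""
    for key, shape in shapes.items():
        if key.endswith("attn2.to_k.weight"):
            # Last dim = text-encoder context width.
            return {768: "SD1.x", 1024: "SD2.x", 2048: "SDXL"}.get(shape[-1], "unknown")
    return "no cross-attention keys found"

# The checkpoint in the log reports shape [320, 768] for this key,
# i.e. SD1.x, so v2 should indeed be No for rundiffusionFX25D_v10.
key = "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight"
print(guess_sd_family({key: (320, 768)}))  # SD1.x
```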

GustavBlack commented 4 months ago

@LarryJane491, is this error related to the same thing? I tried several different setups and still wasn't able to use this node:

```
Traceback (most recent call last):
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\Gustav\\AppData\\Local\\Programs\\Python\\Python310\\python.exe', 'custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=J:\\Stable\\ComfyUI_windows_portable\\ComfyUI\\models\\checkpoints\\SDXL\\virileXL_alpha10.safetensors', '--train_data_dir=C:/Users/Gustav/Pictures/results/BO/rafael/database', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=Rafy_face_loRa', '--resolution=768,768', '--network_module=networks.lora', '--max_train_epochs=40', '--learning_rate=1e-4', '--unet_lr=1.e-4', '--text_encoder_lr=1.e-4', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=Rafy_face_loRa', '--train_batch_size=1', '--save_every_n_epochs=10', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=16', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--v2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--v2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', 
'--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--v2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--v2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 2.
Train finished
Prompt executed in 4.52 seconds
```
I think I'm missing something very specific for this to work.