comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

OOM Allocation on device error on 4090 #4853

Closed: johntheguyperson closed this issue 1 week ago

johntheguyperson commented 1 week ago

Your question

I'm getting an OOM error on a 4090, which I find odd since a lot of people are able to run this on lower-VRAM GPUs. I was just trying to do a test run with about 15 images. It runs for 5-10 steps, then it crashes. I have 32 GB of system RAM and 24 GB of VRAM.
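For debugging, a quick way to tell whether usage climbs step by step (accumulation) or jumps at one point (an oversized batch or resolution bucket) is to log allocator statistics around the training step. This is a minimal sketch assuming a plain PyTorch + CUDA setup; the helper name and where it gets called are illustrative, not part of ComfyUI or FluxTrainer:

```python
import torch

def log_vram(tag: str) -> None:
    """Print current and peak VRAM so the step where usage spikes is visible."""
    gib = 1024 ** 3
    allocated = torch.cuda.memory_allocated() / gib      # tensors currently live
    reserved = torch.cuda.memory_reserved() / gib        # held by the caching allocator
    peak = torch.cuda.max_memory_allocated() / gib       # high-water mark since start/reset
    print(f"[{tag}] allocated={allocated:.2f} GiB  reserved={reserved:.2f} GiB  peak={peak:.2f} GiB")

# Illustrative usage inside a training loop:
#   log_vram(f"step {step} pre-backward")
#   loss.backward()
#   log_vram(f"step {step} post-backward")
```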

Logs

# ComfyUI Error Report
## Error Details
- **Node Type:** FluxTrainLoop
- **Exception Type:** torch.cuda.OutOfMemoryError
- **Exception Message:** Allocation on device 
## Stack Trace

  File "C:\StableDiffusion\Flux_Comfy\ComfyUI\execution.py", line 317, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)

  File "C:\StableDiffusion\Flux_Comfy\ComfyUI\execution.py", line 192, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)

  File "C:\StableDiffusion\Flux_Comfy\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)

  File "C:\StableDiffusion\Flux_Comfy\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))

  File "C:\StableDiffusion\Flux_Comfy\ComfyUI\custom_nodes\ComfyUI-FluxTrainer\nodes.py", line 746, in train
    steps_done = training_loop(

  File "C:\StableDiffusion\Flux_Comfy\ComfyUI\custom_nodes\ComfyUI-FluxTrainer\train_network.py", line 1198, in training_loop
    accelerator.backward(loss)

  File "C:\StableDiffusion\Flux_Comfy\venv\lib\site-packages\accelerate\accelerator.py", line 2159, in backward
    loss.backward(**kwargs)

  File "C:\StableDiffusion\Flux_Comfy\venv\lib\site-packages\torch\_tensor.py", line 525, in backward
    torch.autograd.backward(

  File "C:\StableDiffusion\Flux_Comfy\venv\lib\site-packages\torch\autograd\__init__.py", line 267, in backward
    _engine_run_backward(

  File "C:\StableDiffusion\Flux_Comfy\venv\lib\site-packages\torch\autograd\graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
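The trace shows the allocation failing inside accelerator.backward(loss), i.e. while building gradients for the training step, not while loading the model. A hedged sketch of one way to catch that and dump allocator state for a report like this (the wrapper is illustrative, not how FluxTrainer is actually written):

```python
import torch

def backward_with_oom_report(accelerator, loss):
    """Run the backward pass and dump allocator statistics if it OOMs."""
    try:
        accelerator.backward(loss)  # the call that fails in train_network.py
    except torch.cuda.OutOfMemoryError:
        # memory_summary() breaks usage down by size class, which helps tell
        # allocator fragmentation apart from genuinely exceeding the 24 GB card.
        print(torch.cuda.memory_summary())
        raise
```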

System Information

Logs

2024-09-09 00:15:57,233 - root - INFO - Total VRAM 24564 MB, total RAM 32607 MB
2024-09-09 00:15:57,233 - root - INFO - pytorch version: 2.3.0+cu121
2024-09-09 00:15:57,233 - root - INFO - Set vram state to: NORMAL_VRAM
2024-09-09 00:15:57,233 - root - INFO - Device: cuda:0 NVIDIA GeForce RTX 4090 : cudaMallocAsync
2024-09-09 00:15:58,286 - root - INFO - Using pytorch cross attention
2024-09-09 00:15:58,962 - root - INFO - [Prompt Server] web root: C:\StableDiffusion\Flux_Comfy\ComfyUI\web
2024-09-09 00:15:59,906 - root - INFO - Total VRAM 24564 MB, total RAM 32607 MB
2024-09-09 00:15:59,906 - root - INFO - pytorch version: 2.3.0+cu121
2024-09-09 00:15:59,907 - root - INFO - Set vram state to: NORMAL_VRAM
2024-09-09 00:15:59,907 - root - INFO - Device: cuda:0 NVIDIA GeForce RTX 4090 : cudaMallocAsync
2024-09-09 00:16:00,202 - root - INFO - 
Import times for custom nodes:
2024-09-09 00:16:00,202 - root - INFO -    0.0 seconds: C:\StableDiffusion\Flux_Comfy\ComfyUI\custom_nodes\websocket_image_save.py
2024-09-09 00:16:00,202 - root - INFO -    0.0 seconds: C:\StableDiffusion\Flux_Comfy\ComfyUI\custom_nodes\ComfyUI_JPS-Nodes
2024-09-09 00:16:00,202 - root - INFO -    0.0 seconds: C:\StableDiffusion\Flux_Comfy\ComfyUI\custom_nodes\comfy-image-saver
2024-09-09 00:16:00,202 - root - INFO -    0.0 seconds: C:\StableDiffusion\Flux_Comfy\ComfyUI\custom_nodes\rgthree-comfy
2024-09-09 00:16:00,202 - root - INFO -    0.0 seconds: C:\StableDiffusion\Flux_Comfy\ComfyUI\custom_nodes\ComfyUI-KJNodes
2024-09-09 00:16:00,202 - root - INFO -    0.1 seconds: C:\StableDiffusion\Flux_Comfy\ComfyUI\custom_nodes\comfyui_controlnet_aux
2024-09-09 00:16:00,202 - root - INFO -    0.2 seconds: C:\StableDiffusion\Flux_Comfy\ComfyUI\custom_nodes\ComfyUI-Manager
2024-09-09 00:16:00,202 - root - INFO -    0.5 seconds: C:\StableDiffusion\Flux_Comfy\ComfyUI\custom_nodes\ComfyUI-FluxTrainer
2024-09-09 00:16:00,202 - root - INFO - 
2024-09-09 00:16:00,208 - root - INFO - Starting server

2024-09-09 00:16:00,209 - root - INFO - To see the GUI go to: http://0.0.0.0:8188
2024-09-09 00:16:07,485 - root - INFO - got prompt

Attached Workflow

Please make sure that workflow does not contain any sensitive information such as API keys or passwords.

Workflow too large. Please manually upload the workflow from local file system.

Additional Context

(Please add any additional context or steps to reproduce the error here)



### Other

Args:

Namespace(console_log_level=None, console_log_file=None, console_log_simple=False, v2=False, v_parameterization=False, pretrained_model_name_or_path='C:\\StableDiffusion\\Flux_Comfy\\ComfyUI\\models\\unet\\flux1-dev.safetensors', tokenizer_cache_dir=None, train_data_dir=None, cache_info=False, shuffle_caption=False, caption_separator=',', caption_extension='.caption', caption_extention=None, keep_tokens=0, keep_tokens_separator='', secondary_separator=None, enable_wildcard=False, caption_prefix=None, caption_suffix=None, color_aug=False, flip_aug=False, face_crop_aug_range=None, random_crop=False, debug_dataset=False, resolution=None, cache_latents=True, vae_batch_size=1, cache_latents_to_disk=True, enable_bucket=False, min_bucket_reso=256, max_bucket_reso=1024, bucket_reso_steps=64, bucket_no_upscale=False, token_warmup_min=1, token_warmup_step=0.0, alpha_mask=False, dataset_class=None, caption_dropout_rate=0.0, caption_dropout_every_n_epochs=0, caption_tag_dropout_rate=0.0, reg_data_dir=None, in_json=None, dataset_repeats=1, output_dir='flux_lora_output_path', output_name='flux_lora_file_name_rank16_bf16', huggingface_repo_id=None, huggingface_repo_type=None, huggingface_path_in_repo=None, huggingface_token=None, huggingface_repo_visibility=None, save_state_to_huggingface=False, resume_from_huggingface=False, async_upload=False, save_precision='bf16', save_every_n_epochs=None, save_every_n_steps=None, save_n_epoch_ratio=None, save_last_n_epochs=None, save_last_n_epochs_state=None, save_last_n_steps=None, save_last_n_steps_state=None, save_state=False, save_state_on_train_end=False, resume=None, train_batch_size=1, max_token_length=None, mem_eff_attn=True, torch_compile=False, dynamo_backend='inductor', xformers=False, sdpa=False, vae=None, max_train_steps=500, max_train_epochs=None, max_data_loader_n_workers=0, persistent_data_loader_workers=False, seed=42, gradient_checkpointing=True, gradient_accumulation_steps=1, mixed_precision='bf16', full_fp16=False, full_bf16=True, fp8_base=True, fp8_dtype='e4m3', ddp_timeout=None, ddp_gradient_as_bucket_view=False, ddp_static_graph=False, clip_skip=None, logging_dir=None, log_with=None, log_prefix=None, log_tracker_name=None, wandb_run_name=None, log_tracker_config=None, wandb_api_key=None, log_config=False, noise_offset=None, noise_offset_random_strength=False, multires_noise_iterations=None, ip_noise_gamma=None, ip_noise_gamma_random_strength=False, multires_noise_discount=0.3, adaptive_noise_scale=None, zero_terminal_snr=False, min_timestep=None, max_timestep=None, loss_type='l2', huber_schedule='snr', huber_c=0.1, lowram=False, highvram=False, sample_every_n_steps=None, sample_at_first=False, sample_every_n_epochs=None, sample_prompts=['cute girl blonde messy long hair blue eyes sprawled in bed draped with sheets. college girl exposed. naked and tan. 
barely covered by sheets.'], sample_sampler='ddim', config_file=None, output_config=False, metadata_title=None, metadata_author=None, metadata_description=None, metadata_license=None, metadata_tags=None, prior_loss_weight=1.0, conditioning_data_dir=None, masked_loss=False, deepspeed=False, zero_stage=2, offload_optimizer_device=None, offload_optimizer_nvme_path=None, offload_param_device=None, offload_param_nvme_path=None, zero3_init_flag=False, zero3_save_16bit_model=False, fp16_master_weights_and_gradients=False, optimizer_type='CAME', use_8bit_adam=False, use_lion_optimizer=False, learning_rate=0.0004, max_grad_norm=1.0, optimizer_args=[], lr_scheduler_type='', lr_scheduler_args=None, lr_scheduler='constant', lr_warmup_steps=0, lr_scheduler_num_cycles=1, lr_scheduler_power=1.0, fused_backward_pass=False, dataset_config='[[datasets]]\nresolution = [ 515, 512,]\nbatch_size = 1\nenable_bucket = true\nbucket_no_upscale = false\nmin_bucket_reso = 256\nmax_bucket_reso = 1024\n[[datasets.subsets]]\nimage_dir = "datasets\\\\ImagenSmallBed"\nclass_tokens = ""\nnum_repeats = 1\n\n\n[[datasets]]\nresolution = [ 768, 768,]\nbatch_size = 1\nenable_bucket = true\nbucket_no_upscale = false\nmin_bucket_reso = 256\nmax_bucket_reso = 1024\n[[datasets.subsets]]\nimage_dir = "datasets\\\\ImagenSmallBed"\nclass_tokens = ""\nnum_repeats = 1\n\n\n[[datasets]]\nresolution = [ 1024, 1024,]\nbatch_size = 1\nenable_bucket = true\nbucket_no_upscale = false\nmin_bucket_reso = 256\nmax_bucket_reso = 1024\n[[datasets.subsets]]\nimage_dir = "datasets\\\\ImagenSmallBed"\nclass_tokens = ""\nnum_repeats = 1\n\n\n[general]\nshuffle_caption = false\ncaption_extension = ".txt"\nkeep_tokens_separator = "|||"\ncaption_dropout_rate = 0.0\ncolor_aug = false\nflip_aug = false\n', min_snr_gamma=5.0, scale_v_pred_loss_like_noise_pred=False, v_pred_like_loss=None, debiased_estimation_loss=False, weighted_captions=False, no_metadata=False, save_model_as='safetensors', unet_lr=None, text_encoder_lr=0.0, fp8_base_unet=True, network_weights=None, network_module='.networks.lora_flux', network_dim=16, network_alpha=0.5, network_dropout=None, network_args=[], network_train_unet_only=True, network_train_text_encoder_only=False, training_comment=None, dim_from_weights=False, scale_weight_norms=None, base_weights=None, base_weights_multiplier=None, no_half_vae=False, skip_until_initial_step=False, initial_epoch=None, initial_step=None, cpu_offload_checkpointing=False, num_cpu_threads_per_process=1, clip_l='C:\\StableDiffusion\\Flux_Comfy\\ComfyUI\\models\\clip\\clip_l.safetensors', t5xxl='C:\\StableDiffusion\\Flux_Comfy\\ComfyUI\\models\\clip\\t5xxl_fp16.safetensors', ae='C:\\StableDiffusion\\Flux_Comfy\\ComfyUI\\models\\vae\\ae.safetensors', t5xxl_max_token_length=512, disable_mmap_load_safetensors=False, split_mode=False, spda=True, apply_t5_attn_mask=True, cache_text_encoder_outputs=True, weighting_scheme='logit_normal', logit_mean=0.0, logit_std=1.0, mode_scale=1.29, timestep_sampling='shift', sigmoid_scale=1.0, model_prediction_type='raw', guidance_scale=1.0, discrete_flow_shift=3.1582000000000003, cache_text_encoder_outputs_to_disk=True)
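One observation from the args above: dataset_config trains the same image folder at three resolutions (the first bucket reads [515, 512], which looks like a typo for 512, plus 768 and 1024), and the 1024×1024 pass is the most plausible point where backward activations stop fitting next to the fp8 base model on 24 GB. As a hedged test, a single-resolution variant of that config could confirm it. The sketch below just rewrites the TOML string from the args with only the 512 bucket and sanity-checks it with tomllib (Python 3.11+); how the string is fed back into the FluxTrainer dataset node is not shown here:

```python
import tomllib  # Python 3.11+; used only to sanity-check the trimmed config

# Illustrative: the dataset_config value from the args, trimmed to the 512 bucket
# so the 768/1024 buckets are out of the picture while reproducing the OOM.
single_res_config = """\
[general]
shuffle_caption = false
caption_extension = ".txt"
keep_tokens_separator = "|||"
caption_dropout_rate = 0.0
color_aug = false
flip_aug = false

[[datasets]]
resolution = [ 512, 512,]
batch_size = 1
enable_bucket = true
bucket_no_upscale = false
min_bucket_reso = 256
max_bucket_reso = 1024

[[datasets.subsets]]
image_dir = "datasets\\\\ImagenSmallBed"
class_tokens = ""
num_repeats = 1
"""

# Confirm the trimmed config parses and that only one resolution remains.
cfg = tomllib.loads(single_res_config)
assert [d["resolution"] for d in cfg["datasets"]] == [[512, 512]]
```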
ltdrdata commented 1 week ago

This issue should be moved to ComfyUI-FluxTrainer. https://github.com/kijai/ComfyUI-FluxTrainer