[Bug]: Video card has meltdown during upscales

yogiyushi commented 10 months ago

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits

What happened?

Any time I try to upscale, the video card crashes.

Steps to reproduce the problem

Upscale in batch mode using Universal Upscaler Neutral or UltraSharp (probably any upscaler). The crash happens randomly after one or more images.

What should have happened?

It should not crash or melt down.

Sysinfo

{ "Platform": "Windows-10-10.0.19045-SP0", "Python": "3.10.7", "Version": "v1.6.0", "Commit": "5ef669de080814067961f28357256e8fe27544f4", "Script path": "F:\Automatic 1111\stable-diffusion-webui", "Data path": "F:\Automatic 1111\stable-diffusion-webui", "Extensions dir": "F:\Automatic 1111\stable-diffusion-webui\extensions", "Checksum": "87ec57d733ac27fc6ddd4a23446fabe3810ed9019302e60054678dd94f76c62b", "Commandline": [ "launch.py", "--xformers", "--medvram", "--no-half-vae" ], "Torch env info": "'NoneType' object has no attribute 'splitlines'", "Exceptions": [], "CPU": { "model": "Intel64 Family 6 Model 158 Stepping 9, GenuineIntel", "count logical": 8, "count physical": 4 }, "RAM": { "total": "32GB", "used": "11GB", "free": "21GB" }, "Extensions": [], "Inactive extensions": [], "Environment": { "COMMANDLINE_ARGS": " --xformers --medvram --no-half-vae ", "GRADIO_ANALYTICS_ENABLED": "False" }, "Config": { "samples_save": true, "samples_format": "png", "samples_filename_pattern": "", "save_images_add_number": false, "grid_save": false, "grid_format": "png", "grid_extended_filename": false, "grid_only_if_multiple": false, "grid_prevent_empty_spots": false, "grid_zip_filename_pattern": "", "n_rows": -1, "font": "", "grid_text_active_color": "#000000", "grid_text_inactive_color": "#999999", "grid_background_color": "#ffffff", "enable_pnginfo": true, "save_txt": false, "save_images_before_face_restoration": false, "save_images_before_highres_fix": false, "save_images_before_color_correction": false, "save_mask": false, "save_mask_composite": false, "jpeg_quality": 80, "webp_lossless": false, "export_for_4chan": false, "img_downscale_threshold": 4.0, "target_side_length": 4000, "img_max_size_mp": 200, "use_original_name_batch": true, "use_upscaler_name_as_suffix": false, "save_selected_only": true, "save_init_img": false, "temp_dir": "", "clean_temp_dir_at_start": false, "save_incomplete_images": false, "outdir_samples": "", "outdir_txt2img_samples": 
"outputs/txt2img-images", "outdir_img2img_samples": "outputs/img2img-images", "outdir_extras_samples": "outputs/extras-images", "outdir_grids": "", "outdir_txt2img_grids": "outputs/txt2img-grids", "outdir_img2img_grids": "outputs/img2img-grids", "outdir_save": "log/images", "outdir_init_images": "outputs/init-images", "save_to_dirs": true, "grid_save_to_dirs": true, "use_save_to_dirs_for_ui": false, "directories_filename_pattern": "[date]", "directories_max_prompt_words": 8, "ESRGAN_tile": 192, "ESRGAN_tile_overlap": 8, "realesrgan_enabled_models": [ "R-ESRGAN 4x+", "R-ESRGAN 4x+ Anime6B" ], "upscaler_for_img2img": "4x_UniversalUpscalerV2-Neutral_115000_swaG", "face_restoration": false, "face_restoration_model": "CodeFormer", "code_former_weight": 0.5, "face_restoration_unload": false, "auto_launch_browser": "Local", "show_warnings": false, "show_gradio_deprecation_warnings": true, "memmon_poll_rate": 8, "samples_log_stdout": false, "multiple_tqdm": true, "print_hypernet_extra": false, "list_hidden_files": true, "disable_mmap_load_safetensors": false, "hide_ldm_prints": true, "api_enable_requests": true, "api_forbid_local_requests": true, "api_useragent": "", "unload_models_when_training": false, "pin_memory": false, "save_optimizer_state": false, "save_training_settings_to_txt": true, "dataset_filename_word_regex": "", "dataset_filename_join_string": " ", "training_image_repeats_per_epoch": 1, "training_write_csv_every": 500, "training_xattention_optimizations": false, "training_enable_tensorboard": false, "training_tensorboard_save_images": false, "training_tensorboard_flush_every": 120, "sd_model_checkpoint": "sd_xl_base_1.0.safetensors [31e35c80fc]", "sd_checkpoints_limit": 1, "sd_checkpoints_keep_in_cpu": true, "sd_checkpoint_cache": 0, "sd_unet": "Automatic", "enable_quantization": true, "enable_emphasis": true, "enable_batch_seeds": true, "comma_padding_backtrack": 20, "CLIP_stop_at_last_layers": 1, "upcast_attn": false, "randn_source": "GPU", "tiling": 
false, "hires_fix_refiner_pass": "second pass", "sdxl_crop_top": 0, "sdxl_crop_left": 0, "sdxl_refiner_low_aesthetic_score": 2.5, "sdxl_refiner_high_aesthetic_score": 6.0, "sd_vae_explanation": "VAE is a neural network that transforms a standard RGB\nimage into latent space representation and back. Latent space representation is what stable diffusion is working on during sampling\n(i.e. when the progress bar is between empty and full). For txt2img, VAE is used to create a resulting image after the sampling is finished.\nFor img2img, VAE is used to process user's input image before the sampling, and to create an image after sampling.", "sd_vae_checkpoint_cache": 0, "sd_vae": "sdxl_vae.safetensors", "sd_vae_overrides_per_model_preferences": true, "auto_vae_precision": true, "sd_vae_encode_method": "Full", "sd_vae_decode_method": "Full", "inpainting_mask_weight": 1.0, "initial_noise_multiplier": 1.0, "img2img_extra_noise": 0.0, "img2img_color_correction": false, "img2img_fix_steps": true, "img2img_background_color": "#ffffff", "img2img_editor_height": 720, "img2img_sketch_default_brush_color": "#ffffff", "img2img_inpaint_mask_brush_color": "#ffffff", "img2img_inpaint_sketch_default_brush_color": "#ffffff", "return_mask": false, "return_mask_composite": false, "cross_attention_optimization": "Automatic", "s_min_uncond": 0.0, "token_merging_ratio": 0.0, "token_merging_ratio_img2img": 0.0, "token_merging_ratio_hr": 0.0, "pad_cond_uncond": false, "persistent_cond_cache": true, "batch_cond_uncond": true, "use_old_emphasis_implementation": false, "use_old_karras_scheduler_sigmas": false, "no_dpmpp_sde_batch_determinism": false, "use_old_hires_fix_width_height": false, "dont_fix_second_order_samplers_schedule": false, "hires_fix_use_firstpass_conds": false, "use_old_scheduling": false, "interrogate_keep_models_in_memory": false, "interrogate_return_ranks": false, "interrogate_clip_num_beams": 1, "interrogate_clip_min_length": 24, "interrogate_clip_max_length": 48, 
"interrogate_clip_dict_limit": 1500, "interrogate_clip_skip_categories": [], "interrogate_deepbooru_score_threshold": 0.5, "deepbooru_sort_alpha": true, "deepbooru_use_spaces": true, "deepbooru_escape": true, "deepbooru_filter_tags": "", "extra_networks_show_hidden_directories": true, "extra_networks_hidden_models": "When searched", "extra_networks_default_multiplier": 1.0, "extra_networks_card_width": 0, "extra_networks_card_height": 0, "extra_networks_card_text_scale": 1.0, "extra_networks_card_show_desc": true, "extra_networks_add_text_separator": " ", "ui_extra_networks_tab_reorder": "", "textual_inversion_print_at_load": false, "textual_inversion_add_hashes_to_infotext": true, "sd_hypernetwork": "None", "localization": "None", "gradio_theme": "Default", "gradio_themes_cache": true, "gallery_height": "", "return_grid": true, "do_not_show_images": false, "send_seed": true, "send_size": true, "js_modal_lightbox": true, "js_modal_lightbox_initially_zoomed": true, "js_modal_lightbox_gamepad": false, "js_modal_lightbox_gamepad_repeat": 250, "show_progress_in_title": true, "samplers_in_dropdown": true, "dimensions_and_batch_together": true, "keyedit_precision_attention": 0.1, "keyedit_precision_extra": 0.05, "keyedit_delimiters": ".,\/!?%^*;:{}=`~()", "keyedit_move": true, "quicksettings_list": [ "sd_model_checkpoint", "sd_vae" ], "ui_tab_order": [], "hidden_tabs": [], "ui_reorder_list": [], "hires_fix_show_sampler": false, "hires_fix_show_prompts": false, "disable_token_counters": false, "add_model_hash_to_info": true, "add_model_name_to_info": true, "add_user_name_to_info": false, "add_version_to_infotext": true, "disable_weights_auto_swap": true, "infotext_styles": "Apply if any", "show_progressbar": false, "live_previews_enable": false, "live_previews_image_format": "png", "show_progress_grid": false, "show_progress_every_n_steps": 10, "show_progress_type": "Approx NN", "live_preview_allow_lowvram_full": false, "live_preview_content": "Prompt", 
"live_preview_refresh_period": 1000, "live_preview_fast_interrupt": false, "hide_samplers": [], "eta_ddim": 0.0, "eta_ancestral": 1.0, "ddim_discretize": "uniform", "s_churn": 0.0, "s_tmin": 0.0, "s_tmax": 0.0, "s_noise": 1.0, "k_sched_type": "Automatic", "sigma_min": 0.0, "sigma_max": 0.0, "rho": 0.0, "eta_noise_seed_delta": 0, "always_discard_next_to_last_sigma": false, "sgm_noise_multiplier": false, "uni_pc_variant": "bh1", "uni_pc_skip_type": "time_uniform", "uni_pc_order": 3, "uni_pc_lower_order_final": true, "postprocessing_enable_in_main_ui": [], "postprocessing_operation_order": [], "upscaling_max_images_in_cache": 5, "disabled_extensions": [], "disable_all_extensions": "none", "restore_config_state_file": "", "sd_checkpoint_hash": "31e35c80fc4829d14f90153f4c74cd59c90b779f6afe05a74cd6120b893f7e5b", "ldsr_steps": 100, "ldsr_cached": false, "SCUNET_tile": 256, "SCUNET_tile_overlap": 8, "SWIN_tile": 192, "SWIN_tile_overlap": 8, "lora_functional": false, "sd_lora": "None", "lora_preferred_name": "Alias from file", "lora_add_hashes_to_infotext": true, "lora_show_all": false, "lora_hide_unknown_for_versions": [], "lora_in_memory_limit": 0, "extra_options_txt2img": [], "extra_options_img2img": [], "extra_options_cols": 1, "extra_options_accordion": false, "canvas_hotkey_zoom": "Alt", "canvas_hotkey_adjust": "Ctrl", "canvas_hotkey_move": "F", "canvas_hotkey_fullscreen": "S", "canvas_hotkey_reset": "R", "canvas_hotkey_overlap": "O", "canvas_show_tooltip": true, "canvas_auto_expand": true, "canvas_blur_prompt": false, "canvas_disabled_functions": [ "Overlap" ] }, "Startup": { "total": 10.206248760223389, "records": { "initial startup": 0.0010004043579101562, "prepare environment/checks": 0.012997627258300781, "prepare environment/git version info": 0.050000667572021484, "prepare environment/torch GPU test": 1.7600007057189941, "prepare environment/clone repositores": 0.1659996509552002, "prepare environment/run extensions installers": 0.0, "prepare environment": 
2.0529966354370117, "launcher": 0.0020036697387695312, "import torch": 2.9939966201782227, "import gradio": 0.8275730609893799, "setup paths": 0.6869983673095703, "import ldm": 0.005006074905395508, "import sgm": 0.0, "initialize shared": 0.20566916465759277, "other imports": 0.5879993438720703, "opts onchange": 0.0, "setup SD model": 0.0019998550415039062, "setup codeformer": 0.09400057792663574, "setup gfpgan": 0.017999649047851562, "set samplers": 0.0, "list extensions": 0.0, "restore config state file": 0.0, "list SD models": 0.002001523971557617, "list localizations": 0.0, "load scripts/custom_code.py": 0.0019981861114501953, "load scripts/img2imgalt.py": 0.0010001659393310547, "load scripts/loopback.py": 0.0, "load scripts/outpainting_mk_2.py": 0.0, "load scripts/poor_mans_outpainting.py": 0.0, "load scripts/postprocessing_codeformer.py": 0.0, "load scripts/postprocessing_gfpgan.py": 0.0010004043579101562, "load scripts/postprocessing_upscale.py": 0.0, "load scripts/prompt_matrix.py": 0.0, "load scripts/prompts_from_file.py": 0.0, "load scripts/refiner.py": 0.0009999275207519531, "load scripts/sd_upscale.py": 0.0, "load scripts/seed.py": 0.0, "load scripts/xyz_grid.py": 0.002000093460083008, "load scripts/ldsr_model.py": 0.9160013198852539, "load scripts/lora_script.py": 0.13500094413757324, "load scripts/scunet_model.py": 0.023998260498046875, "load scripts/swinir_model.py": 0.02299952507019043, "load scripts/hotkey_config.py": 0.0, "load scripts/extra_options_section.py": 0.0, "load scripts": 1.1049988269805908, "load upscalers": 0.007999658584594727, "refresh VAE": 0.0010001659393310547, "refresh textual inversion templates": 0.0, "scripts list_optimizers": 0.001001596450805664, "scripts list_unets": 0.0, "reload hypernetworks": 0.0009989738464355469, "initialize extra networks": 0.02599930763244629, "scripts before_ui_callback": 0.0020008087158203125, "create ui": 0.5429990291595459, "gradio launch": 1.092005729675293, "add APIs": 0.010997772216796875, 
"app_started_callback/lora_script.py": 0.0009999275207519531, "app_started_callback": 0.0009999275207519531 } }, "Packages": [ "absl-py==1.4.0", "accelerate==0.21.0", "addict==2.4.0", "aenum==3.1.15", "aiofiles==23.2.1", "aiohttp==3.8.5", "aiosignal==1.3.1", "altair==5.1.1", "antlr4-python3-runtime==4.9.3", "anyio==3.7.1", "async-timeout==4.0.3", "attrs==23.1.0", "basicsr==1.4.2", "beautifulsoup4==4.12.2", "blendmodes==2022", "boltons==23.0.0", "cachetools==5.3.1", "certifi==2023.7.22", "charset-normalizer==3.2.0", "clean-fid==0.1.35", "click==8.1.7", "clip==1.0", "colorama==0.4.6", "contourpy==1.1.0", "cycler==0.11.0", "deprecation==2.1.0", "einops==0.4.1", "exceptiongroup==1.1.3", "facexlib==0.3.0", "fastapi==0.94.0", "ffmpy==0.3.1", "filelock==3.12.3", "filterpy==1.4.5", "fonttools==4.42.1", "frozenlist==1.4.0", "fsspec==2023.9.0", "ftfy==6.1.1", "future==0.18.3", "gdown==4.7.1", "gfpgan==1.3.8", "gitdb==4.0.10", "gitpython==3.1.32", "google-auth-oauthlib==1.0.0", "google-auth==2.22.0", "gradio-client==0.5.0", "gradio==3.41.2", "grpcio==1.57.0", "h11==0.12.0", "httpcore==0.15.0", "httpx==0.24.1", "huggingface-hub==0.16.4", "idna==3.4", "imageio==2.31.3", "importlib-metadata==6.8.0", "importlib-resources==6.0.1", "inflection==0.5.1", "jinja2==3.1.2", "jsonmerge==1.8.0", "jsonschema-specifications==2023.7.1", "jsonschema==4.19.0", "kiwisolver==1.4.5", "kornia==0.6.7", "lark==1.1.2", "lazy-loader==0.3", "lightning-utilities==0.9.0", "llvmlite==0.40.1", "lmdb==1.4.1", "lpips==0.1.4", "markdown==3.4.4", "markupsafe==2.1.3", "matplotlib==3.7.2", "mpmath==1.3.0", "multidict==6.0.4", "networkx==3.1", "numba==0.57.1", "numpy==1.23.5", "oauthlib==3.2.2", "omegaconf==2.2.3", "open-clip-torch==2.20.0", "opencv-python==4.8.0.76", "orjson==3.9.5", "packaging==23.1", "pandas==2.1.0", "piexif==1.1.3", "pillow==9.5.0", "pip==22.2.2", "platformdirs==3.10.0", "protobuf==3.20.0", "psutil==5.9.5", "pyasn1-modules==0.3.0", "pyasn1==0.5.0", "pydantic==1.10.12", "pydub==0.25.1", 
"pyparsing==3.0.9", "pysocks==1.7.1", "python-dateutil==2.8.2", "python-multipart==0.0.6", "pytorch-lightning==1.9.4", "pytz==2023.3", "pywavelets==1.4.1", "pyyaml==6.0.1", "realesrgan==0.3.0", "referencing==0.30.2", "regex==2023.8.8", "requests-oauthlib==1.3.1", "requests==2.31.0", "resize-right==0.0.2", "rpds-py==0.10.0", "rsa==4.9", "safetensors==0.3.1", "scikit-image==0.21.0", "scipy==1.11.2", "semantic-version==2.10.0", "sentencepiece==0.1.99", "setuptools==63.2.0", "six==1.16.0", "smmap==5.0.0", "sniffio==1.3.0", "soupsieve==2.5", "starlette==0.26.1", "sympy==1.12", "tb-nightly==2.15.0a20230902", "tensorboard-data-server==0.7.1", "tifffile==2023.8.30", "timm==0.9.2", "tokenizers==0.13.3", "tomesd==0.1.3", "tomli==2.0.1", "toolz==0.12.0", "torch==2.0.1+cu118", "torchdiffeq==0.2.3", "torchmetrics==1.1.1", "torchsde==0.2.5", "torchvision==0.15.2+cu118", "tqdm==4.66.1", "trampoline==0.1.2", "transformers==4.30.2", "typing-extensions==4.7.1", "tzdata==2023.3", "urllib3==1.26.16", "uvicorn==0.23.2", "wcwidth==0.2.6", "websockets==11.0.3", "werkzeug==2.3.7", "wheel==0.41.2", "xformers==0.0.20", "yapf==0.40.1", "yarl==1.9.2", "zipp==3.16.2" ] }

What browsers do you use to access the UI ?

No response

Console logs

brave

Additional information

No response

yogiyushi commented 10 months ago

There are no issues doing txt2img or img2img; the problem is related to upscales. GPU usage spikes to 100%, the screen goes weird, and the computer crashes completely. I am running a batch, by the way. I have found a few reports online from others experiencing the same issue.

viebrix commented 10 months ago

My Cinnamon session in Linux Mint also crashed after updating to the new version 1.6 and ControlNet 1.1 and using hires fix Latent 2x (which is turned on by default). Linux error: amdgpu 0000:08:00.0: amdgpu: 00000000baf7c3f2 pin failed [drm:dm_plane_helper_prepare_fb [amdgpu]] ERROR Failed to pin framebuffer with error -12

Setting hires fix to None 1x does work. I know it is not exactly the same issue as @yogiyushi's, but it could have the same source.

yogiyushi commented 10 months ago

Below you can see a spike that crashes the app and sometimes causes a machine reboot after a few upscales.

[Image: Task Manager screenshot, Taskmgr_YwO3KkZU6k]

catboxanon commented 10 months ago

Stable Diffusion should not affect anything in that 3D graph, which means you have something else going wrong. That CPU spike is awfully suspicious as well. Change one of those graphs to Cuda and you'll see the actual utilization.

I can't be 100% sure from what Task Manager shows, but you might also want to check your NVIDIA driver version and downgrade if necessary (versions above 531 currently have issues). https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11063
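A minimal sketch of the driver check suggested above, assuming an NVIDIA card with `nvidia-smi` on the PATH. The helper names and the choice to compare only the major version number against 531 are illustrative assumptions, not something the webui provides.

```python
import subprocess

# Threshold taken from the comment above ("versions above 531 have issues").
# Comparing only the major number is an assumption made for this sketch.
LAST_KNOWN_GOOD_MAJOR = 531


def parse_driver_major(version: str) -> int:
    """Return the major part of a driver version string such as '531.79'."""
    return int(version.strip().split(".")[0])


def is_suspect(version: str, threshold: int = LAST_KNOWN_GOOD_MAJOR) -> bool:
    """True when the driver's major version is above the threshold."""
    return parse_driver_major(version) > threshold


def query_driver_version() -> str:
    """Ask nvidia-smi for the installed driver version (first GPU only)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()


if __name__ == "__main__":
    try:
        version = query_driver_version()
    except (FileNotFoundError, subprocess.CalledProcessError, IndexError) as exc:
        print(f"Could not query nvidia-smi: {exc}")
    else:
        verdict = "consider downgrading" if is_suspect(version) else "looks fine"
        print(f"Driver {version}: {verdict}")
```

Run it from the same environment that launches the webui. A "consider downgrading" result only means the driver is newer than the 531 branch, not that it is the cause of the crash.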

yogiyushi commented 10 months ago

I understand 3D has nothing to do with upscaling, but this spike only occurs during that process. I will try out a new driver version :)

levicki commented 10 months ago

If your GPU is 50 degrees Celsius when idle (4% use) what is the temperature when it is under 100% load? You can use GPU-Z to log the temperature to a file.
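On NVIDIA cards a similar log can be collected with a short script instead of GPU-Z. The sketch below assumes `nvidia-smi` is on the PATH and that the card reports all three queried fields. The function names, CSV column names, and output path are hypothetical choices for illustration.

```python
import csv
import os
import subprocess
import time
from datetime import datetime
from typing import Callable, Tuple

# Fields requested from nvidia-smi: temperature (C), utilization (%), memory (MiB).
QUERY = "temperature.gpu,utilization.gpu,memory.used"


def parse_sample(line: str) -> Tuple[int, int, int]:
    """Parse one 'csv,noheader,nounits' line, e.g. '67, 99, 10240'."""
    temp_c, util_pct, mem_mib = (int(field.strip()) for field in line.split(","))
    return temp_c, util_pct, mem_mib


def read_gpu_sample() -> Tuple[int, int, int]:
    """Read one sample for the first GPU via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


def log_samples(
    path: str,
    samples: int,
    interval_s: float = 1.0,
    read: Callable[[], Tuple[int, int, int]] = read_gpu_sample,
) -> None:
    """Append `samples` timestamped readings to a new CSV file at `path`."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["time", "temp_c", "util_pct", "mem_used_mib"])
        for i in range(samples):
            writer.writerow([datetime.now().isoformat(timespec="seconds"), *read()])
            # Flush and sync every row so the log survives a hard crash.
            fh.flush()
            os.fsync(fh.fileno())
            if i + 1 < samples:
                time.sleep(interval_s)
```

For example, `log_samples("gpu_log.csv", samples=600, interval_s=1.0)` started just before a batch upscale records ten minutes of readings. The last rows written before a crash show the temperature and VRAM use at the moment of failure.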

If you are getting "weird screen" (I presume colored blocks / artifacts?) then your video RAM might be failing (or already has). Upscaling probably uses more VRAM than other processes so it might be hitting a bad VRAM chip.

Can you run Furmark without crashing?

A word of warning — Furmark can fry your card if your cooling is faulty or if something is already damaged.

yogiyushi commented 10 months ago

I use MSI Afterburner to control the fan curve and power, and I have set it to never go over 80. I did this after getting weird issues; before that I was hitting 88 but mostly averaging 80-82.

yogiyushi commented 10 months ago

It's not just upscaling; it's txt2image, image2image, etc. The card displays artifacts, freezes, then crashes.

I have tried various command-line settings, none of which solves the problem.

I have had some success using MSI Afterburner with the temperature limit set to 72, and sometimes lowering the memory clock, core clock, and overall power. Running SD 2.1 seems far more stable but can still eventually cause a crash.

There are many posts online about the 1080 Ti artifacting after several years of use (the card launched in 2017). The next step is memory analysis and possible board repair.

levicki commented 10 months ago

@yogiyushi

> the card displays artifacts, freezes, then crashes.

As I said, most likely one or more of the VRAM chips is bad. It's an old card (not only in the technology sense but also in age). Sadly, nothing lasts forever.

Tesserakt-company commented 5 months ago

Same issue on a one-year-old 3060 Ti. The PC pauses completely, to the point that it can't even play sound.
