Mikubill / sd-webui-controlnet

WebUI extension for ControlNet
GNU General Public License v3.0

[Bug]: Upscaling very large images with SD Ultimate Upscale has VERY SLOW preprocessing (75% of total execution time) #1648

Closed marcsyp closed 10 months ago

marcsyp commented 1 year ago

Is there an existing issue for this?

What happened?

I am doing very large upscales using SD Ultimate Upscale and the ControlNet tile model. Upscaling 4608 x 6144 images 3x to 13824 x 18432 works quite well with my workflow, but it is VERRRRRY slow, particularly on the model loading between each tile. The actual tile rendering is quite fast at 0.24 denoising (roughly 12 seconds on a 3080Ti), but the preprocessing step between each tile is close to 29 seconds, which is roughly 70% of the total execution time.

From the time stamps, it looks like the loading of the tile model from cache may account for 9 of the 29 seconds, but I'm not sure what accounts for the remaining 20 seconds, perhaps some are consumed by the tiling process of SDUS itself (comparable upscale without CN enabled has about 8s of denoising, 17s of prep), so CN could be responsible for up to 12s.

These renders take about 1h40m each, but if there were some way to optimize the model loading, I feel like there is a LOT of opportunity for gains here, drastically reducing the time required for very large upscales. 8-12 seconds times 144 tiles is roughly 19-29 minutes spent just on model loading/preprocessing.
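The per-tile figures above can be worked through with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the constants are the approximate numbers quoted in this report, and the helper name `minutes` is made up for illustration.

```python
# Rough time budget from the figures quoted in this report (all approximate).
TILES = 144      # "Tiles amount: 144" in the console log
RENDER_S = 12    # approximate denoising time per tile, in seconds
PREP_S = 29      # approximate gap between tiles with ControlNet enabled


def minutes(seconds_per_tile: float, tiles: int = TILES) -> float:
    """Total minutes spent if every tile costs `seconds_per_tile` seconds."""
    return seconds_per_tile * tiles / 60


total_min = minutes(RENDER_S + PREP_S)
prep_share = PREP_S / (RENDER_S + PREP_S)

print(f"whole render: {total_min:.1f} min, share spent between tiles: {prep_share:.0%}")
print(f"8-12 s of overhead per tile: {minutes(8):.1f} to {minutes(12):.1f} min")
```

The whole-render estimate lands close to the reported 1h40m, which suggests the per-tile numbers are at least self-consistent.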

(NOTE: Smaller upscales also have decently long model loading wait, but not unbearable. For a 2x of a 2304x3072 with 2304x768 tiles and 0.24 denoising (with more steps), I'm getting 17s of rendering and 9s of loading/preprocessing, with tile size that is slightly larger -- that's roughly 34% of the total execution time. Not sure why the model loading is faster with a smaller upscale when the tile size is actually larger. Total execution per render here is 6.75m, roughly, so it's less of an issue.)

Any thoughts here appreciated!

Steps to reproduce the problem

  1. Go to img2img
  2. Add a file that is roughly 4608x6144 (6x of a 768x1024), add some prompt
  3. Enable controlnet, pixel perfect, Balanced, tile_resample model, 1.25 strength, 0.7 ECS
  4. Denoising 0.24
  5. SD Ultimate Upscale settings are 3x, tile size 1732x1024
  6. Generate. Watch and wait :)

What should have happened?

Would be great if the preprocessing step were roughly equivalent to that of a single tile size image (1-5s, instead of 30s). Don't know if there are technical limitations that prevent this, or whether it's simply an inefficient algo that works fine at most smaller resolutions and nobody has questioned it for larger res.

Commit where the problem happens

webui: python: 3.10.6  •  torch: 1.13.1+cu117  •  xformers: 0.0.16rc425  •  gradio: 3.23.0  •  commit: 22bcc7be  •  checkpoint: 9aba26abdf controlnet: 1.1.224

What browsers do you use to access the UI ?

No response

Command Line Arguments

@echo off

set PYTHON="C:\Users\marcs\AppData\Local\Programs\Python\Python310\python.exe"
set GIT=
set VENV_DIR=
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.8,max_split_size_mb:128
set COMMANDLINE_ARGS=--opt-channelslast --use-cpu interrogate --theme dark --no-half-vae --xformers
set ATTN_PRECISION=fp16
set SAFETENSORS_FAST_GPU=1

call webui.bat

List of enabled extensions

image

Console logs

WITH CONTROLNET
Canva size: 13824x18432
Image size: 4608x6144
Scale factor: 3
Upscaling iteration 1 with scale factor 3
Tile size: 1732x1024
Tiles amount: 144
Grid: 18x8
Redraw enabled: True
Seams fix mode: NONE
Image Width: 13824
Tile Width: 1856
Image Height: 18432
Tile Height: 1088
2023-06-15 20:28:23,824 - ControlNet - INFO - Loading model: control_v11f1e_sd15_tile [a371b31b]
2023-06-15 20:28:24,510 - ControlNet - INFO - Loaded state_dict from [C:\SDW\stable-diffusion-webui\extensions\sd-webui-controlnet\models\control_v11f1e_sd15_tile.pth]
2023-06-15 20:28:24,512 - ControlNet - INFO - Loading config: C:\SDW\stable-diffusion-webui\extensions\sd-webui-controlnet\models\control_v11f1e_sd15_tile.yaml
2023-06-15 20:28:27,279 - ControlNet - INFO - ControlNet model control_v11f1e_sd15_tile [a371b31b] loaded.
2023-06-15 20:28:36,240 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-15 20:28:36,242 - ControlNet - INFO - Pixel Perfect Computation:
2023-06-15 20:28:36,242 - ControlNet - INFO - resize_mode = ResizeMode.RESIZE
2023-06-15 20:28:36,242 - ControlNet - INFO - raw_H = 1088
2023-06-15 20:28:36,242 - ControlNet - INFO - raw_W = 1856
2023-06-15 20:28:36,243 - ControlNet - INFO - target_H = 1088
2023-06-15 20:28:36,243 - ControlNet - INFO - target_W = 1856
2023-06-15 20:28:36,244 - ControlNet - INFO - estimation = 1088.0
2023-06-15 20:28:36,244 - ControlNet - INFO - preprocessor resolution = 1088
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:12<00:00,  1.27s/it]
could not find upscaler named <empty string>, using None as a fallback               | 10/1440 [00:11<27:05,  1.14s/it]
2023-06-15 20:29:17,192 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-15 20:29:26,419 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-15 20:29:26,419 - ControlNet - INFO - Pixel Perfect Computation:
2023-06-15 20:29:26,420 - ControlNet - INFO - resize_mode = ResizeMode.RESIZE
2023-06-15 20:29:26,420 - ControlNet - INFO - raw_H = 1088
2023-06-15 20:29:26,420 - ControlNet - INFO - raw_W = 1856
2023-06-15 20:29:26,420 - ControlNet - INFO - target_H = 1088
2023-06-15 20:29:26,421 - ControlNet - INFO - target_W = 1856
2023-06-15 20:29:26,421 - ControlNet - INFO - estimation = 1088.0
2023-06-15 20:29:26,421 - ControlNet - INFO - preprocessor resolution = 1088
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:12<00:00,  1.28s/it]
could not find upscaler named <empty string>, using None as a fallback               | 20/1440 [00:55<36:37,  1.55s/it]
2023-06-15 20:29:59,485 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-15 20:30:07,545 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-15 20:30:07,546 - ControlNet - INFO - Pixel Perfect Computation:
2023-06-15 20:30:07,546 - ControlNet - INFO - resize_mode = ResizeMode.RESIZE
2023-06-15 20:30:07,546 - ControlNet - INFO - raw_H = 1088
2023-06-15 20:30:07,546 - ControlNet - INFO - raw_W = 1856
2023-06-15 20:30:07,546 - ControlNet - INFO - target_H = 1088
2023-06-15 20:30:07,547 - ControlNet - INFO - target_W = 1856
2023-06-15 20:30:07,547 - ControlNet - INFO - estimation = 1088.0
2023-06-15 20:30:07,547 - ControlNet - INFO - preprocessor resolution = 1088
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:12<00:00,  1.25s/it]
could not find upscaler named <empty string>, using None as a fallback               | 30/1440 [01:35<34:30,  1.47s/it]
2023-06-15 20:30:38,869 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-15 20:30:47,424 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-15 20:30:47,424 - ControlNet - INFO - Pixel Perfect Computation:
2023-06-15 20:30:47,425 - ControlNet - INFO - resize_mode = ResizeMode.RESIZE
2023-06-15 20:30:47,425 - ControlNet - INFO - raw_H = 1088
2023-06-15 20:30:47,425 - ControlNet - INFO - raw_W = 1856
2023-06-15 20:30:47,425 - ControlNet - INFO - target_H = 1088
2023-06-15 20:30:47,427 - ControlNet - INFO - target_W = 1856
2023-06-15 20:30:47,427 - ControlNet - INFO - estimation = 1088.0
2023-06-15 20:30:47,428 - ControlNet - INFO - preprocessor resolution = 1088
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:12<00:00,  1.25s/it]
could not find upscaler named <empty string>, using None as a fallback               | 40/1440 [02:14<34:10,  1.46s/it]
2023-06-15 20:31:17,689 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-15 20:31:26,139 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-15 20:31:26,139 - ControlNet - INFO - Pixel Perfect Computation:
2023-06-15 20:31:26,139 - ControlNet - INFO - resize_mode = ResizeMode.RESIZE
2023-06-15 20:31:26,139 - ControlNet - INFO - raw_H = 1088
2023-06-15 20:31:26,141 - ControlNet - INFO - raw_W = 1856
2023-06-15 20:31:26,141 - ControlNet - INFO - target_H = 1088
2023-06-15 20:31:26,141 - ControlNet - INFO - target_W = 1856
2023-06-15 20:31:26,142 - ControlNet - INFO - estimation = 1088.0
2023-06-15 20:31:26,143 - ControlNet - INFO - preprocessor resolution = 1088
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:14<00:00,  1.44s/it]
could not find upscaler named <empty string>, using None as a fallback               | 50/1440 [02:56<36:31,  1.58s/it]
2023-06-15 20:32:00,264 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-15 20:32:08,809 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-15 20:32:08,810 - ControlNet - INFO - Pixel Perfect Computation:
2023-06-15 20:32:08,810 - ControlNet - INFO - resize_mode = ResizeMode.RESIZE
2023-06-15 20:32:08,810 - ControlNet - INFO - raw_H = 1088
2023-06-15 20:32:08,810 - ControlNet - INFO - raw_W = 1856
2023-06-15 20:32:08,810 - ControlNet - INFO - target_H = 1088
2023-06-15 20:32:08,810 - ControlNet - INFO - target_W = 1856
2023-06-15 20:32:08,810 - ControlNet - INFO - estimation = 1088.0
2023-06-15 20:32:08,810 - ControlNet - INFO - preprocessor resolution = 1088
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:12<00:00,  1.29s/it]
could not find upscaler named <empty string>, using None as a fallback               | 60/1440 [03:36<35:04,  1.53s/it]
2023-06-15 20:32:39,979 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-15 20:32:49,077 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-15 20:32:49,077 - ControlNet - INFO - Pixel Perfect Computation:
2023-06-15 20:32:49,077 - ControlNet - INFO - resize_mode = ResizeMode.RESIZE
2023-06-15 20:32:49,078 - ControlNet - INFO - raw_H = 1088
2023-06-15 20:32:49,078 - ControlNet - INFO - raw_W = 1856
2023-06-15 20:32:49,078 - ControlNet - INFO - target_H = 1088
2023-06-15 20:32:49,078 - ControlNet - INFO - target_W = 1856
2023-06-15 20:32:49,079 - ControlNet - INFO - estimation = 1088.0
2023-06-15 20:32:49,079 - ControlNet - INFO - preprocessor resolution = 1088
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:15<00:00,  1.59s/it]
could not find upscaler named <empty string>, using None as a fallback               | 70/1440 [04:20<38:25,  1.68s/it]
2023-06-15 20:33:24,653 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-15 20:33:32,698 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-15 20:33:32,698 - ControlNet - INFO - Pixel Perfect Computation:
2023-06-15 20:33:32,699 - ControlNet - INFO - resize_mode = ResizeMode.RESIZE
2023-06-15 20:33:32,699 - ControlNet - INFO - raw_H = 1088
2023-06-15 20:33:32,699 - ControlNet - INFO - raw_W = 1856
2023-06-15 20:33:32,699 - ControlNet - INFO - target_H = 1088
2023-06-15 20:33:32,699 - ControlNet - INFO - target_W = 1856
2023-06-15 20:33:32,699 - ControlNet - INFO - estimation = 1088.0
2023-06-15 20:33:32,699 - ControlNet - INFO - preprocessor resolution = 1088
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:13<00:00,  1.31s/it]
Total progress:   6%|███▌                                                            | 80/1440 [05:00<36:46,  1.62s/it] 

WITHOUT CONTROLNET
Canva size: 13824x18432
Image size: 4608x6144
Scale factor: 3
Upscaling iteration 1 with scale factor 3
Tile size: 1732x1024
Tiles amount: 144
Grid: 18x8
Redraw enabled: True
Seams fix mode: NONE
Image Width: 13824
Tile Width: 1856
Image Height: 18432
Tile Height: 1088
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:08<00:00,  1.13it/s]
could not find upscaler named <empty string>, using None as a fallback               | 10/1440 [00:07<20:38,  1.15it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:08<00:00,  1.12it/s]
could not find upscaler named <empty string>, using None as a fallback               | 20/1440 [00:37<26:48,  1.13s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:09<00:00,  1.05it/s]
could not find upscaler named <empty string>, using None as a fallback               | 30/1440 [01:03<25:58,  1.11s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:08<00:00,  1.14it/s]
could not find upscaler named <empty string>, using None as a fallback               | 40/1440 [01:30<25:44,  1.10s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:08<00:00,  1.13it/s]
could not find upscaler named <empty string>, using None as a fallback               | 50/1440 [01:57<25:36,  1.11s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:08<00:00,  1.14it/s]
could not find upscaler named <empty string>, using None as a fallback               | 60/1440 [02:21<24:37,  1.07s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:08<00:00,  1.12it/s]
could not find upscaler named <empty string>, using None as a fallback               | 70/1440 [02:46<25:00,  1.10s/it]

Additional information

This behaviour is not new, has been this way since the first time I started doing these large upscales, probably a month ago.

huchenlei commented 1 year ago

I don't think the tile resample preprocessor takes that long. Neither does the model loading. From the log timestamps you can see these tasks are completed relatively fast. However, we can add more debug logs to pinpoint the issue.

marcsyp commented 1 year ago

@huchenlei most are definitely very short -- especially the loading preprocessor. But these two lines:

2023-06-15 20:33:24,653 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-15 20:33:32,698 - ControlNet - INFO - Loading preprocessor: tile_resample

This indicates 8 seconds to load the model from the cache -- am I reading that correctly?

Thanks.
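The gap between those two log statements can be computed directly from the timestamps. A minimal sketch, assuming the timestamp is everything before the first " - " separator; `log_timestamp` is a hypothetical helper, not part of the extension. Note that the result is the time between the two statements, which is not necessarily the time spent on the cache lookup itself.

```python
from datetime import datetime

# Timestamp layout used by the log lines in this thread: date, time, comma, milliseconds.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def log_timestamp(line: str) -> datetime:
    """Parse the leading timestamp of a log line (text before the first ' - ')."""
    stamp = line.split(" - ", 1)[0]
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


before = "2023-06-15 20:33:24,653 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]"
after = "2023-06-15 20:33:32,698 - ControlNet - INFO - Loading preprocessor: tile_resample"

gap = (log_timestamp(after) - log_timestamp(before)).total_seconds()
print(f"{gap:.3f} s elapsed between the two log statements")
```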

huchenlei commented 1 year ago

Sorry, I misread the timestamps. There could be other things between these two log statements that are taking a long time. Loading the model from cache is simply accessing an item in a dict, which shouldn't be the culprit. I am going to add more debug logs and try to reproduce the issue.

huchenlei commented 1 year ago

After adding some timing logs:

2023-06-16 16:27:19,566 - ControlNet - DEBUG - title ran in: 0.0 sec
2023-06-16 16:27:19,567 - ControlNet - DEBUG - title ran in: 0.0 sec
2023-06-16 16:27:19,567 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,567 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,567 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,567 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,568 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,568 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,570 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,570 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,571 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,572 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,572 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,573 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,573 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,574 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,574 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,575 - ControlNet - DEBUG - parse_remote_call ran in: 0.008007049560546875 sec
2023-06-16 16:27:19,575 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,576 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,577 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,577 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,578 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,578 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,580 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,585 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,589 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,590 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,593 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,593 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,594 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,594 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,595 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,595 - ControlNet - DEBUG - parse_remote_call ran in: 0.02001810073852539 sec
2023-06-16 16:27:19,596 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,597 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,604 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,604 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,605 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,606 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,606 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,607 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,607 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,608 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,608 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,609 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,609 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,610 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,610 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,611 - ControlNet - DEBUG - parse_remote_call ran in: 0.01501321792602539 sec
2023-06-16 16:27:19,612 - ControlNet - DEBUG - get_enabled_units ran in: 0.04504084587097168 sec
2023-06-16 16:27:19,612 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-16 16:27:19,613 - ControlNet - DEBUG - load_control_model ran in: 0.001001119613647461 sec
2023-06-16 16:27:19,620 - ControlNet - DEBUG - get_remote_call ran in: 0.0 sec
2023-06-16 16:27:19,823 - ControlNet - DEBUG - choose_input_image ran in: 0.2031846046447754 sec
2023-06-16 16:27:19,823 - ControlNet - DEBUG - A1111 inpaint mask START
2023-06-16 16:27:21,728 - ControlNet - DEBUG - A1111 inpaint mask END
2023-06-16 16:27:21,728 - ControlNet - DEBUG - Safe numpy convertion START
2023-06-16 16:27:21,732 - ControlNet - DEBUG - Safe numpy convertion END
2023-06-16 16:27:21,732 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-16 16:27:21,732 - ControlNet - INFO - Pixel Perfect Computation:
2023-06-16 16:27:21,732 - ControlNet - INFO - resize_mode = ResizeMode.RESIZE
2023-06-16 16:27:21,732 - ControlNet - INFO - raw_H = 1088
2023-06-16 16:27:21,732 - ControlNet - INFO - raw_W = 1792
2023-06-16 16:27:21,732 - ControlNet - INFO - target_H = 1088
2023-06-16 16:27:21,733 - ControlNet - INFO - target_W = 1792
2023-06-16 16:27:21,733 - ControlNet - INFO - estimation = 1088.0
2023-06-16 16:27:21,733 - ControlNet - INFO - preprocessor resolution = 1088
2023-06-16 16:27:21,737 - ControlNet - DEBUG - Calling preprocessor tile_resample outside of cache.
2023-06-16 16:27:23,750 - ControlNet - DEBUG - detectmap_proc ran in: 2.012845993041992 sec
2023-06-16 16:27:23,778 - ControlNet - DEBUG - process ran in: 4.2112884521484375 sec
 80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                                        | 4/5 [01:47<00:26, 26.84s/it]
2023-06-16 16:29:17,052 - ControlNet - DEBUG - postprocess_batch ran in: 0.0 sec                                                                                                                                                                                                            | 4/180 [00:34<27:21,  9.33s/it]
2023-06-16 16:29:18,114 - ControlNet - DEBUG - postprocess ran in: 0.3112828731536865 sec
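
Lines of the form "name ran in: N sec" like the ones above could be produced by a small timing decorator. The following is a sketch of that general technique, not the code that was actually added to the extension; the decorator name `timed` and the stub function are hypothetical, and only the logger name "ControlNet" is taken from the log output.

```python
import functools
import logging
import time

# Logger name mirrors the log lines in this thread; the rest is illustrative.
logger = logging.getLogger("ControlNet")


def timed(func):
    """Log how long each call to `func` takes, at DEBUG level."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed as well.
            elapsed = time.perf_counter() - start
            logger.debug("%s ran in: %s sec", func.__name__, elapsed)
    return wrapper


@timed
def load_control_model_stub():
    """Stand-in for a function worth timing (hypothetical)."""
    return {"cached": True}


result = load_control_model_stub()
```

The messages only show up when the logger is set to DEBUG.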

The issue seems to be in the A1111 mask handling code:

            if 'reference' not in unit.module and issubclass(type(p), StableDiffusionProcessingImg2Img) \
                    and p.inpaint_full_res and a1111_mask_image is not None:
                logger.debug("A1111 inpaint mask START")
                input_image = [input_image[:, :, i] for i in range(input_image.shape[2])]
                input_image = [Image.fromarray(x) for x in input_image]

                mask = prepare_mask(a1111_mask_image, p)

                crop_region = masking.get_crop_region(np.array(mask), p.inpaint_full_res_padding)
                crop_region = masking.expand_crop_region(crop_region, p.width, p.height, mask.width, mask.height)

                input_image = [
                    images.resize_image(resize_mode.int_value(), i, mask.width, mask.height) 
                    for i in input_image
                ]

                input_image = [x.crop(crop_region) for x in input_image]
                input_image = [
                    images.resize_image(external_code.ResizeMode.OUTER_FIT.int_value(), x, p.width, p.height) 
                    for x in input_image
                ]

                input_image = [np.asarray(x)[:, :, 0] for x in input_image]
                input_image = np.stack(input_image, axis=2)
                logger.debug("A1111 inpaint mask END")

I am not sure why running the tile preprocessor in img2img (not inpaint) with Ultimate SD Upscale triggers this logic. I think you can add more logs to pinpoint which line is causing the problem. I cannot reproduce the 8s cost on my local setup, though.
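
One way to pinpoint the slow line is to wrap each step of the mask block in a timing context manager. This is a minimal sketch: `log_step` and the `timings` dict are hypothetical names, and the loop bodies are stand-ins for the real calls (`prepare_mask`, `images.resize_image`, and so on), which are not reproduced here.

```python
import contextlib
import logging
import time

logger = logging.getLogger("ControlNet")  # logger name as seen in the logs


@contextlib.contextmanager
def log_step(label, sink=None):
    """Time the enclosed block, log it at DEBUG, and optionally record it in `sink`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.debug("%s took %.3f sec", label, elapsed)
        if sink is not None:
            sink[label] = elapsed


timings = {}
with log_step("prepare_mask", timings):
    sum(range(100_000))          # stand-in for mask = prepare_mask(...)
with log_step("resize_to_mask", timings):
    sorted(range(100_000))       # stand-in for the images.resize_image(...) loop

slowest = max(timings, key=timings.get)
```

Wrapping `get_crop_region`, both `resize_image` loops, and the `crop` call separately would show which of them grows with the canvas size.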

Let's assign this to @lllyasviel as I am really not familiar with this masking code.

marcsyp commented 1 year ago

Interesting -- thanks for looking into this. It looks like two processes take up a large portion of the execution time in your example: the inpaint mask and detectmap_proc both take around 2 seconds. Since you didn't mention the second, I'm guessing this is unavoidable and essential code. I am surprised, like you, that the inpaint mask code is running -- it seems p.inpaint_full_res should be false and a1111_mask_image should be None when doing an img2img upscale with the CN Tile model, but I don't know the code well enough to know if either CN or SDUS relies on this logic for some reason. I might do a test where I force skip this logic to see if it has any adverse effects while speeding up the upscaling...

marcsyp commented 1 year ago

OK, I think I have an answer to the masking question. Here is the code from Ultimate Upscaler -- it is using the mask functionality of A1111 to crop the tiles:

def init_draw(self, p, width, height):
    p.inpaint_full_res = True
    p.inpaint_full_res_padding = self.padding
    p.width = math.ceil(min(self.tile_width+self.padding, width) / 64) * 64
    p.height = math.ceil(min(self.tile_height+self.padding, height) / 64) * 64
    print(f"Image Width: {width}")
    print(f"Tile Width: {p.width}")
    print(f"Image Height: {height}")
    print(f"Tile Height: {p.height}")
    mask = Image.new("L", (width, height), "black")
    draw = ImageDraw.Draw(mask)
    return mask, draw

def calc_rectangle(self, xi, yi):
    x1 = xi * self.tile_width
    y1 = yi * self.tile_height
    x2 = xi * self.tile_width + self.tile_width
    y2 = yi * self.tile_height + self.tile_height

    return x1, y1, x2, y2

def linear_process(self, p, image, rows, cols):
    mask, draw = self.init_draw(p, image.width, image.height)
    for yi in range(rows):
        for xi in range(cols):
            if state.interrupted:
                break
            draw.rectangle(self.calc_rectangle(xi, yi), fill="white")
            p.init_images = [image]
            p.image_mask = mask  # <-- the A1111 mask is set here for every tile
            processed = processing.process_images(p)
            draw.rectangle(self.calc_rectangle(xi, yi), fill="black")
            if (len(processed.images) > 0):
                image = processed.images[0]

    p.width = image.width
    p.height = image.height
    self.initial_info = processed.infotext(p, 0)

    return image

def chess_process(self, p, image, rows, cols):
    mask, draw = self.init_draw(p, image.width, image.height)
    tiles = []
    # calc tiles colors
    for yi in range(rows):
        for xi in range(cols):
            if state.interrupted:
                break
            if xi == 0:
                tiles.append([])
            color = xi % 2 == 0
            if yi > 0 and yi % 2 != 0:
                color = not color
            tiles[yi].append(color)

    for yi in range(len(tiles)):
        for xi in range(len(tiles[yi])):
            if state.interrupted:
                break
            if not tiles[yi][xi]:
                tiles[yi][xi] = not tiles[yi][xi]
                continue
            tiles[yi][xi] = not tiles[yi][xi]
            draw.rectangle(self.calc_rectangle(xi, yi), fill="white")
            p.init_images = [image]
            p.image_mask = mask
            processed = processing.process_images(p)
            draw.rectangle(self.calc_rectangle(xi, yi), fill="black")
            if (len(processed.images) > 0):
                image = processed.images[0]

    for yi in range(len(tiles)):
        for xi in range(len(tiles[yi])):
            if state.interrupted:
                break
            if not tiles[yi][xi]:
                continue
            draw.rectangle(self.calc_rectangle(xi, yi), fill="white")
            p.init_images = [image]
            p.image_mask = mask
            processed = processing.process_images(p)
            draw.rectangle(self.calc_rectangle(xi, yi), fill="black")
            if (len(processed.images) > 0):
                image = processed.images[0]

    p.width = image.width
    p.height = image.height
    self.initial_info = processed.infotext(p, 0)

    return image
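
As a cross-check against the console log, the tile-size rounding in init_draw and the size of the mask it creates can be worked through numerically. This is a sketch under one assumption that is not stated in the thread: a padding of 64, chosen because it reproduces the logged "Tile Width: 1856" and "Tile Height: 1088". The helper name `padded_tile_size` is made up.

```python
import math


def padded_tile_size(tile: int, padding: int, full: int) -> int:
    """Tile plus padding, capped at the image size, rounded up to a multiple of 64."""
    return math.ceil(min(tile + padding, full) / 64) * 64


PADDING = 64                        # assumption: not stated in the thread
CANVAS_W, CANVAS_H = 13824, 18432   # "Canva size" from the console log

tile_w = padded_tile_size(1732, PADDING, CANVAS_W)
tile_h = padded_tile_size(1024, PADDING, CANVAS_H)

# The mask from init_draw covers the whole canvas, not one tile.
mask_pixels = CANVAS_W * CANVAS_H
tile_pixels = tile_w * tile_h

print(tile_w, tile_h)
print(f"full-size mask is about {mask_pixels // tile_pixels}x the pixels of one tile")
```

If the per-tile mask handling scales with the mask size rather than the tile size, that would be consistent with the slowdown being much worse on very large canvases, though the thread does not confirm which call dominates.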

huchenlei commented 1 year ago

I think we can definitely make some improvements to the tiling/cropping process, since currently, if the image is big, cropping can take a significant amount of time.

That should also be the reason why on my reproduction the mask code only runs for 2 sec, because I am using a much smaller input image (2048 x 3072).

To further improve efficiency, I think all crops should be done in a single pass, instead of cropping a single tile off the input image, processing it, and repeating for every tile.
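
A single-pass approach could start by computing every tile rectangle up front, so that each tile can be cut directly from the image instead of rebuilding the crop from a full-size mask each time. The sketch below is only an illustration of that idea: `tile_boxes` is a hypothetical helper, it clamps rectangles to the image bounds (calc_rectangle in the pasted code does not), and it ignores padding and how processed tiles are pasted back.

```python
import math


def tile_boxes(width: int, height: int, tile_w: int, tile_h: int):
    """Return (x1, y1, x2, y2) for every tile, row by row, clamped to the image."""
    cols = math.ceil(width / tile_w)
    rows = math.ceil(height / tile_h)
    return [
        (
            xi * tile_w,
            yi * tile_h,
            min((xi + 1) * tile_w, width),
            min((yi + 1) * tile_h, height),
        )
        for yi in range(rows)
        for xi in range(cols)
    ]


# Canvas and tile size taken from the console log in this issue.
boxes = tile_boxes(13824, 18432, 1732, 1024)
print(len(boxes), boxes[0], boxes[-1])

# With Pillow, each tile could then be cut with image.crop(box) in one loop.
```

One caveat: in the pasted linear_process code each tile is processed on the image returned by the previous tile, so cropping everything from the original image in one pass would change what each tile sees at its borders.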

marcsyp commented 1 year ago

Exactly what I was thinking -- but I was concerned that this is an optimization that needs to happen on the SD Upscaler side... would be curious to hear how you think this optimization could happen. It would save a lot of rendering time at all stages.

Thanks!

marcsyp commented 1 year ago

@huchenlei How do I enable debug logging so I can investigate a few things?

Thanks!

huchenlei commented 1 year ago

You need to add --controlnet-loglevel DEBUG as commandline args.

If you want the debug logging of how long each function takes in Script you will need to checkout

If you just want to understand how long a specific part of the code takes, you can just add some log messages to the part you are interested in. No need to do any of the things mentioned above.

lllyasviel commented 1 year ago

this? https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11063

marcsyp commented 1 year ago

@lllyasviel nope -- i've not upgraded my nvidia drivers, still on 528. This is definitely related to cropping/masking code inefficiencies, and probably other things as well but would require investigation.

Vendaciousness commented 1 year ago

I see you're using a VERY old version of Torch.
Your version shows as torch: 1.13.1+cu117. I've never seen a version this old, but I can tell you that when I upgraded to version 2.0.1+cu118, my speeds about doubled, just from that. You can update to version 2 just by updating your Automatic1111 with the git pull command in your WebUI directory, or by deleting your venv folder and restarting Automatic1111 (which will replace it with the latest stuff). You can also get a boost from replacing your CuDNN binaries with the latest ones from Nvidia's Developer site.

If you do these upgrades, please post your new speeds. It would be interesting to see the difference.

-V

marcsyp commented 1 year ago

@Vendaciousness -- thanks for the comment. Interesting that you got such a great performance improvement. Back when A1111 updated the torch version, I did indeed update to 2 and had a variety of nasty side effects. One was the forever hanging generations that required noodling with live preview settings, and the other was that my performance actually got slower with torch 2. Perhaps I didn't have the correct pairing with nVidia drivers and CuDNN binaries, but honestly when I encounter issues like that in the middle of an important project in production (which I have been working on for 9 weeks now), I hesitate to go "all in" on an upgrade process that may end in tears and frustration and the need to go all the way back to square one to retrieve an old configuration.

As such, I forked the A1111 repo and have my own custom configuration, which I am willing to abandon pretty soon, as I am reaching the end of a pretty intense production run and can afford to get off track for a bit. I will give it a shot (this time backing up my venv folder) and I will post results here!

Incidentally, do you have any recommendations on specifics around updating the CuDNN binaries? I haven't done that in the A1111 environment yet, so I don't know if there are any idiosyncrasies... for instance, do I need the latest nvidia drivers? And if so, weren't there some issues with those recently that were forcing users to roll back?

Marc

marcsyp commented 1 year ago

Incidentally, here is where I am right now. See any red flags?

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 528.24       Driver Version: 528.24       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:09:00.0  On |                  N/A |
| 30%   39C    P8    37W / 350W |   8069MiB / 12288MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

marcsyp commented 1 year ago

Yeah, so just upgraded to torch 2 and I'm getting pretty similar or slightly worse performance:

Canva size: 13824x18432
Image size: 4608x6144
Scale factor: 3
Upscaling iteration 1 with scale factor 3
Tile size: 1732x1024
Tiles amount: 144
Grid: 18x8
Redraw enabled: True
Seams fix mode: NONE
Image Width: 13824
Tile Width: 1856
Image Height: 18432
Tile Height: 1088
2023-06-24 03:03:17,914 - ControlNet - INFO - Loading model: control_v11f1e_sd15_tile [a371b31b]
2023-06-24 03:03:18,873 - ControlNet - INFO - Loaded state_dict from [C:\SDW\stable-diffusion-webui\extensions\sd-webui-controlnet\models\control_v11f1e_sd15_tile.pth]
2023-06-24 03:03:18,874 - ControlNet - INFO - Loading config: C:\SDW\stable-diffusion-webui\extensions\sd-webui-controlnet\models\control_v11f1e_sd15_tile.yaml
2023-06-24 03:03:22,966 - ControlNet - INFO - ControlNet model control_v11f1e_sd15_tile [a371b31b] loaded.
2023-06-24 03:03:31,431 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-24 03:03:31,431 - ControlNet - INFO - Pixel Perfect Computation:
2023-06-24 03:03:31,432 - ControlNet - INFO - resize_mode = ResizeMode.RESIZE
2023-06-24 03:03:31,432 - ControlNet - INFO - raw_H = 1088
2023-06-24 03:03:31,432 - ControlNet - INFO - raw_W = 1856
2023-06-24 03:03:31,432 - ControlNet - INFO - target_H = 1088
2023-06-24 03:03:31,432 - ControlNet - INFO - target_W = 1856
2023-06-24 03:03:31,432 - ControlNet - INFO - estimation = 1088.0
2023-06-24 03:03:31,433 - ControlNet - INFO - preprocessor resolution = 1088
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:13<00:00, 1.39s/it]
could not find upscaler named , using None as a fallback | 10/1440 [00:11<27:53, 1.17s/it]
2023-06-24 03:04:15,989 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-24 03:04:25,683 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-24 03:04:25,683 - ControlNet - INFO - Pixel Perfect Computation:
2023-06-24 03:04:25,683 - ControlNet - INFO - resize_mode = ResizeMode.RESIZE
2023-06-24 03:04:25,683 - ControlNet - INFO - raw_H = 1088
2023-06-24 03:04:25,683 - ControlNet - INFO - raw_W = 1856
2023-06-24 03:04:25,683 - ControlNet - INFO - target_H = 1088
2023-06-24 03:04:25,684 - ControlNet - INFO - target_W = 1856
2023-06-24 03:04:25,684 - ControlNet - INFO - estimation = 1088.0
2023-06-24 03:04:25,684 - ControlNet - INFO - preprocessor resolution = 1088
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:13<00:00, 1.39s/it]
could not find upscaler named , using None as a fallback | 20/1440 [01:01<39:25, 1.67s/it]
2023-06-24 03:04:56,150 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-24 03:05:04,261 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-24 03:05:04,262 - ControlNet - INFO - Pixel Perfect Computation:
2023-06-24 03:05:04,262 - ControlNet - INFO - resize_mode = ResizeMode.RESIZE
2023-06-24 03:05:04,263 - ControlNet - INFO - raw_H = 1088
2023-06-24 03:05:04,263 - ControlNet - INFO - raw_W = 1856
2023-06-24 03:05:04,263 - ControlNet - INFO - target_H = 1088
2023-06-24 03:05:04,264 - ControlNet - INFO - target_W = 1856
2023-06-24 03:05:04,264 - ControlNet - INFO - estimation = 1088.0
2023-06-24 03:05:04,265 - ControlNet - INFO - preprocessor resolution = 1088
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:14<00:00, 1.42s/it]
could not find upscaler named , using None as a fallback | 30/1440 [01:40<37:25, 1.59s/it]
2023-06-24 03:05:39,224 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-24 03:05:47,204 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-24 03:05:47,204 - ControlNet - INFO - Pixel Perfect Computation:
2023-06-24 03:05:47,205 - ControlNet - INFO - resize_mode = ResizeMode.RESIZE
2023-06-24 03:05:47,205 - ControlNet - INFO - raw_H = 1088
2023-06-24 03:05:47,205 - ControlNet - INFO - raw_W = 1856
2023-06-24 03:05:47,205 - ControlNet - INFO - target_H = 1088
2023-06-24 03:05:47,205 - ControlNet - INFO - target_W = 1856
2023-06-24 03:05:47,205 - ControlNet - INFO - estimation = 1088.0
2023-06-24 03:05:47,205 - ControlNet - INFO - preprocessor resolution = 1088
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:12<00:00, 1.28s/it]
could not find upscaler named , using None as a fallback | 40/1440 [02:20<35:54, 1.54s/it]
2023-06-24 03:06:15,619 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-24 03:06:23,221 - ControlNet - INFO - Loading preprocessor: tile_resample

Image gen denoising is trending more toward 13-14s instead of 12-13, 9-10s of tile model loading from cache vs 8-9 in old torch.
Non-scientific test here, could be within the margin of error.
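
For anyone who wants to measure the cache-load gap rather than eyeball it, here is a minimal sketch that pairs each "Loading model from cache" timestamp with the next "Loading preprocessor" timestamp. The function name `cache_load_gaps` is made up for illustration, and the regex assumes the log format shown in this thread; the sample text is copied from the log above.

```python
import re
from datetime import datetime

# Assumes the "<timestamp> - ControlNet - INFO - <message>" layout seen in this thread.
EVENT = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+-\s+ControlNet\s+-\s+INFO\s+-\s+"
    r"(Loading model from cache|Loading preprocessor)"
)

def cache_load_gaps(log_text):
    """Seconds between each cache load and the preprocessor line that follows it."""
    gaps = []
    started = None
    for stamp, message in EVENT.findall(log_text):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        if message == "Loading model from cache":
            started = when
        elif started is not None:
            gaps.append((when - started).total_seconds())
            started = None
    return gaps

sample = """
2023-06-24 03:04:15,989 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-24 03:04:25,683 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-24 03:04:56,150 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-24 03:05:04,261 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-24 03:05:39,224 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-24 03:05:47,204 - ControlNet - INFO - Loading preprocessor: tile_resample
2023-06-24 03:06:15,619 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2023-06-24 03:06:23,221 - ControlNet - INFO - Loading preprocessor: tile_resample
"""

gaps = cache_load_gaps(sample)
print([round(gap, 3) for gap in gaps])
```

Because the pattern scans the whole text instead of going line by line, it should also work on a log that was pasted as one long run of text.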

I noticed the cuDNN binaries in torch/lib are pretty old, and it's hard to tell, but they seem to have been built for CUDA 11. My system has CUDA 12 installed. Could this be a problem? If I replace the DLLs in the lib folder with ones from nVidia's website, should I use the bin files from v11 or v12? Are there any other files that need to be replaced?
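
One way to see which CUDA and cuDNN versions torch itself reports, as opposed to what is installed system-wide, is to ask torch directly. The sketch below is only a diagnostic, and `report_torch_cuda` is a name made up here; it would need to be run with the Python interpreter inside the webui venv. As far as I know, the "CUDA Version" printed by nvidia-smi is the highest version the driver supports, while the official torch wheels bundle their own CUDA runtime, so the two numbers do not have to match.

```python
def report_torch_cuda():
    """Collect the CUDA/cuDNN versions torch reports, if torch is importable."""
    info = {"torch": None, "cuda_build": None, "cudnn": None, "gpu_available": False}
    try:
        import torch
    except ImportError:
        return info  # torch is not installed in this interpreter
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda         # CUDA version torch was built against (None on CPU builds)
    info["cudnn"] = torch.backends.cudnn.version()  # an integer version code, or None if cuDNN is unavailable
    info["gpu_available"] = torch.cuda.is_available()
    return info

print(report_torch_cuda())
```

If `cuda_build` starts with 11, the bundled runtime is CUDA 11.x, which would suggest matching any replacement cuDNN binaries to v11 rather than to the system toolkit; treat that as a starting point to verify, not a guarantee.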

TIA

marcsyp commented 1 year ago

@Vendaciousness

So after updating my binaries, there is similar performance on denoising and the problematic tile cropping code to my previous results on torch 1 -- however, some of the other intermediary processes seem to have been sped up, and my total time has gone down from 97m to about 89m. So that's positive. Plus, now I'm up to date with A1111 and torch 2, so it's a success in general if not a success related to the issue posted here.

There is still a great opportunity to optimize the tile cropping process, it seems. @huchenlei -- any thoughts on how we might get this to proceed? I know there are a lot of priorities right now, but this could have a pretty great impact on performance system-wide, even for large batches of smaller operations. I'm headed to Europe for a few weeks but I'm happy to help out where I can when I get back.

Thanks all.

Marc

huchenlei commented 1 year ago

I am working on expanding test coverage right now. Without enough test coverage, I don't think I have enough confidence to tackle the convoluted mask code in Script.process.

linjunshi commented 1 year ago

Not sure if this is related to the issue, but at one point I changed the checkpoint cache number and the ControlNet model cache number, and the time CN used to spend loading the model and preprocessors between tiles simply disappeared: the tiles were processed immediately one after another, even with large images like 10000x10000. After a while, though, it went back to the earlier behaviour and waited quite a few seconds loading models between tiles. I can't reproduce the fast behaviour, so there could be some information missing. Just thought this info might be helpful. I'm also looking for a solution to this. Cheers

I am using 4090 and torch 2+, CUDA 12.1.

Vendaciousness commented 1 year ago

@marcsyp Sorry for the delay, I've been in the middle of a crazy project crunch time. Anyways, regarding the cuDNN binaries, I would install the latest CUDA Toolkit, here: https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64

Automatic1111 already ships a fairly new version of the cuDNN binaries nowadays, but if you want the latest ones, you need to create a developer account on Nvidia to download them, here: https://developer.nvidia.com/rdp/cudnn-download

Then you copy the files located in the bin folder and overwrite the ones in your venv folder, in [Auto1111 folder]\venv\Lib\site-packages\torch\lib\
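
Since overwriting the DLLs in place is hard to undo, a small script can back up whatever it replaces first. This is a sketch only: `replace_dlls` is a hypothetical helper, the `cudnn*.dll` pattern is an assumption about how the downloaded files are named, and the three folders are whatever paths apply on your machine.

```python
import shutil
from pathlib import Path

def replace_dlls(new_bin, torch_lib, backup):
    """Copy every cudnn*.dll from new_bin into torch_lib, backing up any file it overwrites."""
    new_bin, torch_lib, backup = Path(new_bin), Path(torch_lib), Path(backup)
    backup.mkdir(parents=True, exist_ok=True)
    replaced = []
    for dll in sorted(new_bin.glob("cudnn*.dll")):
        target = torch_lib / dll.name
        if target.exists():
            shutil.copy2(target, backup / dll.name)  # keep the old binary for rollback
        shutil.copy2(dll, target)
        replaced.append(dll.name)
    return replaced

# Example call (placeholders, not real paths):
# replace_dlls(r"<cudnn download>\bin", r"<Auto1111 folder>\venv\Lib\site-packages\torch\lib", r"<backup folder>")
```

To roll back, copy the files from the backup folder over the torch lib folder again. Close the webui before running it so the DLLs are not locked.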

In general, I would recommend you do what I do and create a second install of Auto1111. You might try Vlad's automatic, which is kind of like a souped-up race car version of Auto1111. It's usually faster than my install of Auto1111, but bear in mind it has more bugs as a result.

The second install uses the models/checkpoint folders of the first, so there are only a few extra gigs of files on the drive. I also copy over my ui-config.json, config.json, params.txt and webui-user.bat files from my main install, so both copies are set up the same way, unless I need something set up specially for a project.

Anyways, I'll usually run the second version, unless it keeps crashing, then I move to the backup primary and when the second, newer version works great for a month or more, I copy it over and make it my new primary. It's basically a fork, however, this fork is done for the purposes of redundancy and so that I'll always have a production copy that works. Lately, I've had 3 versions I run, an LTS, an LTS candidate and a bleeding-edge version (usually Vlad's automatic).

Hope this helps!

marcsyp commented 3 months ago

@huchenlei --

I am returning to the original art project that surfaced this issue, having upgraded my webui to 1.6.0 and controlnet to 1.1.443. The preprocessor performance has gotten significantly WORSE for the same exact 3x upscale described in the original issue:

Canva size: 13824x18432
Image size: 4608x6144
Scale factor: 3
Upscaling iteration 1 with scale factor 3
Tile size: 1732x1024
Tiles amount: 144
Grid: 18x8
Redraw enabled: True
Seams fix mode: NONE
Image Width: 13824
Tile Width: 1856
Image Height: 18432
Tile Height: 1088
2024-05-29 22:22:01,272 - ControlNet - INFO - unit_separate = False, style_align = False
2024-05-29 22:22:01,634 - ControlNet - INFO - Loading model: control_v11f1e_sd15_tile [a371b31b]
2024-05-29 22:22:03,575 - ControlNet - INFO - Loaded state_dict from [C:\SDW\stable-diffusion-webui\extensions\sd-webui-controlnet\models\control_v11f1e_sd15_tile.pth]
2024-05-29 22:22:03,576 - ControlNet - INFO - controlnet_default_config
2024-05-29 22:22:06,503 - ControlNet - INFO - ControlNet model control_v11f1e_sd15_tile a371b31b loaded.
2024-05-29 22:22:15,878 - ControlNet - INFO - Using preprocessor: tile_resample
2024-05-29 22:22:15,878 - ControlNet - INFO - preprocessor resolution = 1088
2024-05-29 22:22:16,002 - ControlNet - INFO - ControlNet Hooked - Time = 14.733000755310059
altprompter before process batch space explorer after extra networks activate
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:12<00:00, 1.23s/it]
2024-05-29 22:23:14,953 - ControlNet - INFO - unit_separate = False, style_align = False
2024-05-29 22:23:14,954 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2024-05-29 22:23:37,918 - ControlNet - INFO - Using preprocessor: tile_resample
2024-05-29 22:23:37,918 - ControlNet - INFO - preprocessor resolution = 1088
2024-05-29 22:23:38,085 - ControlNet - INFO - ControlNet Hooked - Time = 23.135815620422363
altprompter before process batch space explorer after extra networks activate
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:11<00:00, 1.15s/it]
2024-05-29 22:24:37,067 - ControlNet - INFO - unit_separate = False, style_align = False
2024-05-29 22:24:37,067 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2024-05-29 22:24:46,759 - ControlNet - INFO - Using preprocessor: tile_resample
2024-05-29 22:24:46,760 - ControlNet - INFO - preprocessor resolution = 1088
2024-05-29 22:24:46,852 - ControlNet - INFO - ControlNet Hooked - Time = 9.78799557685852
altprompter before process batch space explorer after extra networks activate
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:12<00:00, 1.22s/it]
2024-05-29 22:25:29,840 - ControlNet - INFO - unit_separate = False, style_align = False
2024-05-29 22:25:29,840 - ControlNet - INFO - Loading model from cache: control_v11f1e_sd15_tile [a371b31b]
2024-05-29 22:25:55,961 - ControlNet - INFO - Using preprocessor: tile_resample
2024-05-29 22:25:55,962 - ControlNet - INFO - preprocessor resolution = 1088
2024-05-29 22:25:56,067 - ControlNet - INFO - ControlNet Hooked - Time = 26.231940031051636
altprompter before process batch space explorer after extra networks activate
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:11<00:00, 1.16s/it]
space explorer postprocess batch list
could not find upscaler named , using None as a fallback

CN hook times vary wildly from 9s to 26s. Any idea what is going on here? Have there been any updates in A1111 past 1.6.0 that may help this issue? I'm reluctant to update in the middle of this project, but I would consider it.
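
To put numbers on that spread, here is a quick sketch that pulls every "ControlNet Hooked - Time" value out of a log and summarizes it. `summarize_hook_times` is a name made up for this example; the four sample values are the ones from the log above (the first is the initial model load, the other three are cache loads).

```python
import re
import statistics

# Matches the "ControlNet Hooked - Time = <seconds>" lines printed in the log above.
HOOK_TIME = re.compile(r"ControlNet Hooked - Time = (\d+(?:\.\d+)?)")

def summarize_hook_times(log_text):
    """Return count/min/max/mean of the hook times found in log_text, or None if there are none."""
    times = [float(value) for value in HOOK_TIME.findall(log_text)]
    if not times:
        return None
    return {
        "count": len(times),
        "min": min(times),
        "max": max(times),
        "mean": statistics.mean(times),
    }

sample = (
    "ControlNet Hooked - Time = 14.733000755310059 "
    "ControlNet Hooked - Time = 23.135815620422363 "
    "ControlNet Hooked - Time = 9.78799557685852 "
    "ControlNet Hooked - Time = 26.231940031051636"
)

summary = summarize_hook_times(sample)
print(summary)
# Rough extrapolation only: four samples are not much to go on.
print("projected hook minutes for 144 tiles:", round(144 * summary["mean"] / 60, 1))
```

Running it over a full 144-tile log would show whether the slow hooks are occasional outliers or the norm.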

Vendaciousness commented 3 months ago

> I'm reluctant to update in the middle of this project, but I would consider it.

Can't blame you, there. If I wanted to process images to those specs, I wouldn't even use A1111 anymore, but rather Stable Forge, an optimized version of A1111 made by the creator of ControlNet. Just be careful to create a new install for SF. Don't ruin your copy of A1111 by 'upgrading' it with the Stable Forge update instructions they have there. It just broke my existing install and all the extensions were different, anyway.

Stable Forge: https://github.com/lllyasviel/stable-diffusion-webui-forge

Or, if I needed the images to be extremely detailed (quality over speed), maybe SUPIR, a new upscale method that gives the best results I've seen.

SUPIR: https://github.com/Fanghua-Yu/SUPIR

marcsyp commented 3 months ago

Thanks @Vendaciousness -- a couple people have pointed me to Stable Forge and I will definitely consider it, particularly as a completely fresh install. The reason I haven't switched so far is because my Upscale workflow relies on a heavily developed fork of Ultimate Upscale, and given my cursory reading of the Forge documentation, I was concerned that some modification of my plugin would be required to get it working in SF, with no guarantee of success.

I'm also stuck on torch 1.13 for this project because image reproduction changed significantly when moving to torch 2 (thereby destroying my upscale workflow, which relies on reproducible seeds), so I'm not sure if I'm ready to go down that experimentation route just yet. But I will keep it in my back pocket if I get frustrated. Thanks for the heads up on SUPIR, this is the first I'm seeing it.

Vendaciousness commented 3 months ago

That's rough. I've had my own custom workflow for seamless looping and seamless tiling AI videos broken for many months as a result of version mismatches, so I fully relate. The most frustrating thing is it definitely worked in the past, until it broke one day and I was too busy to trace down the cause at the time.

I'm trying to use SF for some super high res (20k+ per side) upscales right now, so if you want to add me on Discord, I'll share what I learn. Maybe I can adjust my workflow to replicate your requirements. I've used Ultimate SD Upscaler in the past, but had better results using the 'Multidiffusion and Tiled VAE' extension, which I think uses the same tiling method to slice up the job into bite-sized chunks. lllyasviel has built this into Stable Forge, along with HyperTile, Kohya's HR (high res) Fix and some other potentially useful tools, so it could be a good fit, but bear in mind it's a stale repo. The dev is brilliant, but super flaky. I can confirm extension compatibility is hit and miss, as you've read, though older (1.6-1.8) compatible versions may be fine. That is, I don't believe there is an inbuilt incompatibility.

Add me if you want: digitalhitman@contractor.net

-v

marcsyp commented 3 months ago

Yeah, I made the mistake of not cloning my venv and keeping detailed records of plugins/settings early on in the project, so I have no way to return completely to my original state. Frustrating but then it was all back in the early days before configs were even properly implemented, and we were all just experimenting. I just didn't realize that the project would have legs and that I would continue to want to work on it for so long, and as a result I'm on a pseudo workable but not ideal solution right now.
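
For the record-keeping part, a lightweight option alongside copying the whole venv is to dump the installed package versions to a file at the start of a project. The sketch below uses only the standard library; `snapshot_environment` and `save_snapshot` are names invented here, and it only records Python packages, not webui extensions or settings. Run it with the venv's interpreter.

```python
import json
import sys
from importlib import metadata

def snapshot_environment():
    """Return the Python version and a name -> version map of installed packages."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that have no name recorded
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "packages": dict(sorted(packages.items())),
    }

def save_snapshot(path):
    """Write the snapshot as JSON so it can be diffed against a later state."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(snapshot_environment(), handle, indent=2)

print(len(snapshot_environment()["packages"]), "packages recorded")
```

Diffing two such files after an upgrade shows exactly which package versions moved, which helps when trying to get back to a known-good state.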

The biggest part of the workflow I built in my private fork is custom seed, CFG, and prompt extraction and manipulation as part of the upscale process, and I can't really live without it for at least parts of the process when doing batch operations -- but for final upscales on individual pieces, if I can find a good workflow for ultra large upscales, I'm open to it, so yeah I'd love to hear about what you learn. I'll add you on Discord in the off hours. Cheers

Vendaciousness commented 3 months ago

Here's a more recent ComfyUI-based upscale method that combines SUPIR and some others, so if you'd rather learn ComfyUI (it's a steep learning curve, but much more capable), check it out: https://github.com/dicksondickson/dickson-sci-fi-enhance-upscale