SDXL refiner not support

daxijiu commented 1 year ago

I can't generate TensorRT engines for the SDXL refiner model. PS: for an SDXL base + refiner workflow, how should sd_unet be configured?

Exporting sd_xl_refiner_1.0_0.9vae to TensorRT
{'sample': [(2, 4, 128, 128), (2, 4, 128, 128), (2, 4, 128, 128)], 'timesteps': [(2,), (2,), (2,)], 'encoder_hidden_states': [(2, 77, 2048), (2, 77, 2048), (2, 77, 2048)], 'y': [(2, 2816), (2, 2816), (2, 2816)]}
No ONNX file found. Exporting ONNX...
Disabling attention optimization
D:\kkkkk\release\SD_webui_with_aki_launcher_dev\repositories\generative-models\sgm\modules\diffusionmodules\openaimodel.py:987: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert y.shape[0] == x.shape[0]
ERROR:root:mat1 and mat2 shapes cannot be multiplied (2x2816 and 2560x1536)
Traceback (most recent call last):
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\extensions\Stable-Diffusion-WebUI-TensorRT\exporter.py", line 84, in export_onnx
    torch.onnx.export(
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\onnx\utils.py", line 516, in export
    _export(
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\onnx\utils.py", line 1596, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\onnx\utils.py", line 1135, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\onnx\utils.py", line 1011, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\onnx\utils.py", line 915, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\jit\_trace.py", line 1285, in _get_trace_graph
    outs = ONNXTracedModule(
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\jit\_trace.py", line 133, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\jit\_trace.py", line 124, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\modules\sd_unet.py", line 91, in UNetModel_forward
    return original_forward(self, x, timesteps, context, *args, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\repositories\generative-models\sgm\modules\diffusionmodules\openaimodel.py", line 988, in forward
    emb = emb + self.label_emb(y)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\container.py", line 215, in forward
    input = module(input)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\container.py", line 215, in forward
    input = module(input)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\extensions-builtin\Lora\networks.py", line 472, in network_Linear_forward
    return originals.Linear_forward(self, input)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x2816 and 2560x1536)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\extensions\Stable-Diffusion-WebUI-TensorRT\ui_trt.py", line 135, in export_unet_to_trt
    export_onnx(
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\extensions\Stable-Diffusion-WebUI-TensorRT\exporter.py", line 129, in export_onnx
    exit()
  File "D:\kkkkk\release\SD_webui_with_aki_launcher_dev\python\lib\_sitebuiltins.py", line 26, in __call__
    raise SystemExit(code)
SystemExit: None

chazzhou commented 11 months ago

Hi, I've set up an experimental framework for refiner support, available at this link. This setup stores used engines in memory, which typically requires a 24GB graphics card to effectively run the refiner. Both models use approximately 15GB of VRAM, and two engines consume an additional ~4GB. Although you can try using the low/medium RAM flag, I haven't tested this myself.

To begin, build the engine for the base model. Then build one for the refiner: select the refiner as the Stable Diffusion checkpoint and build the engine as usual in the TensorRT tab. Once both engines are built, refresh the list of available engines. Next, select the base model as the Stable Diffusion checkpoint and choose the Unet profile for your base model. Finally, enable the refiner in the usual way. That's all.

Please note, there's a known memory leak or arrangement issue. If you're using a 24GB card (like the 3090/4090), you'll need to comment out line 708 send_model_to_device(already_loaded) in the modules/sd_models.py file to enable proper inference.

For those with graphics cards exceeding 24GB memory (in my experience, it uses about 25-28GB during inference), you can avoid commenting out the line since your system should have sufficient memory to spare. However, I plan to address these memory issues in the future. Ideally, the system should not require more than approximately 19GB with both models and engines loaded.
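The memory figures in this comment can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted above (about 15GB for both models, about 4GB for the two engines, 25-28GB observed); the `headroom` helper is a hypothetical name introduced for illustration.

```python
# Figures quoted in the comment above, in GB. Nothing here is measured.
MODELS_GB = 15.0                  # base + refiner checkpoints in VRAM
ENGINES_GB = 4.0                  # two TensorRT engines
OBSERVED_PEAK_GB = (25.0, 28.0)   # reported usage during inference
CARD_GB = 24.0                    # e.g. a 3090/4090


def headroom(card_gb: float, usage_gb: float) -> float:
    """Free VRAM left on the card; negative means the workload does not fit."""
    return card_gb - usage_gb


ideal_gb = MODELS_GB + ENGINES_GB  # the ~19GB target mentioned in the comment

print(f"ideal:         {ideal_gb:.0f}GB, headroom {headroom(CARD_GB, ideal_gb):+.0f}GB")
for peak in OBSERVED_PEAK_GB:
    print(f"observed peak: {peak:.0f}GB, headroom {headroom(CARD_GB, peak):+.0f}GB")
```

On these numbers a 24GB card has room for the ideal footprint but not for the observed peak, which is consistent with the workaround described for 24GB cards.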

phineas-pta commented 11 months ago

For the refiner, edit this line:

https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT/blob/main/models.py#L1068

Change the value from 2816 to 2560.
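The value 2560 can be read straight off the error in the original report. In `F.linear`, the first shape in the message is the input and the second is the transposed weight, so `(2x2816 and 2560x1536)` means the export fed `y` with 2816 features to a layer that expects 2560. A minimal sketch of that reading follows; `parse_matmul_error` is a hypothetical helper, not part of the extension. The decomposition into a 1280-wide pooled text embedding plus 256-wide embeddings of 6 (base) or 5 (refiner) conditioning scalars is background on the SDXL configs rather than something stated in this thread, so treat it as an assumption.

```python
import re


def parse_matmul_error(message: str) -> tuple[int, int]:
    """Return (width_passed, width_expected) from a PyTorch matmul shape error."""
    match = re.search(r"\((\d+)x(\d+) and (\d+)x(\d+)\)", message)
    if match is None:
        raise ValueError("no shape pair found in message")
    _, passed, expected, _ = (int(group) for group in match.groups())
    return passed, expected


msg = "mat1 and mat2 shapes cannot be multiplied (2x2816 and 2560x1536)"
passed, expected = parse_matmul_error(msg)
print(f"export passed y with {passed} features; label_emb expects {expected}")

# Assumed background: pooled text embedding (1280) plus one 256-wide
# embedding per conditioning scalar (6 for the base model, 5 for the refiner).
POOLED = 1280
PER_SCALAR = 256
base_width = POOLED + 6 * PER_SCALAR
refiner_width = POOLED + 5 * PER_SCALAR
print(base_width, refiner_width)
```

Both widths match the numbers in the error message, which is why swapping the hard-coded 2816 for 2560 lets the refiner's `y` input pass this layer.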

NVIDIA / Stable-Diffusion-WebUI-TensorRT

SDXL refiner not support #89