AUTOMATIC1111 / stable-diffusion-webui-tensorrt


From what I can tell install.py requires importlib_metadata, but pip installs that package under the name importlib-metadata, so it can't be found? Also, is my CPU being treated as a secondary CUDA choice, and is that ruining the code? Is there any way to tell this extension to ignore my CPU's built-in graphics, which I never use? #69

Open left1000 opened 8 months ago

left1000 commented 8 months ago

From what I can tell, install.py requires importlib_metadata, but pip installs that package under the name importlib-metadata, and so it can't be found?

That's just my guess as to what is wrong. Below is the literal console output I'm basing my guess on. Please help. I really want to get TensorRT installed and try it out!

WARNING:xformers:A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: pytorch_lightning.utilities.distributed.rank_zero_only has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from pytorch_lightning.utilities instead.
  rank_zero_deprecation(
Error running install.py for extension V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT.
Command: "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\Scripts\python.exe" "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\install.py"
Error code: 1
stderr: Traceback (most recent call last):
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\install.py", line 2, in <module>
    from importlib_metadata import version
ModuleNotFoundError: No module named 'importlib_metadata'

Launching Web UI with arguments: --medvram --no-half-vae --censoredthispart --xformers --ckpt-dir V:\AI images stuff\checkpoint
Error loading script: trt.py
Traceback (most recent call last):
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\modules\scripts.py", line 382, in load_scripts
    script_module = script_loading.load_module(scriptfile.path)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\modules\script_loading.py", line 10, in load_module
    module_spec.loader.exec_module(module)
  File "", line 883, in exec_module
  File "", line 241, in _call_with_frames_removed
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\scripts\trt.py", line 10, in <module>
    import ui_trt
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\ui_trt.py", line 10, in <module>
    from exporter import export_onnx, export_trt
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\exporter.py", line 3, in <module>
    import onnx
ModuleNotFoundError: No module named 'onnx'
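For reference: the pip distribution is named importlib-metadata, but the module it provides is imported as importlib_metadata, so the dash/underscore mismatch by itself doesn't prevent the import; the ModuleNotFoundError above just means the package isn't installed in the webui's venv yet (and onnx fails for the same reason, because install.py crashed before it could install anything). On Python 3.8+ the standard library's importlib.metadata can also stand in for the backport. Below is a minimal sketch of the kind of guard an extension install script could use, assuming the stock webui helpers launch.is_installed / launch.run_pip; this is an illustration, not the extension's actual install.py:

```python
# Illustrative sketch only -- not the extension's actual install.py.
# The pip *distribution* is "importlib-metadata"; the *import* name is "importlib_metadata".
try:
    from importlib_metadata import version      # backport package
except ImportError:
    from importlib.metadata import version      # stdlib fallback on Python 3.8+

import launch  # helper exposed by the AUTOMATIC1111 webui to extension install.py scripts

# Install anything the scripts will later import (the log above also shows
# "No module named 'onnx'", because install.py died before reaching that step).
if not launch.is_installed("onnx"):
    launch.run_pip("install onnx", "onnx")

print("onnx:", version("onnx"))
```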

So, I just edited install.py so it doesn't import or use version, ran it, and it "worked" (at least there were no errors), but when I went to the tab and clicked the orange export button I got this in the console:

No ONNX file found. Exporting ONNX...
Disabling attention optimization
============= Diagnostic Run torch.onnx.export version 2.0.1+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

ERROR:root:Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
Traceback (most recent call last):
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\exporter.py", line 84, in export_onnx
    torch.onnx.export(
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\onnx\utils.py", line 506, in export
    _export(
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\onnx\utils.py", line 1548, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\onnx\utils.py", line 1113, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\onnx\utils.py", line 989, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\onnx\utils.py", line 893, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\jit\_trace.py", line 1268, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\jit\_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\jit\_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1488, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\modules\sd_unet.py", line 91, in UNetModel_forward
    return ldm.modules.diffusionmodules.openaimodel.copy_of_UNetModel_forward_for_webui(self, x, timesteps, context, *args, **kwargs)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 789, in forward
    emb = self.time_embed(t_emb)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1488, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
    input = module(input)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1488, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 429, in network_Linear_forward
    return originals.Linear_forward(self, input)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\ui_trt.py", line 135, in export_unet_to_trt
    export_onnx(
  File "V:\AI images stuff\A1111 Web UI Autoinstaller\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\exporter.py", line 129, in export_onnx
    exit()
  File "C:\Python3.10.0\lib\_sitebuiltins.py", line 26, in __call__
    raise SystemExit(code)
SystemExit: None

It looks like the software doesn't realize I only care about my GPU and don't want my CPU doing any CUDA work? Potentially this is because I have an Intel CPU with integrated graphics (which I've never used and will never use; it's just a standard thing in Intel CPUs)...

Am I even describing the problem correctly? I don't have a clue!
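A note on the device names in that traceback: "cpu" and "cuda:0" refer to where PyTorch tensors currently live, not to which processor was picked for CUDA. An Intel iGPU is never a CUDA device, so cuda:0 can only be the NVIDIA card here; the mismatch means some of the model's weights were still in system RAM (which --medvram deliberately arranges) while the export inputs were on the GPU. A minimal sketch of the usual fix, using a stand-in module rather than the webui's real UNet:

```python
# Hedged sketch of the usual fix for "Expected all tensors to be on the same device":
# move the module *and* the example inputs onto the same device before torch.onnx.export.
# TinyUNet is a stand-in for illustration, not the webui's UNet.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.time_embed = nn.Linear(320, 1280)   # mirrors the layer the traceback fails in

    def forward(self, t_emb):
        return self.time_embed(t_emb)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = TinyUNet().to(device)                 # with --medvram, parts of the real model stay on the CPU
t_emb = torch.randn(1, 320, device=device)    # inputs must live on the same device as the weights

torch.onnx.export(model, (t_emb,), "tiny_unet.onnx", opset_version=17)
```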

If it matters, this is my pip show torch:

pip show torch
Name: torch
Version: 1.13.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: c:\python3.10.0\lib\site-packages
Requires: typing-extensions
Required-by: accelerate, fairscale, timm, torchvision, xformers
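Worth noting: that pip show points at the global interpreter (c:\python3.10.0\lib\site-packages, torch 1.13.1), while the webui log above reports torch 2.0.1+cu118 from its own venv, so the global pip doesn't necessarily describe what the webui actually runs. A quick check run with the venv's interpreter (venv\Scripts\python.exe):

```python
# Sanity check, run with the webui's own interpreter (venv\Scripts\python.exe),
# not the global Python 3.10 install.
import torch

print(torch.__version__)            # the webui log above reports 2.0.1+cu118 in the venv
print(torch.cuda.is_available())    # True if the NVIDIA driver / CUDA build is usable
print(torch.cuda.device_count())    # an Intel iGPU never shows up as a CUDA device
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```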

TLDR: There might be many things wrong; I should maybe give up. But who knows, maybe one of you can save me?

left1000 commented 8 months ago

So I figured maybe my installation is messed up; I've been updating it and my pip packages for something like 5-10 months now.

So I grabbed https://github.com/AUTOMATIC1111/stable-diffusion-webui/releases/tag/v1.0.0-pre to see if I could get that to work. End result? Far less work by me, same exact error.

No ONNX file found. Exporting ONNX...
Disabling attention optimization
============= Diagnostic Run torch.onnx.export version 2.0.1+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
ERROR:root:Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
Traceback (most recent call last):
  File "V:\AI images stuff\automatic1111 prebuilt\webui\extensions\Stable-Diffusion-WebUI-TensorRT\exporter.py", line 84, in export_onnx
    torch.onnx.export(
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\onnx\utils.py", line 506, in export
    _export(
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\onnx\utils.py", line 1548, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\onnx\utils.py", line 1113, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\onnx\utils.py", line 989, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\onnx\utils.py", line 893, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\jit\_trace.py", line 1268, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\jit\_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\jit\_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\nn\modules\module.py", line 1488, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "V:\AI images stuff\automatic1111 prebuilt\webui\modules\sd_unet.py", line 91, in UNetModel_forward
    return ldm.modules.diffusionmodules.openaimodel.copy_of_UNetModel_forward_for_webui(self, x, timesteps, context, *args, **kwargs)
  File "V:\AI images stuff\automatic1111 prebuilt\webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 789, in forward
    emb = self.time_embed(t_emb)
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\nn\modules\module.py", line 1488, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
    input = module(input)
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\nn\modules\module.py", line 1488, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "V:\AI images stuff\automatic1111 prebuilt\webui\extensions-builtin\Lora\networks.py", line 429, in network_Linear_forward
    return originals.Linear_forward(self, input)
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "V:\AI images stuff\automatic1111 prebuilt\system\python\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "V:\AI images stuff\automatic1111 prebuilt\webui\extensions\Stable-Diffusion-WebUI-TensorRT\ui_trt.py", line 135, in export_unet_to_trt
    export_onnx(
  File "V:\AI images stuff\automatic1111 prebuilt\webui\extensions\Stable-Diffusion-WebUI-TensorRT\exporter.py", line 129, in export_onnx
    exit()
  File "_sitebuiltins.py", line 26, in __call__
SystemExit: None

Like before, set CUDA_VISIBLE_DEVICES=0 doesn't help.
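That is expected: CUDA_VISIBLE_DEVICES only filters which NVIDIA GPUs the CUDA runtime exposes; it can't stop PyTorch from placing tensors on the "cpu" device, so it has no effect on this particular mismatch. For completeness, it also has to be set before CUDA is initialized:

```python
# CUDA_VISIBLE_DEVICES masks NVIDIA GPUs only; it never hides the "cpu" device.
import os
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")   # must be set before torch initializes CUDA

import torch
print(torch.cuda.device_count())   # still 1 on a single-GPU machine, masked or not
```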

left1000 commented 8 months ago

I deleted these from my command arguments: --xformers --medvram --no-half-vae

and the TRT engine export worked. I'm not sure which one was breaking it, nor am I sure why my own manual install was getting random CUDA error popups...

left1000 commented 8 months ago

It's not perfect though; the console shows these non-fatal warnings:

[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored

[W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading

[W] UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 22215426048 detected for tactic 0x0000000000000000

[W] Cache result detected as invalid for node: /input_blocks.4/input_blocks.4.0/skip_connection/Conv, LayerImpl: CaskGemmConvolution, tactic: 0x0000000204040190
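The first two warnings are harmless and have straightforward fixes: pip install colored in the webui venv for the first, and the CUDA_MODULE_LOADING=LAZY environment variable (CUDA 11.7+) for the lazy-loading one. A small sketch follows; in practice the variable is usually set in webui-user.bat (set CUDA_MODULE_LOADING=LAZY) or the shell rather than in Python:

```python
# Optional fixes for the first two [W] messages above.
import os

# Enables CUDA lazy loading (CUDA 11.7+). It must be set before CUDA is initialized,
# so webui-user.bat or the shell is the more reliable place to put it.
os.environ.setdefault("CUDA_MODULE_LOADING", "LAZY")

# The 'colored' warning goes away after: pip install colored   (run inside the webui venv)
```

The tactic/cache warnings are TensorRT's builder skipping candidate kernels that would not fit in memory; by themselves they don't invalidate the engine.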

Have I still set something up wrong? And if so, what should I change?

left1000 commented 8 months ago

It looks like the flag that breaks this extension is --medvram

edit: wait, no, that's not it at all; no clue which flag does what at the moment... What it looks like to me right now is that profiles made under different command flags become invalid?

edit2: Okay, so I had it working just fine, but trying to figure out which flag was the problem has made it not work at all now, even with the flags removed again... It vaguely says: "No valid profile found. Please go to the TensorRT tab and generate an engine with the necessary profile. If using hires.fix, you need an engine for both the base and upscaled resolutions. Otherwise, use the default (torch) U-Net." Even though it worked a minute ago... hmm.

I figured out what cursed me: I ran the non-local-environment .bat, which started trying to install things into my system-wide Python. Swapping back to the correct .bat should've fixed things, yet instead it's just given me new errors. I just had an error that I didn't have 19GB of memory for PyTorch, which is a bad sign, as my GPU only has 12GB of VRAM...

I hope this doesn't mean people without 24GB of VRAM need to use --medvram while --medvram breaks TensorRT :(

Although that would be a strange conclusion, since I tested this an hour ago and it worked.

edit a million: Removing --xformers, --no-half-vae, and --medvram lets me build a profile. Re-adding --xformers and --no-half-vae also lets me build a profile, but then image generation won't run; it says no valid profile, even when there is one.

somehow I am getting

ValueError: No valid profile found. Please go to the TensorRT tab and generate an engine with the necessary profile. If using hires.fix, you need an engine for both the base and upscaled resolutions. Otherwise, use the default (torch) U-Net.

right after building an engine profile... despite, like I said, this working before. Heck, at this point I'm just going to reinstall automatic1111 and hope I made a mistake somewhere along this path.
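For context, each engine the extension builds carries a dynamic-shape profile: min/opt/max ranges for batch size, image height/width, and prompt-embedding length. "No valid profile found" is raised when the current generation's shapes fall outside every built engine's ranges, which can happen even though an engine exists, e.g. after changing resolution, enabling hires.fix, or using a much longer prompt. A rough illustration of that kind of check; the field names below are invented for the sketch and are not the extension's actual data model:

```python
# Rough illustration of the shape check behind "No valid profile found".
# Field names are invented for this sketch; they are not the extension's real code.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Profile:
    batch: tuple[int, int, int]      # (min, opt, max)
    height: tuple[int, int, int]     # in pixels; the latent is height // 8
    width: tuple[int, int, int]
    context: tuple[int, int, int]    # prompt embedding length, a multiple of 77

def in_range(value: int, rng: tuple[int, int, int]) -> bool:
    lo, _opt, hi = rng
    return lo <= value <= hi

def profile_is_valid(p: Profile, batch: int, height: int, width: int, context: int) -> bool:
    return (in_range(batch, p.batch) and in_range(height, p.height)
            and in_range(width, p.width) and in_range(context, p.context))

# Example: an engine built only for 512x512 will not accept a 768x768 hires.fix pass.
engine = Profile(batch=(1, 1, 4), height=(512, 512, 512), width=(512, 512, 512), context=(77, 77, 154))
print(profile_is_valid(engine, batch=1, height=512, width=512, context=77))   # True
print(profile_is_valid(engine, batch=1, height=768, width=768, context=77))   # False -> "No valid profile"
```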

left1000 commented 8 months ago

Okay, I got it working now.

1) clean install of automatic1111 entirely
2) build profiles
3) generate images
(all of the above done with --medvram off)

4) turn --medvram back on
5) generate an image: fine
6) export a new profile: fine

My only guess as to what I messed up last time around, which inspired my ranting above,

is that --medvram breaks exporting to Unet-onnx, but exporting to Unet-trt works fine with --medvram.

Which, if true, is great: it would mean the profiles can be adjusted with the --medvram flag on, as long as the initial profile was made once with --medvram off.

But this is vague and mysterious and I'm guessing wildly. So, I won't close the issue, despite it working for me now.
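One plausible mechanism behind the --medvram guess, consistent with the earlier traceback: --medvram keeps parts of the model in system RAM and only moves each block to the GPU when it is needed, so ONNX tracing can encounter weights that are still on the CPU while the dummy inputs sit on cuda:0. Building the .trt engine from an already-exported ONNX file doesn't involve the offloaded torch model at all, which would explain why that step tolerates --medvram. This is a guess at the mechanism, not confirmed extension behavior; the snippet below only reproduces the failure mode (requires a CUDA build of torch):

```python
# Tiny repro of how mixed devices trip torch.onnx.export with exactly this error.
import torch
import torch.nn as nn

if torch.cuda.is_available():
    model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8)).cuda()
    model[1].to("cpu")                   # pretend this block is still offloaded to system RAM

    x = torch.randn(1, 8, device="cuda")
    try:
        torch.onnx.export(model, (x,), "repro.onnx")
    except RuntimeError as err:
        print(err)                       # "Expected all tensors to be on the same device..."
```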

The steps I took and the problems I ran into can't be right. I must have overlooked a detail.

I'm still paranoid I'll accidentally break it again and have to reinstall for a 4th time.

TLDR: This is the only sentence in this entire issue worth reading: if --medvram needs to be off to write the first profile for each model, that should be written down in the instructions somewhere. If I'm wrong, please figure out how I'm wrong so I can know.

Thanks.

left1000 commented 8 months ago

I think I figured out what was wrong. In my testing I didn't keep the prompts perfectly consistent. Many of my errors about invalid profiles may have been due to text token lengths that I wasn't considering carefully enough, although I think maybe the text token settings are a bit misleading.

From what I can tell, min-text is not relevant at all, but exceeding max-text errors out and refuses to generate.

So there are optimal-text and max-text, and I'm not sure whether optimal-text matters either? But apparently max-text matters and is important.

TLDR: do optimal-text or minimum-text matter in any way, or does only max-text matter for the token length settings?
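For what it's worth, with TensorRT dynamic shapes in general, min and max define what the engine will accept at all, while opt only tells the builder which shape to tune for, so opt affects speed rather than validity. The webui pads prompts up in chunks of 75 tokens (77 with the two special tokens), so the effective text length grows in steps of 77, which is why min rarely gets in the way while max is a hard ceiling. A small sketch of that arithmetic; the MAX_TEXT value is just an example:

```python
# Illustrative arithmetic: how a prompt's token count maps to the "text" dimension
# a TensorRT profile has to cover. 75 usable tokens per chunk + 2 special tokens = 77.
import math

def padded_context_length(prompt_tokens: int) -> int:
    chunks = max(1, math.ceil(prompt_tokens / 75))
    return chunks * 77

MAX_TEXT = 154   # example: an engine built for at most two chunks (2 * 77)

for tokens in (20, 75, 76, 150, 151):
    ctx = padded_context_length(tokens)
    ok = ctx <= MAX_TEXT
    print(f"{tokens:>3} prompt tokens -> context {ctx:>3} -> {'fits' if ok else 'no valid profile'}")
```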

left1000 commented 8 months ago

Okay, now I clearly don't know what I'm doing wrong. I can't seem to get it to work at all anymore. My only current guess is that it's related to my constant changing of the --medvram and --no-half-vae settings, and that profiles made under one set of flags aren't valid under another.

left1000 commented 8 months ago

Okay, so the optimal text prompt length is just that, optimal, I guess... and the reasons I was repeatedly getting "no valid profile" were somehow unrelated...

Things are working perfectly now, but they've broken entirely so many times, and every time the only error I get tends to be "no valid profile found", despite the profile existing and being listed.

My only remaining theory is that closing automatic1111 and reopening it breaks every profile, or at least every custom profile... which would be very odd if true. I'm almost afraid to test it, given how long the profiles take to make and how little strain leaving it open indefinitely puts on my machine. But I have an obsessive need to close background programs, so I'll end up testing this theory eventually either way.