AUTOMATIC1111 / stable-diffusion-webui-tensorrt

MIT License

Only 75 Tokens possible? #28

Open Devalinor opened 1 year ago

Devalinor commented 1 year ago

No matter which settings I choose, the only thing that works is the minimum and maximum resolution, but sadly not the maximum number of tokens. Is this already known, or is something wrong on my side?

Activating unet: [TRT] breakdomainanime_A0440
[06/01/2023-07:36:11] [TRT] [W] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.7.0
[06/01/2023-07:36:11] [TRT] [W] TensorRT was linked against cuDNN 8.9.0 but loaded cuDNN 8.7.0
[06/01/2023-07:36:11] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 30.63it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 25.64it/s]
  0%|                                                                                           | 0/20 [00:00<?, ?it/s][06/01/2023-07:36:41] [TRT] [E] 3: [executionContext.cpp::nvinfer1::rt::ExecutionContext::validateInputBindings::2083] Error Code 3: API Usage Error (Parameter check failed at: executionContext.cpp::nvinfer1::rt::ExecutionContext::validateInputBindings::2083, condition: profileMinDims.d[i] <= dimensions.d[i]. Supplied binding dimension [1,4,64,64] for bindings[0] exceed min ~ max range at index 0, maximum dimension in profile is 2, minimum dimension in profile is 2, but supplied dimension is 1.
)
  0%|                                                                                           | 0/20 [00:00<?, ?it/s]
Error completing request
Arguments: ('task(cp1157sj3td4zb3)', '456  6453564 7365465 252345 745654 25 744352 4325523 6423 43526 364523 12124 634345 12431234 ^32^13 345 ', '', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 0, '', '', [], 0, False, 7, 100, 'Constant', 0, 'Constant', 0, 4, <controlnet.py.UiControlNetUnit object at 0x0000022AA9AF4A60>, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, None, None, False, 50) {}
Traceback (most recent call last):
  File "F:\stable-diffusion-webui - Kopie\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "F:\stable-diffusion-webui - Kopie\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "F:\stable-diffusion-webui - Kopie\modules\txt2img.py", line 57, in txt2img
    processed = processing.process_images(p)
  File "F:\stable-diffusion-webui - Kopie\modules\processing.py", line 611, in process_images
    res = process_images_inner(p)
  File "F:\stable-diffusion-webui - Kopie\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 42, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "F:\stable-diffusion-webui - Kopie\modules\processing.py", line 731, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "F:\stable-diffusion-webui - Kopie\modules\processing.py", line 979, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "F:\stable-diffusion-webui - Kopie\modules\sd_samplers_kdiffusion.py", line 433, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "F:\stable-diffusion-webui - Kopie\modules\sd_samplers_kdiffusion.py", line 275, in launch_sampling
    return func()
  File "F:\stable-diffusion-webui - Kopie\modules\sd_samplers_kdiffusion.py", line 433, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "F:\stable-diffusion-webui - Kopie\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\stable-diffusion-webui - Kopie\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "F:\stable-diffusion-webui - Kopie\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\stable-diffusion-webui - Kopie\modules\sd_samplers_kdiffusion.py", line 174, in forward
    x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(c_crossattn, image_cond_in[a:b]))
  File "F:\stable-diffusion-webui - Kopie\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\stable-diffusion-webui - Kopie\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "F:\stable-diffusion-webui - Kopie\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "F:\stable-diffusion-webui - Kopie\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "F:\stable-diffusion-webui - Kopie\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "F:\stable-diffusion-webui - Kopie\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "F:\stable-diffusion-webui - Kopie\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\stable-diffusion-webui - Kopie\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1335, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "F:\stable-diffusion-webui - Kopie\venv\lib\site-packages\torch\nn\modules\module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "F:\stable-diffusion-webui - Kopie\modules\sd_unet.py", line 89, in UNetModel_forward
    return current_unet.forward(x, timesteps, context, *args, **kwargs)
  File "F:\stable-diffusion-webui - Kopie\extensions\stable-diffusion-webui-tensorrt\scripts\trt.py", line 86, in forward
    self.infer({"x": x, "timesteps": timesteps, "context": context})
  File "F:\stable-diffusion-webui - Kopie\extensions\stable-diffusion-webui-tensorrt\scripts\trt.py", line 69, in infer
    self.allocate_buffers(feed_dict)
  File "F:\stable-diffusion-webui - Kopie\extensions\stable-diffusion-webui-tensorrt\scripts\trt.py", line 63, in allocate_buffers
    raise Exception(f'bad shape for TensorRT input {binding}: {tuple(shape)}')
Exception: bad shape for TensorRT input x: (1, 4, 64, 64)
MoreColors123 commented 1 year ago

The token max needs to be a multiple of 75. I successfully built a min 75 / max 225 .trt engine with batch size 4, width 768 to 768 (same value for min and max!), and height 448 to 448. That's the closest to 16:9.
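For context, a minimal sketch of why the token setting moves in steps of 75, assuming webui's usual CLIP chunking (75 prompt tokens per chunk, padded to 77 context entries per chunk; these constants are assumptions for illustration, not taken from the extension's code):

```python
import math

CHUNK_TOKENS = 75  # prompt tokens per CLIP chunk (assumed)
CHUNK_WIDTH = 77   # per-chunk context length after BOS/EOS padding (assumed)

def context_seq_len(prompt_tokens: int) -> int:
    """Sequence length of the UNet 'context' tensor for a given prompt size."""
    chunks = max(1, math.ceil(prompt_tokens / CHUNK_TOKENS))
    return chunks * CHUNK_WIDTH

print(context_seq_len(75))   # 77  -- fits a profile built for max 75 tokens
print(context_seq_len(76))   # 154 -- one token over immediately needs a second chunk
print(context_seq_len(225))  # 231 -- a "max 225" engine must allow this length
```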

Devalinor commented 1 year ago

Choosing a value higher than 75 is no problem; the model converts without any issues. But using more than 75 tokens in the prompt window doesn't seem to be possible at the moment. I've tried your settings, and I am still getting the same results.

wizz13150 commented 1 year ago

Exception: bad shape for TensorRT input x: (1, 4, 64, 64) seems suspect to me. The 1 should be 2, because the min batch size is 1 and the equation takes batch_size * 2, so the shape should be 2x4x64x64 (1x2 for the batch, and 64x8 = 512 pixels here).
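A minimal sketch of that shape arithmetic, under the assumptions stated above (the classifier-free-guidance pass doubles the batch, latents have 4 channels, and spatial dims are 1/8 of the pixel resolution):

```python
def unet_x_shape(batch_size: int, width: int, height: int) -> tuple:
    """Expected shape of the 'x' binding for one denoising step."""
    # cond + uncond doubles the batch; 4 latent channels; latents are
    # 1/8 of the pixel resolution on each spatial axis.
    return (batch_size * 2, 4, height // 8, width // 8)

print(unet_x_shape(1, 512, 512))  # (2, 4, 64, 64) -- what the profile expects
# The traceback above reports (1, 4, 64, 64): the batch axis is 1, below
# the profile's minimum of 2, hence the out-of-range binding error.
```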

[image]

So I'm not sure how you got there. Did you run the command manually to convert the model, or something else? Look at my PR: the max token setting seems to have absolutely no impact here, and you should see the limit. Also, convert as a batch.

The traceback mentions 'controlnet' at some point; it definitely won't work if you use that. You should also disable your extensions and any non-stock settings and try again. These TRT models can't run extra features the way you could before. Start from a fresh, stock webui. Now we need to wait for Nvidia's next releases.

Devalinor commented 1 year ago

(quoting wizz13150's comment above)

Is it possible that it's caused by these errors when trying to convert a model to ONNX?

F:\stable-diffusion-webui - Kopie\venv\lib\site-packages\einops\einops.py:314: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  known = {axis for axis in composite_axis if axis_name2known_length[axis] != _unknown_axis_length}
F:\stable-diffusion-webui - Kopie\venv\lib\site-packages\einops\einops.py:315: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  unknown = {axis for axis in composite_axis if axis_name2known_length[axis] == _unknown_axis_length}
F:\stable-diffusion-webui - Kopie\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py:158: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1] == self.channels
F:\stable-diffusion-webui - Kopie\modules\sd_hijack_unet.py:26: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if a.shape[-2:] != b.shape[-2:]:
F:\stable-diffusion-webui - Kopie\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py:109: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1] == self.channels
============= Diagnostic Run torch.onnx.export version 2.0.0+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
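As an aside, those TracerWarnings are the usual side effect of torch.onnx.export tracing Python-level shape checks, and they are generally harmless. Below is a minimal, illustrative sketch of a traced export that declares dynamic axes, so the ONNX graph is not locked to the traced shapes. The unet module and all names are assumptions for illustration, not the extension's actual export code:

```python
import torch

# unet is assumed to be the wrapped UNet module, already in scope.
dummy_x = torch.randn(2, 4, 64, 64)          # latent input
dummy_t = torch.zeros(2, dtype=torch.int64)  # timesteps
dummy_ctx = torch.randn(2, 77, 768)          # text conditioning

torch.onnx.export(
    unet,
    (dummy_x, dummy_t, dummy_ctx),
    "unet.onnx",
    input_names=["x", "timesteps", "context"],
    output_names=["out"],
    dynamic_axes={  # axes left flexible for the TensorRT profile
        "x": {0: "batch", 2: "height", 3: "width"},
        "context": {0: "batch", 1: "sequence"},
        "timesteps": {0: "batch"},
    },
    opset_version=17,
)
```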
ellugia commented 1 year ago

Same here: I converted the models with a multiple of 75, since those are the only values allowed, but I cannot generate with more than 75 tokens.

jebarpg commented 1 year ago

I made a fix for all these issues. You can check out my fork with the changes here: https://github.com/jebarpg/stable-diffusion-webui-tensorrt

I did all the manual testing and discovered the limits of all the shapes you can create with max width, height, and batch size. The best combination I have found is batch size 7 with max width 512 and max height 512. You can max out the max tokens; it has no effect on the shape size limit. Only max batch size, max width, and max height have any effect. I also discovered a base number for every batch size from 1 to 11, which tells you how far you can slide the max width and height. You will get a red label signaling that you are over the limit, and a green one otherwise.

I've created a pull request, so hopefully it gets integrated. You can also do batch processing instead of converting one model at a time, for both the ONNX files and the TRT files. NOTE that your settings for max width, height, batch size, tokens, etc. will apply to the entire batch. Let me know what you all think.

CyberTimon commented 1 year ago

I'm on Linux, but when I enter more than 75 tokens I get this (also the same error with @jebarpg's branch):

0%| | 0/20 [00:00<?, ?it/s]
webui.sh: line 241: 161938 Segmentation fault (core dumped) "${python_cmd}" "${LAUNCH_SCRIPT}" "$@"

Hope there is a fix soon!

buenomsg commented 1 year ago

@jebarpg I'm getting the same error you described, even after applying your suggested settings. May I ask how you found the math behind the limits? I'm trying to understand it, but I've only found a few clues so far and haven't figured it out yet.

jebarpg commented 1 year ago

@CyberTimon @buenomsg Yeah, I too can't get past 75 tokens. I've been waiting on NVIDIA to get back to me about exactly how they calculate their shapes when using trtexec.

@buenomsg So, regarding what @Devalinor mentioned: in the code here, https://github.com/AUTOMATIC1111/stable-diffusion-webui-tensorrt/blob/5e5352c5497aeefc3a4b33a4acde557eac7a21c5/export_trt.py#L22, you can see how they calculate the different mins and maxes.
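For illustration, here is a hedged sketch of the general pattern for passing min/opt/max optimization-profile shapes to trtexec. The exact values computed in export_trt.py may differ; the binding names match the traceback above, but the numbers and the timesteps layout are assumptions:

```python
def shape_arg(batch, tokens, height, width):
    """Format one trtexec shape spec for the x/context/timesteps bindings."""
    seq = (tokens // 75) * 77  # 77 context entries per 75-token chunk (assumed)
    b = batch * 2              # cond + uncond doubles the batch
    return (f"x:{b}x4x{height // 8}x{width // 8},"
            f"context:{b}x{seq}x768,"
            f"timesteps:{b}")

cmd = [
    "trtexec", "--onnx=unet.onnx", "--saveEngine=unet.trt", "--fp16",
    f"--minShapes={shape_arg(1, 75, 512, 512)}",
    f"--optShapes={shape_arg(1, 75, 512, 512)}",
    f"--maxShapes={shape_arg(4, 225, 512, 512)}",
]
print(" ".join(cmd))
```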

As for how I discovered the limits: I went through and tested as many settings as I could with different batch sizes. I set max batch size to 1, then tested different combinations of max_width and max_height, and kept a spreadsheet of which ones worked and which ones did not. I noticed there was a limit to the maxSize value; once you went over it, the conversion would error out. I did this for batch sizes 1 to 11, but after 7 it became useless, because you could no longer have both width and height at 512 or above. One of them had to be (I think) 480 while the other was 512 for batch size 8 to work, so batch size 8 was useless IMO. At batch size 7 you could do exactly 512 x 512, which is ideal (given the limitations, of course). I also calculated the total shape value for each, which is how I came up with this table:

batch_sizes = { 1: 92160, 2: 129024, 3: 159744, 4: 184320, 5: 184320, 6: 221184, 7: 229376, 8: 229376, 9: 276480, 10: 286720, 11: 281600 }

To calculate the total shape size, I took the max shape, which you can work out from:

B = max_batch_size * 2
unknown = 4
H = max_height / 8
W = max_width / 8

maxShape = B x unknown x H x W (e.g. 4 x 4 x 64 x 64 = 65536)

I kept the spreadsheet and incrementally tried as many combinations as needed to find the max width and height combo that would produce the largest maxShape value and still convert to a TRT model.
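Putting that formula and the table together, a small sketch of the bookkeeping (the caps come from the batch_sizes table above; profile_fits is a hypothetical helper mirroring the red/green label check):

```python
# Empirically found per-batch-size caps from the table above.
BATCH_CAPS = {1: 92160, 2: 129024, 3: 159744, 4: 184320, 5: 184320,
              6: 221184, 7: 229376, 8: 229376, 9: 276480, 10: 286720,
              11: 281600}

def max_shape_elements(max_batch, max_width, max_height):
    """Total element count of the max 'x' shape: B x 4 x H/8 x W/8."""
    return (max_batch * 2) * 4 * (max_height // 8) * (max_width // 8)

def profile_fits(max_batch, max_width, max_height):
    """True (green label) if the engine should build; False (red) if not."""
    return max_shape_elements(max_batch, max_width, max_height) <= BATCH_CAPS[max_batch]

print(max_shape_elements(2, 512, 512))  # 4 x 4 x 64 x 64 = 65536, as above
print(profile_fits(7, 512, 512))        # True  -- the batch-7 / 512x512 sweet spot
print(profile_fits(8, 512, 512))        # False -- 262144 > 229376
```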

So it was a lot of manual work and a bit of mathematical work combined.

If NVIDIA could tell me how they do their shape calculations, I could have figured it out without all the manual fiddling.

left1000 commented 11 months ago

This issue might just be out of date, but I just generated an image with 190 tokens in the text prompt. So I suppose at some point this issue was solved? (I did generate the profile with the 225-token setting.)