comfyanonymous / ComfyUI_TensorRT

MIT License
461 stars 30 forks

How to run SD3 with Tensor RT? #45

Open Jay19751103 opened 3 months ago

Jay19751103 commented 3 months ago

I converted the TRT engine with the following workflow: https://github.com/yusing/ComfyUI_TensorRT/blob/master/workflows/Build.TRT.Engine_SDXL_Base_Static.json and, for the VAE, used the following workflow to generate the safetensors file: https://github.com/yusing/ComfyUI_TensorRT/blob/master/workflows/Save_SD3_VAE.json

Then I used the following workflow to launch it: https://github.com/yusing/ComfyUI_TensorRT/blob/master/workflows/Create_SD3_TRT_Static.json

It fails with the following error:

Traceback (most recent call last):
  File "C:\Users\wenchien\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "C:\Users\wenchien\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "C:\Users\wenchien\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "C:\Users\wenchien\ComfyUI\nodes.py", line 1371, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "C:\Users\wenchien\ComfyUI\nodes.py", line 1341, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "C:\Users\wenchien\ComfyUI\comfy\sample.py", line 43, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "C:\Users\wenchien\ComfyUI\comfy\samplers.py", line 795, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "C:\Users\wenchien\ComfyUI\comfy\samplers.py", line 697, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "C:\Users\wenchien\ComfyUI\comfy\samplers.py", line 684, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "C:\Users\wenchien\ComfyUI\comfy\samplers.py", line 663, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "C:\Users\wenchien\ComfyUI\comfy\samplers.py", line 568, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "C:\Users\wenchien\AppData\Local\anaconda3\envs\trt\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\wenchien\ComfyUI\comfy\k_diffusion\sampling.py", line 599, in sample_dpmpp_2m
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "C:\Users\wenchien\ComfyUI\comfy\samplers.py", line 291, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
  File "C:\Users\wenchien\ComfyUI\comfy\samplers.py", line 650, in __call__
    return self.predict_noise(*args, **kwargs)
  File "C:\Users\wenchien\ComfyUI\comfy\samplers.py", line 653, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
  File "C:\Users\wenchien\ComfyUI\comfy\samplers.py", line 277, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
  File "C:\Users\wenchien\ComfyUI\comfy\samplers.py", line 226, in calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
  File "C:\Users\wenchien\ComfyUI\comfy\model_base.py", line 113, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "C:\Users\wenchien\ComfyUI\custom_nodes\ComfyUI_TensorRT\tensorrt_loader.py", line 69, in __call__
    self.set_bindings_shape(model_inputs, curr_split_batch)
UnboundLocalError: local variable 'curr_split_batch' referenced before assignment
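
For readers unfamiliar with the last line of the traceback: Python raises UnboundLocalError when a local variable is only assigned on some code paths and is read on a path where the assignment never ran. The sketch below reproduces that pattern in isolation. The function names pick_split and pick_split_checked, and the batch-splitting logic itself, are hypothetical illustrations and are not taken from tensorrt_loader.py.

```python
def pick_split(batch_size: int, min_batch: int, max_batch: int) -> int:
    """Hypothetical helper: the result is only assigned inside the loop."""
    for candidate in range(max_batch, min_batch - 1, -1):
        if batch_size % candidate == 0:
            curr_split_batch = batch_size // candidate
            break
    # If no candidate divided batch_size (or the range was empty),
    # curr_split_batch was never assigned and this line raises.
    return curr_split_batch


def pick_split_checked(batch_size: int, min_batch: int, max_batch: int) -> int:
    """Same search, but the failure case is reported explicitly."""
    for candidate in range(max_batch, min_batch - 1, -1):
        if batch_size % candidate == 0:
            return batch_size // candidate
    raise ValueError(
        f"batch size {batch_size} cannot be split into chunks "
        f"between {min_batch} and {max_batch}"
    )


if __name__ == "__main__":
    print(pick_split(4, 1, 2))  # a divisor is found, so the variable is assigned
    try:
        pick_split(3, 2, 2)     # 3 is not divisible by 2: nothing is assigned
    except UnboundLocalError as exc:
        print("UnboundLocalError:", exc)
```

If the loader follows a similar pattern, the error would suggest that the requested batch size does not fit the batch range the engine was built with, but that is an assumption and not something confirmed in this thread.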

nikolatesla20 commented 3 months ago

Why did you use SDXL_base for your tensor conversion when you said you wanted to use SD3?

Jay19751103 commented 3 months ago

There are no SD3 workflow files. The resources I found online said to use the SDXL base workflow for the conversion and set the model type to sd3. Am I wrong? Do you have an example of how to convert this? I also tested the converted ONNX model with random input, and it fails with the following error:

2024-07-01 09:35:15.3850222 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running Add node. Name:'/unet/Add_4' Status Message: D:\a_work\1\s\onnxruntime\core/providers/cpu/math/element_wise_ops.h:560 onnxruntime::BroadcastIterator::Append axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 25600 by 262144

I still have no idea what causes this.

I also tried the Hugging Face ONNX files from https://huggingface.co/stabilityai/stable-diffusion-3-medium-tensorrt/tree/main/mmdit.opt and got the same kind of error:

2024-07-01 09:55:41.1541677 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running Add node. Name:'/diffusion_model/Add_2' Status Message: D:\a_work\1\s\onnxruntime\core/providers/cpu/math/element_wise_ops.h:560 onnxruntime::BroadcastIterator::Append axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 25600 by 262144
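
The two figures in the broadcast error, 25600 and 262144, are both perfect squares, which hints at what they might count. The sketch below works through the arithmetic under two assumptions that are not stated in this thread: that the SD3 transformer splits the latent into 2x2 patches, and that the SD3 VAE downsamples each image side by 8. The helper names are hypothetical. Reading 25600 as a 160x160 positional grid baked into the exported graph is a guess, and whether a smaller latent makes the error go away is untested here.

```python
import math

PATCH_SIZE = 2  # assumption: latents are patchified into 2x2 patches
VAE_FACTOR = 8  # assumption: the VAE downsamples each image side by 8


def token_count(latent_side: int, patch_size: int = PATCH_SIZE) -> int:
    """Number of patch tokens produced by a square latent of this side length."""
    per_side = latent_side // patch_size
    return per_side * per_side


def latent_side_for_tokens(tokens: int, patch_size: int = PATCH_SIZE) -> int:
    """Side length of the square latent that yields exactly `tokens` patches."""
    per_side = math.isqrt(tokens)
    if per_side * per_side != tokens:
        raise ValueError(f"{tokens} is not a perfect square")
    return per_side * patch_size


if __name__ == "__main__":
    # A 1024x1024 latent gives 512 * 512 patches.
    print(token_count(1024))                  # 262144
    # 25600 patches correspond to a 160x160 grid, i.e. a 320x320 latent.
    print(latent_side_for_tokens(25600))      # 320
    # A 1024-pixel image maps to a 128x128 latent under the assumed factor.
    print(token_count(1024 // VAE_FACTOR))    # 4096
```

Under these assumptions, 262144 matches a latent of 1024x1024, which would be the size of the pixel image and not of its latent.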

Test program used to run the Hugging Face ONNX model:

import numpy as np
import onnxruntime as ort

# Prefer DirectML and fall back to the CPU provider.
EP_list = ['DmlExecutionProvider', 'CPUExecutionProvider']
sess_opt = ort.SessionOptions()
sess_opt.log_severity_level = 0  # most verbose logging
sess_opt.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_BASIC
sess = ort.InferenceSession("model.onnx", sess_opt, providers=EP_list)

# Names of the four model inputs, in graph order.
input_name1 = sess.get_inputs()[0].name
input_name2 = sess.get_inputs()[1].name
input_name3 = sess.get_inputs()[2].name
input_name4 = sess.get_inputs()[3].name

output_name = sess.get_outputs()[0].name
outputs_shape1 = sess.get_outputs()[0].shape

# Shapes reported by the model (not used below).
input_shape1 = sess.get_inputs()[0].shape
input_shape2 = sess.get_inputs()[1].shape
input_shape3 = sess.get_inputs()[2].shape
input_shape4 = sess.get_inputs()[3].shape

batch_size = 2

# Random float16 inputs with hard-coded shapes.
dummy_input1 = np.random.random((batch_size, 16, 1024, 1024)).astype(np.float16)
dummy_input2 = np.random.random(batch_size).astype(np.float16)
dummy_input3 = np.random.random((batch_size, 154, 4096)).astype(np.float16)
dummy_input4 = np.random.random((batch_size, 2048)).astype(np.float16)

np.put(dummy_input1, 22, -5, mode='clip')
np.put(dummy_input2, 22, -5, mode='clip')
np.put(dummy_input3, 22, -5, mode='clip')
np.put(dummy_input4, 22, -5, mode='clip')

prediction = sess.run([output_name], {input_name1: dummy_input1, input_name2: dummy_input2, input_name3: dummy_input3, input_name4: dummy_input4})[0]
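
The test program reads each input's shape from the session but then hard-codes the dummy input shapes. One way to keep the two in sync is to derive the dummy shapes from what the model reports. In onnxruntime, the shape of an input is a list whose fixed dimensions are integers and whose dynamic dimensions are strings or None. The helper below, resolve_shape, is a hypothetical sketch, and the dimension names "B", "H" and "W" are made up for illustration; the real names depend on how the model was exported.

```python
from typing import Mapping, Optional, Sequence, Tuple, Union

Dim = Union[int, str, None]


def resolve_shape(
    shape: Sequence[Dim],
    overrides: Mapping[str, int],
    default: int = 1,
) -> Tuple[int, ...]:
    """Replace symbolic or unknown dimensions with concrete sizes.

    Fixed integer dimensions are kept as reported. Named dynamic dimensions
    are looked up in `overrides`. Anything else falls back to `default`.
    """
    resolved = []
    for dim in shape:
        if isinstance(dim, int) and dim > 0:
            resolved.append(dim)
        elif isinstance(dim, str) and dim in overrides:
            resolved.append(overrides[dim])
        else:
            resolved.append(default)
    return tuple(resolved)


if __name__ == "__main__":
    # Made-up example of a reported shape; a real one would come from
    # sess.get_inputs()[0].shape in the test program.
    reported = ["B", 16, "H", "W"]
    print(resolve_shape(reported, {"B": 2, "H": 128, "W": 128}))
    # (2, 16, 128, 128)
```

Printing the reported shapes before building the inputs also shows which dimensions the export fixed and which it left dynamic, which can narrow down a shape mismatch like the one above.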