chengzeyi / Comfy-WaveSpeed

[WIP] The all-in-one inference optimization solution for ComfyUI: universal, flexible, and fast.
MIT License
638 stars 19 forks

Is this a problem caused by my use of a 3090 graphics card? #26

Open WhiteCrowLX opened 2 weeks ago

WhiteCrowLX commented 2 weeks ago

ComfyUI Error Report

Error Details

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:

```python
import torch._dynamo
torch._dynamo.config.suppress_errors = True
```

```
2025-01-10T08:11:33.492030 - Traceback (most recent call last):
  File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 1116, in visit_Call
    return fn(*args, **extra_kwargs, **kws)
  File "D:\python\lib\site-packages\triton\language\core.py", line 35, in wrapper
    return fn(*args, **kwargs)
  File "D:\python\lib\site-packages\triton\language\core.py", line 993, in to
    return semantic.cast(self, dtype, _builder, fp_downcast_rounding)
  File "D:\python\lib\site-packages\triton\language\semantic.py", line 759, in cast
    assert builder.options.allow_fp8e4nv, "fp8e4nv data type is not supported on CUDA arch < 89"
AssertionError: fp8e4nv data type is not supported on CUDA arch < 89
```

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "D:\python\lib\site-packages\torch_dynamo\output_graph.py", line 1446, in _call_user_compiler compiled_fn = compiler_fn(gm, self.example_inputs()) File "D:\python\lib\site-packages\torch_dynamo\repro\after_dynamo.py", line 129, in call compiled_gm = compiler_fn(gm, example_inputs) File "D:\python\lib\site-packages\torch__init.py", line 2234, in call return compilefx(model, inputs_, config_patches=self.config) File "D:\python\lib\site-packages\torch_inductor\compile_fx.py", line 1521, in compile_fx return aot_autograd( File "D:\python\lib\site-packages\torch_dynamo\backends\common.py", line 72, in call cg = aot_module_simplified(gm, example_inputs, self.kwargs) File "D:\python\lib\site-packages\torch_functorch\aot_autograd.py", line 1071, in aot_module_simplified compiled_fn = dispatch_and_compile() File "D:\python\lib\site-packages\torch_functorch\aot_autograd.py", line 1056, in dispatch_and_compile compiledfn, = create_aot_dispatcher_function( File "D:\python\lib\site-packages\torch_functorch\aot_autograd.py", line 522, in create_aot_dispatcher_function return _create_aot_dispatcher_function( File "D:\python\lib\site-packages\torch_functorch\aot_autograd.py", line 759, in _create_aot_dispatcher_function compiled_fn, fw_metadata = compiler_fn( File "D:\python\lib\site-packages\torch_functorch_aot_autograd\jit_compile_runtime_wrappers.py", line 179, in aot_dispatch_base compiled_fw = compiler(fw_module, updated_flat_args) File "D:\python\lib\site-packages\torch_inductor\compile_fx.py", line 1350, in fw_compiler_base return _fw_compiler_base(model, example_inputs, is_inference) File "D:\python\lib\site-packages\torch_inductor\compile_fx.py", line 1421, in _fw_compiler_base return inner_compile( File "D:\python\lib\site-packages\torch_inductor\compile_fx.py", line 475, in compile_fx_inner return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( File "D:\python\lib\site-packages\torch_dynamo\repro\after_aot.py", line 85, in debug_wrapper inner_compiled_fn = compiler_fn(gm, example_inputs) File "D:\python\lib\site-packages\torch_inductor\compile_fx.py", line 661, in _compile_fx_inner compiled_graph = FxGraphCache.load( File "D:\python\lib\site-packages\torch_inductor\codecache.py", line 1334, in load compiled_graph = compile_fx_fn( File "D:\python\lib\site-packages\torch_inductor\compile_fx.py", line 570, in codegen_and_compile compiled_graph = fx_codegen_and_compile(gm, example_inputs, fx_kwargs) File "D:\python\lib\site-packages\torch_inductor\compile_fx.py", line 878, in fx_codegen_and_compile compiled_fn = graph.compile_to_fn() File "D:\python\lib\site-packages\torch_inductor\graph.py", line 1913, in compile_to_fn return self.compile_to_module().call File "D:\python\lib\site-packages\torch_inductor\graph.py", line 1839, in compile_to_module return self._compile_to_module() File "D:\python\lib\site-packages\torch_inductor\graph.py", line 1867, in _compile_to_module mod = PyCodeCache.load_by_key_path( File "D:\python\lib\site-packages\torch_inductor\codecache.py", line 2876, in load_by_key_path mod = _reload_python_module(key, path) File "D:\python\lib\site-packages\torch_inductor\runtime\compile_tasks.py", line 45, in _reload_python_module exec(code, mod.dict, mod.dict__) File "C:\Users\Liux\AppData\Local\Temp\torchinductor_Liux\sg\csg67ndzybntokio3b55i3vg3l5a35y3knylpx3pkta3idrnppgf.py", line 39, in triton_poi_fused__to_copy_0 = asynccompile.triton('triton', ''' File "D:\python\lib\site-packages\torch_inductor\async_compile.py", line 
203, in triton kernel.precompile() File "D:\python\lib\site-packages\torch_inductor\runtime\triton_heuristics.py", line 244, in precompile compiled_binary, launcher = self._precompile_config( File "D:\python\lib\site-packages\torch_inductor\runtime\triton_heuristics.py", line 443, in _precompile_config binary = triton.compile(*compile_args, *compile_kwargs) File "D:\python\lib\site-packages\triton\compiler\compiler.py", line 280, in compile module = src.make_ir(options, codegen_fns, context) File "D:\python\lib\site-packages\triton\compiler\compiler.py", line 113, in make_ir return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns) File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 1297, in ast_to_ttir generator.visit(fn.parse()) File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 1204, in visit ret = super().visit(node) File "D:\python\lib\ast.py", line 418, in visit return visitor(node) File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 359, in visit_Module ast.NodeVisitor.generic_visit(self, node) File "D:\python\lib\ast.py", line 426, in generic_visit self.visit(item) File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 1204, in visit ret = super().visit(node) File "D:\python\lib\ast.py", line 418, in visit return visitor(node) File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 443, in visit_FunctionDef self.visit_compound_statement(node.body) File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 351, in visit_compound_statement self.visit(stmt) File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 1204, in visit ret = super().visit(node) File "D:\python\lib\ast.py", line 418, in visit return visitor(node) File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 496, in visit_Assign values = self.visit(node.value) File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 1204, in visit ret = super().visit(node) File "D:\python\lib\ast.py", line 418, in visit return visitor(node) File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 1124, in visit_Call raise CompilationError(self.jitfn.src, node, None) from e triton.compiler.errors.CompilationError: at 8:11: def triton(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): xnumel = 196608 xoffset = tl.program_id(0) XBLOCK xindex = xoffset + tl.arange(0, XBLOCK)[:] xmask = tl.full([XBLOCK], True, tl.int1) x0 = xindex tmp0 = tl.load(in_ptr0 + (x0), None) tmp1 = tmp0.to(tl.float32) ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "D:\ComfyUI\execution.py", line 327, in execute output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb) File "D:\ComfyUI\execution.py", line 202, in get_output_data return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb) File "D:\ComfyUI\execution.py", line 174, in _map_node_over_list process_inputs(input_dict, i) File "D:\ComfyUI\execution.py", line 163, in process_inputs results.append(getattr(obj, func)(inputs)) File "D:\ComfyUI\comfy_extras\nodes_custom_sampler.py", line 633, in sample samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed) File "D:\ComfyUI\comfy\samplers.py", line 907, in sample output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed) File "D:\ComfyUI\comfy\patcher_extension.py", line 110, in execute return self.original(*args, *kwargs) File "D:\ComfyUI\comfy\samplers.py", line 876, in outer_sample output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed) File "D:\ComfyUI\comfy\samplers.py", line 860, in inner_sample samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar) File "D:\ComfyUI\comfy\patcher_extension.py", line 110, in execute return self.original(args, kwargs) File "D:\ComfyUI\comfy\samplers.py", line 715, in sample samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, self.extra_options) File "D:\python\lib\site-packages\torch\utils_contextlib.py", line 116, in decorate_context return func(*args, *kwargs) File "D:\ComfyUI\comfy\k_diffusion\sampling.py", line 161, in sample_euler denoised = model(x, sigma_hat s_in, extra_args) File "D:\ComfyUI\comfy\samplers.py", line 380, in call out = self.inner_model(x, sigma, model_options=model_options, seed=seed) File "D:\ComfyUI\comfy\samplers.py", line 840, in call return self.predict_noise(*args, kwargs) File "D:\ComfyUI\comfy\samplers.py", line 843, in predict_noise return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed) File "D:\ComfyUI\comfy\samplers.py", line 360, in sampling_function out = calc_cond_batch(model, conds, x, timestep, model_options) File "D:\ComfyUI\comfy\samplers.py", line 196, in calc_cond_batch return executor.execute(model, conds, x_in, timestep, model_options) File "D:\ComfyUI\comfy\patcher_extension.py", line 110, in execute return self.original(args, kwargs) File "D:\ComfyUI\comfy\samplers.py", line 307, in _calc_cond_batch output = model_options['model_function_wrapper'](model.apply_model, {"input": inputx, "timestep": timestep, "c": c, "cond_or_uncond": cond_or_uncond}).chunk(batch_chunks) File "D:\ComfyUI\custom_nodes\Comfy-WaveSpeed\fbcache_nodes.py", line 106, in model_unet_function_wrapper return model_function(input, timestep, c) File "D:\ComfyUI\comfy\model_base.py", line 130, in apply_model return comfy.patcher_extension.WrapperExecutor.new_class_executor( File "D:\ComfyUI\comfy\patcher_extension.py", line 110, in execute return self.original(args, kwargs) File 
"D:\ComfyUI\comfy\model_base.py", line 159, in _apply_model model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, extra_conds).float() File "D:\python\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl return self._call_impl(*args, *kwargs) File "D:\python\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl return forward_call(args, kwargs) File "D:\python\lib\site-packages\torch_dynamo\eval_frame.py", line 465, in _fn return fn(*args, kwargs) File "D:\python\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl return self._call_impl(*args, *kwargs) File "D:\python\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl return forward_call(args, kwargs) File "D:\ComfyUI\comfy\ldm\flux\model.py", line 204, in forward out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance, control, transformer_options, attn_mask=kwargs.get("attention_mask", None)) File "D:\ComfyUI\comfy\ldm\flux\model.py", line 109, in forward_orig img = self.img_in(img) File "D:\python\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl return self._call_impl(*args, kwargs) File "D:\python\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl return forward_call(*args, *kwargs) File "D:\python\lib\site-packages\torch_dynamo\convert_frame.py", line 1269, in call return self._torchdynamo_orig_callable( File "D:\python\lib\site-packages\torch_dynamo\convert_frame.py", line 1064, in call result = self._inner_convert( File "D:\python\lib\site-packages\torch_dynamo\convert_frame.py", line 526, in call return _compile( File "D:\python\lib\site-packages\torch_dynamo\convert_frame.py", line 924, in _compile guarded_code = compile_inner(code, one_graph, hooks, transform) File "D:\python\lib\site-packages\torch_dynamo\convert_frame.py", line 666, in compile_inner return _compile_inner(code, one_graph, hooks, transform) File "D:\python\lib\site-packages\torch_utils_internal.py", line 87, in wrapper_function return function(args, kwargs) File "D:\python\lib\site-packages\torch_dynamo\convert_frame.py", line 699, in _compile_inner out_code = transform_code_object(code, transform) File "D:\python\lib\site-packages\torch_dynamo\bytecode_transformation.py", line 1322, in transform_code_object transformations(instructions, code_options) File "D:\python\lib\site-packages\torch_dynamo\convert_frame.py", line 219, in _fn return fn(*args, *kwargs) File "D:\python\lib\site-packages\torch_dynamo\convert_frame.py", line 634, in transform tracer.run() File "D:\python\lib\site-packages\torch_dynamo\symbolic_convert.py", line 2796, in run super().run() File "D:\python\lib\site-packages\torch_dynamo\symbolic_convert.py", line 983, in run while self.step(): File "D:\python\lib\site-packages\torch_dynamo\symbolic_convert.py", line 895, in step self.dispatch_table[inst.opcode](self, inst) File "D:\python\lib\site-packages\torch_dynamo\symbolic_convert.py", line 2987, in RETURN_VALUE self._return(inst) File "D:\python\lib\site-packages\torch_dynamo\symbolic_convert.py", line 2972, in _return self.output.compile_subgraph( File "D:\python\lib\site-packages\torch_dynamo\output_graph.py", line 1117, in compile_subgraph self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root) File "D:\python\lib\site-packages\torch_dynamo\output_graph.py", line 1369, in compile_and_call_fx_graph compiled_fn = self.call_user_compiler(gm) File 
"D:\python\lib\site-packages\torch_dynamo\output_graph.py", line 1416, in call_user_compiler return self._call_user_compiler(gm) File "D:\python\lib\site-packages\torch_dynamo\output_graph.py", line 1465, in _call_user_compiler raise BackendCompilerFailed(self.compiler_fn, e) from e torch.dynamo.exc.BackendCompilerFailed: backend='inductor' raised: CompilationError: at 8:11: def triton(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): xnumel = 196608 xoffset = tl.program_id(0) XBLOCK xindex = xoffset + tl.arange(0, XBLOCK)[:] xmask = tl.full([XBLOCK], True, tl.int1) x0 = xindex tmp0 = tl.load(in_ptr0 + (x0), None) tmp1 = tmp0.to(tl.float32) ^

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:

```python
import torch._dynamo
torch._dynamo.config.suppress_errors = True
```

Additional Context

(Please add any additional context or steps to reproduce the error here)

chengzeyi commented 2 weeks ago

This should be a Windows-related problem. What versions of torch and triton are you using? Could you try upgrading them? Or could you try running without compilation?
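For reference, a quick way to gather the version information being asked for here; a minimal sketch using plain torch/triton introspection, nothing WaveSpeed-specific:

```python
# Collect the torch/triton/CUDA versions and GPU name for a bug report.
import torch
import triton

print("torch :", torch.__version__)          # e.g. 2.5.1+cu124
print("triton:", triton.__version__)         # e.g. 3.1.0
print("cuda  :", torch.version.cuda)         # CUDA version torch was built against
print("gpu   :", torch.cuda.get_device_name(0))
```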

WhiteCrowLX commented 2 weeks ago

> This should be a Windows-related problem. What versions of torch and triton are you using? Could you try upgrading them? Or could you try running without compilation?

triton-3.1.0
torch-2.5.1+cu124

I tried updating the CUDA version from 12.4 to 12.6; none of them worked.

Blake110 commented 2 weeks ago

Same issue here with a 3090 on Linux (Ubuntu 22.04), triton-3.1.0, PyTorch with CUDA 12.6. It works with the Compile+ node disabled. Thanks for your work, it's super awesome.

Blake110 commented 2 weeks ago

Changed to the flux fp16 model and t5xxl_fp16; the Compile+ node works great without problems.
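This matches the assertion in the traceback: Triton only allows the fp8e4nv dtype on CUDA compute capability 8.9 or newer (Ada/Hopper), and the RTX 3090 is Ampere at 8.6, so compiling an fp8-quantized model fails there while fp16/bf16 compiles fine. A minimal check, assuming nothing beyond stock PyTorch:

```python
# Triton only permits fp8e4nv (torch.float8_e4m3fn) kernels on CUDA
# compute capability >= 8.9 (Ada/Hopper). An RTX 3090 reports (8, 6),
# so fp8 models fail to compile there; fp16/bf16 models are fine.
import torch

major, minor = torch.cuda.get_device_capability(0)
if (major, minor) >= (8, 9):
    print(f"sm_{major}{minor}: fp8 kernels can be compiled")
else:
    print(f"sm_{major}{minor}: use fp16/bf16 models, or disable compilation")
```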

terrabys commented 1 week ago

@Blake110 Thanks, that worked for me too.

enternalsaga commented 1 week ago

Is there a way to make it work with a lighter model like fp8 or GGUF? fp16 is excessively heavy to use.

KLL535 commented 1 week ago

> Changed to the flux fp16 model and t5xxl_fp16; the Compile+ node works great without problems.

This node does not allow you to select FP16:

[image]

WhiteCrowLX commented 1 week ago

Thank you all for your help in clarifying my doubts. I've found the reason: because of the 3090, I can't use quantized models with compilation mode. Even if I change to FP16 or BF16, the speed is far from satisfactory, so I have to turn off compilation.
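For anyone who wants to keep compilation enabled for other workflows, the fallback suggested by the error message itself makes Dynamo swallow backend failures and run those graphs eagerly. A minimal sketch; note that this is a global setting and will also hide unrelated compile errors:

```python
# From the error report above: let torch.compile fall back to eager
# execution when the inductor backend fails, instead of raising.
import torch._dynamo

torch._dynamo.config.suppress_errors = True
```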