WhiteCrowLX opened this issue 2 weeks ago
This looks like a Windows-related problem. What versions of torch and triton are you using? Could you try upgrading them, or try running without compilation?
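For reference, a quick way to report the versions in question (a minimal sketch, assuming both packages import cleanly in the ComfyUI environment):

```python
import torch
import triton

# Print the versions the compile node will actually use.
print("torch:", torch.__version__)
print("triton:", triton.__version__)
print("CUDA build:", torch.version.cuda)
```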
triton-3.1.0
torch-2.5.1+cu124
I tried updating the CUDA version from 12.4 to 12.6, but none of them worked.
Same issue here with a 3090 on Linux (Ubuntu 22.04), triton-3.1.0, PyTorch with CUDA 12.6; it works with the compile+ node disabled. Thanks for your work, it's super awesome.
After changing to the flux-fp16 model and t5xxl_fp16, the compile+ node works great without problems.
@Blake110 Thanks, that worked for me too.
Is there a way to make it work with a lighter model like fp8 or GGUF? fp16 is excessively heavy to use.
> changed to flux-fp16 model and t5xxl_fp16, the compile+ node works great without problems.

This node does not allow you to select FP16.
Thank you all for your help in clarifying my doubts. I've found the reason: it's the 3090. With that card I can't use a quantized (fp8) model in compilation mode, since fp8e4nv requires CUDA compute capability >= 8.9 and the 3090 is only 8.6. Even if I switch to FP16 or BF16, the speed is far from satisfactory, so I have to turn compilation off.
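A quick way to check whether a GPU is affected (a minimal sketch; the `(8, 9)` threshold is the one the `fp8e4nv data type is not supported on CUDA arch < 89` assertion below refers to):

```python
import torch

# Triton's fp8e4nv kernels require CUDA compute capability >= 8.9 (Ada/Hopper).
# An RTX 3090 is Ampere, capability 8.6, so fp8 + torch.compile fails there.
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")
if (major, minor) < (8, 9):
    print("fp8 weights will not compile on this GPU; use fp16/bf16 or disable compilation.")
```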
ComfyUI Error Report
Error Details
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
2025-01-10T08:11:33.492030 - Traceback (most recent call last):
  File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 1116, in visit_Call
    return fn(*args, **extra_kwargs, **kws)
  File "D:\python\lib\site-packages\triton\language\core.py", line 35, in wrapper
    return fn(*args, **kwargs)
  File "D:\python\lib\site-packages\triton\language\core.py", line 993, in to
    return semantic.cast(self, dtype, _builder, fp_downcast_rounding)
  File "D:\python\lib\site-packages\triton\language\semantic.py", line 759, in cast
    assert builder.options.allow_fp8e4nv, "fp8e4nv data type is not supported on CUDA arch < 89"
AssertionError: fp8e4nv data type is not supported on CUDA arch < 89
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "D:\python\lib\site-packages\torch\_dynamo\output_graph.py", line 1446, in _call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
  File "D:\python\lib\site-packages\torch\_dynamo\repro\after_dynamo.py", line 129, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
  File "D:\python\lib\site-packages\torch\__init__.py", line 2234, in __call__
    return compile_fx(model_, inputs_, config_patches=self.config)
  File "D:\python\lib\site-packages\torch\_inductor\compile_fx.py", line 1521, in compile_fx
    return aot_autograd(
  File "D:\python\lib\site-packages\torch\_dynamo\backends\common.py", line 72, in __call__
    cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
  File "D:\python\lib\site-packages\torch\_functorch\aot_autograd.py", line 1071, in aot_module_simplified
    compiled_fn = dispatch_and_compile()
  File "D:\python\lib\site-packages\torch\_functorch\aot_autograd.py", line 1056, in dispatch_and_compile
    compiled_fn, _ = create_aot_dispatcher_function(
  File "D:\python\lib\site-packages\torch\_functorch\aot_autograd.py", line 522, in create_aot_dispatcher_function
    return _create_aot_dispatcher_function(
  File "D:\python\lib\site-packages\torch\_functorch\aot_autograd.py", line 759, in _create_aot_dispatcher_function
    compiled_fn, fw_metadata = compiler_fn(
  File "D:\python\lib\site-packages\torch\_functorch\_aot_autograd\jit_compile_runtime_wrappers.py", line 179, in aot_dispatch_base
    compiled_fw = compiler(fw_module, updated_flat_args)
  File "D:\python\lib\site-packages\torch\_inductor\compile_fx.py", line 1350, in fw_compiler_base
    return _fw_compiler_base(model, example_inputs, is_inference)
  File "D:\python\lib\site-packages\torch\_inductor\compile_fx.py", line 1421, in _fw_compiler_base
    return inner_compile(
  File "D:\python\lib\site-packages\torch\_inductor\compile_fx.py", line 475, in compile_fx_inner
    return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
  File "D:\python\lib\site-packages\torch\_dynamo\repro\after_aot.py", line 85, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
  File "D:\python\lib\site-packages\torch\_inductor\compile_fx.py", line 661, in _compile_fx_inner
    compiled_graph = FxGraphCache.load(
  File "D:\python\lib\site-packages\torch\_inductor\codecache.py", line 1334, in load
    compiled_graph = compile_fx_fn(
  File "D:\python\lib\site-packages\torch\_inductor\compile_fx.py", line 570, in codegen_and_compile
    compiled_graph = fx_codegen_and_compile(gm, example_inputs, **fx_kwargs)
  File "D:\python\lib\site-packages\torch\_inductor\compile_fx.py", line 878, in fx_codegen_and_compile
    compiled_fn = graph.compile_to_fn()
  File "D:\python\lib\site-packages\torch\_inductor\graph.py", line 1913, in compile_to_fn
    return self.compile_to_module().call
  File "D:\python\lib\site-packages\torch\_inductor\graph.py", line 1839, in compile_to_module
    return self._compile_to_module()
  File "D:\python\lib\site-packages\torch\_inductor\graph.py", line 1867, in _compile_to_module
    mod = PyCodeCache.load_by_key_path(
  File "D:\python\lib\site-packages\torch\_inductor\codecache.py", line 2876, in load_by_key_path
    mod = _reload_python_module(key, path)
  File "D:\python\lib\site-packages\torch\_inductor\runtime\compile_tasks.py", line 45, in _reload_python_module
    exec(code, mod.__dict__, mod.__dict__)
  File "C:\Users\Liux\AppData\Local\Temp\torchinductor_Liux\sg\csg67ndzybntokio3b55i3vg3l5a35y3knylpx3pkta3idrnppgf.py", line 39, in <module>
    triton_poi_fused__to_copy_0 = async_compile.triton('triton_', '''
File "D:\python\lib\site-packages\torch_inductor\async_compile.py", line 203, in triton
kernel.precompile()
File "D:\python\lib\site-packages\torch_inductor\runtime\triton_heuristics.py", line 244, in precompile
compiled_binary, launcher = self._precompile_config(
File "D:\python\lib\site-packages\torch_inductor\runtime\triton_heuristics.py", line 443, in _precompile_config
binary = triton.compile(*compile_args, *compile_kwargs)
File "D:\python\lib\site-packages\triton\compiler\compiler.py", line 280, in compile
module = src.make_ir(options, codegen_fns, context)
File "D:\python\lib\site-packages\triton\compiler\compiler.py", line 113, in make_ir
return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns)
File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 1297, in ast_to_ttir
generator.visit(fn.parse())
File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 1204, in visit
ret = super().visit(node)
File "D:\python\lib\ast.py", line 418, in visit
return visitor(node)
File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 359, in visit_Module
ast.NodeVisitor.generic_visit(self, node)
File "D:\python\lib\ast.py", line 426, in generic_visit
self.visit(item)
File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 1204, in visit
ret = super().visit(node)
File "D:\python\lib\ast.py", line 418, in visit
return visitor(node)
File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 443, in visit_FunctionDef
self.visit_compound_statement(node.body)
File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 351, in visit_compound_statement
self.visit(stmt)
File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 1204, in visit
ret = super().visit(node)
File "D:\python\lib\ast.py", line 418, in visit
return visitor(node)
File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 496, in visit_Assign
values = self.visit(node.value)
File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 1204, in visit
ret = super().visit(node)
File "D:\python\lib\ast.py", line 418, in visit
return visitor(node)
File "D:\python\lib\site-packages\triton\compiler\code_generator.py", line 1124, in visit_Call
raise CompilationError(self.jitfn.src, node, None) from e
triton.compiler.errors.CompilationError: at 8:11:
def triton(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
xnumel = 196608
xoffset = tl.program_id(0) XBLOCK
xindex = xoffset + tl.arange(0, XBLOCK)[:]
xmask = tl.full([XBLOCK], True, tl.int1)
x0 = xindex
tmp0 = tl.load(in_ptr0 + (x0), None)
tmp1 = tmp0.to(tl.float32)
^
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "D:\ComfyUI\execution.py", line 327, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "D:\ComfyUI\execution.py", line 202, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "D:\ComfyUI\execution.py", line 174, in _map_node_over_list
    process_inputs(input_dict, i)
  File "D:\ComfyUI\execution.py", line 163, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "D:\ComfyUI\comfy_extras\nodes_custom_sampler.py", line 633, in sample
    samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
  File "D:\ComfyUI\comfy\samplers.py", line 907, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "D:\ComfyUI\comfy\patcher_extension.py", line 110, in execute
    return self.original(*args, **kwargs)
  File "D:\ComfyUI\comfy\samplers.py", line 876, in outer_sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "D:\ComfyUI\comfy\samplers.py", line 860, in inner_sample
    samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "D:\ComfyUI\comfy\patcher_extension.py", line 110, in execute
    return self.original(*args, **kwargs)
  File "D:\ComfyUI\comfy\samplers.py", line 715, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "D:\python\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "D:\ComfyUI\comfy\k_diffusion\sampling.py", line 161, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "D:\ComfyUI\comfy\samplers.py", line 380, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
  File "D:\ComfyUI\comfy\samplers.py", line 840, in __call__
    return self.predict_noise(*args, **kwargs)
  File "D:\ComfyUI\comfy\samplers.py", line 843, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
  File "D:\ComfyUI\comfy\samplers.py", line 360, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
  File "D:\ComfyUI\comfy\samplers.py", line 196, in calc_cond_batch
    return executor.execute(model, conds, x_in, timestep, model_options)
  File "D:\ComfyUI\comfy\patcher_extension.py", line 110, in execute
    return self.original(*args, **kwargs)
  File "D:\ComfyUI\comfy\samplers.py", line 307, in _calc_cond_batch
    output = model_options['model_function_wrapper'](model.apply_model, {"input": input_x, "timestep": timestep_, "c": c, "cond_or_uncond": cond_or_uncond}).chunk(batch_chunks)
  File "D:\ComfyUI\custom_nodes\Comfy-WaveSpeed\fbcache_nodes.py", line 106, in model_unet_function_wrapper
    return model_function(input, timestep, **c)
  File "D:\ComfyUI\comfy\model_base.py", line 130, in apply_model
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
  File "D:\ComfyUI\comfy\patcher_extension.py", line 110, in execute
    return self.original(*args, **kwargs)
  File "D:\ComfyUI\comfy\model_base.py", line 159, in _apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "D:\python\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\python\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\python\lib\site-packages\torch\_dynamo\eval_frame.py", line 465, in _fn
    return fn(*args, **kwargs)
  File "D:\python\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\python\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\ComfyUI\comfy\ldm\flux\model.py", line 204, in forward
    out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance, control, transformer_options, attn_mask=kwargs.get("attention_mask", None))
  File "D:\ComfyUI\comfy\ldm\flux\model.py", line 109, in forward_orig
    img = self.img_in(img)
  File "D:\python\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\python\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\python\lib\site-packages\torch\_dynamo\convert_frame.py", line 1269, in __call__
    return self._torchdynamo_orig_callable(
  File "D:\python\lib\site-packages\torch\_dynamo\convert_frame.py", line 1064, in __call__
    result = self._inner_convert(
  File "D:\python\lib\site-packages\torch\_dynamo\convert_frame.py", line 526, in __call__
    return _compile(
  File "D:\python\lib\site-packages\torch\_dynamo\convert_frame.py", line 924, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "D:\python\lib\site-packages\torch\_dynamo\convert_frame.py", line 666, in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
  File "D:\python\lib\site-packages\torch\_utils_internal.py", line 87, in wrapper_function
    return function(*args, **kwargs)
  File "D:\python\lib\site-packages\torch\_dynamo\convert_frame.py", line 699, in _compile_inner
    out_code = transform_code_object(code, transform)
  File "D:\python\lib\site-packages\torch\_dynamo\bytecode_transformation.py", line 1322, in transform_code_object
    transformations(instructions, code_options)
  File "D:\python\lib\site-packages\torch\_dynamo\convert_frame.py", line 219, in _fn
    return fn(*args, **kwargs)
  File "D:\python\lib\site-packages\torch\_dynamo\convert_frame.py", line 634, in transform
    tracer.run()
  File "D:\python\lib\site-packages\torch\_dynamo\symbolic_convert.py", line 2796, in run
    super().run()
  File "D:\python\lib\site-packages\torch\_dynamo\symbolic_convert.py", line 983, in run
    while self.step():
  File "D:\python\lib\site-packages\torch\_dynamo\symbolic_convert.py", line 895, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "D:\python\lib\site-packages\torch\_dynamo\symbolic_convert.py", line 2987, in RETURN_VALUE
    self._return(inst)
  File "D:\python\lib\site-packages\torch\_dynamo\symbolic_convert.py", line 2972, in _return
    self.output.compile_subgraph(
  File "D:\python\lib\site-packages\torch\_dynamo\output_graph.py", line 1117, in compile_subgraph
    self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
  File "D:\python\lib\site-packages\torch\_dynamo\output_graph.py", line 1369, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "D:\python\lib\site-packages\torch\_dynamo\output_graph.py", line 1416, in call_user_compiler
    return self._call_user_compiler(gm)
  File "D:\python\lib\site-packages\torch\_dynamo\output_graph.py", line 1465, in _call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
CompilationError: at 8:11:
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 196608
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = tl.full([XBLOCK], True, tl.int1)
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), None)
    tmp1 = tmp0.to(tl.float32)
           ^
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
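As the log itself suggests, one workaround is to let dynamo fall back to eager mode instead of raising (a minimal sketch; this trades the compilation speedup for stability, much like disabling the compile+ node):

```python
import torch._dynamo

# When inductor/Triton compilation fails (here: fp8e4nv unsupported on arch < 8.9),
# run the affected regions uncompiled instead of raising BackendCompilerFailed.
torch._dynamo.config.suppress_errors = True
```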
Additional Context
(Please add any additional context or steps to reproduce the error here)