bombax-xiaoice / ComfyUI-Allegro

ComfyUI support for rhymes-ai/Allegro, which generates short, relatively high-quality videos from text prompts, especially compared with other open-source solutions currently available
Apache License 2.0

No Available Kernel error #1

Open Darkbra opened 1 week ago

Darkbra commented 1 week ago

I am on Windows 11 with the latest ComfyUI installation. All models are fully downloaded and the nodes load with no errors. I have two RTX A5000 cards with the CUDA 12.4 toolkit and the matching graphics drivers.

When I queue the prompt I get the following error:

!!! Exception during processing !!! No available kernel. Aborting execution.
Traceback (most recent call last):
  File "F:\Data\Packages\ComfyUI\execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "F:\Data\Packages\ComfyUI\execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "F:\Data\Packages\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "F:\Data\Packages\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "F:\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Allegro\nodes.py", line 198, in run
    output = pipe(
  File "F:\Data\Packages\ComfyUI\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "F:\Data\Packages\ComfyUI/custom_nodes/ComfyUI-Allegro\allegro\pipelines\pipeline_allegro.py", line 775, in __call__
    noise_pred = self.transformer(
  File "F:\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\Data\Packages\ComfyUI/custom_nodes/ComfyUI-Allegro\allegro\models\transformers\transformer_3d_allegro.py", line 335, in forward
    hidden_states = block(
  File "F:\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\Data\Packages\ComfyUI/custom_nodes/ComfyUI-Allegro\allegro\models\transformers\block.py", line 1093, in forward
    attn_output = self.attn1(
  File "F:\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\Data\Packages\ComfyUI/custom_nodes/ComfyUI-Allegro\allegro\models\transformers\block.py", line 553, in forward
    return self.processor(
  File "F:\Data\Packages\ComfyUI/custom_nodes/ComfyUI-Allegro\allegro\models\transformers\block.py", line 824, in __call__
    hidden_states = F.scaled_dot_product_attention(
RuntimeError: No available kernel. Aborting execution.

I have no issues with other text-to-video models: I run Cog, Pyramid Flow and LTX without problems. I suspect the cause is not my installation but the code itself.
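
For anyone hitting the same error, it may help to post what PyTorch itself reports about the installation. The following is a minimal sketch, assuming only that `torch` may or may not be importable where it is run; the function name `sdpa_report` is hypothetical and is not part of ComfyUI or Allegro.

```python
import importlib.util


def sdpa_report():
    """Collect facts that are usually relevant to a 'No available kernel' error.

    Hypothetical helper for illustration; not part of ComfyUI or Allegro.
    """
    if importlib.util.find_spec("torch") is None:
        # torch is not installed in this interpreter, so there is nothing to query.
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        # Global switches for the scaled_dot_product_attention backends.
        "flash_sdp_enabled": torch.backends.cuda.flash_sdp_enabled(),
        "mem_efficient_sdp_enabled": torch.backends.cuda.mem_efficient_sdp_enabled(),
        "math_sdp_enabled": torch.backends.cuda.math_sdp_enabled(),
    }
    if report["cuda_available"]:
        # Device name and compute capability for every visible GPU.
        report["devices"] = [
            (torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))
            for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in sdpa_report().items():
        print(f"{key}: {value}")
```

It should be run with the same Python interpreter that ComfyUI uses (in the traceback above, the one inside the venv folder), otherwise it describes a different torch installation.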

bombax-xiaoice commented 5 days ago

I don't have an RTX A5000, so I won't be able to reproduce your issue. But you might try one of the following to see if it works:

1) Upgrade the torch packages:

    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124

2) Restart ComfyUI with an alternative attention implementation, using one of the following parameters:

    python main.py --use-split-cross-attention
    python main.py --use-quad-cross-attention
    python main.py --use-pytorch-cross-attention

DaveGravel commented 4 days ago

I am on Windows 11 with the latest updated ComfyUI installation. All models are fully downloaded and the nodes load with no errors. I have an RTX 4070 Ti Super 16 GB with the CUDA 12.4 toolkit and the matching graphics drivers. Same error here.

Darkbra commented 4 days ago

I am already on the latest version of everything, as you suggested in your reply. I get the same error with each of the different cross-attention options.

DaveGravel commented 3 days ago

I have found this information, but I haven't had time to test it again. Maybe you will find it useful. https://github.com/rhymes-ai/Allegro/issues/17

In block.py I modified the section beginning at line 813, and it appears to work for me.

        if self.use_rope:
            # require the shape of (batch_size x nheads x ntokens x dim)
            pos_thw = self.position_getter(batch_size, t=frame, h=height, w=width, device=query.device)
            query = self.rope(query, pos_thw)
            key = self.rope(key, pos_thw)

        # the output of sdp = (batch, num_heads, seq_len, head_dim)
        # TODO: add support for attn.scale when we move to Torch 2.1
        if self.attention_mode == 'flash':
            # Flash-only variant, left commented out:
            #     with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
            #         torch.nn.functional.scaled_dot_product_attention(
            #             query, key, value, dropout_p=0.0, is_causal=False
            #         )
            # Enable every SDPA backend so PyTorch can pick one that is available.
            with torch.backends.cuda.sdp_kernel(
                enable_math=True, enable_flash=True, enable_mem_efficient=True, enable_cudnn=True
            ):
                # Assign the result so the rest of the method uses the attention output.
                hidden_states = torch.nn.functional.scaled_dot_product_attention(
                    query,
                    key,
                    value,
                    attn_mask=attention_mask,
                )
        elif self.attention_mode == 'xformers':
            with sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):
                hidden_states = torch.nn.functional.scaled_dot_product_attention(
                    query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
                )
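
The `torch.backends.cuda.sdp_kernel` context manager used above is deprecated in recent PyTorch releases in favour of `torch.nn.attention.sdpa_kernel`, which accepts a list of permitted backends. Below is a minimal sketch of the same "allow several backends" idea outside Allegro, assuming a PyTorch version that ships `torch.nn.attention`. The tensor shapes, the all-true mask and the function name `attention_with_fallback` are made up for illustration and are not taken from block.py.

```python
import importlib.util


def attention_with_fallback():
    """Run SDPA with several backends allowed; return the output shape.

    Returns None when torch (or torch.nn.attention) is unavailable.
    Hypothetical stand-alone demo, not the Allegro code path.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    try:
        import torch
        import torch.nn.functional as F
        from torch.nn.attention import SDPBackend, sdpa_kernel
    except ImportError:
        # Older PyTorch builds do not provide torch.nn.attention.
        return None

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # Shape convention from the snippet above: (batch, num_heads, seq_len, head_dim).
    query = torch.randn(1, 4, 16, 32, device=device, dtype=dtype)
    key = torch.randn_like(query)
    value = torch.randn_like(query)
    # Boolean mask, True = attend; broadcast over batch and heads.
    attention_mask = torch.ones(1, 1, 16, 16, dtype=torch.bool, device=device)

    # Allow flash, memory-efficient and math kernels; PyTorch chooses one that
    # supports these inputs instead of raising "No available kernel".
    allowed = [
        SDPBackend.FLASH_ATTENTION,
        SDPBackend.EFFICIENT_ATTENTION,
        SDPBackend.MATH,
    ]
    with sdpa_kernel(allowed):
        hidden_states = F.scaled_dot_product_attention(
            query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
        )
    return tuple(hidden_states.shape)


if __name__ == "__main__":
    print(attention_with_fallback())
```

As far as I know, the flash backend does not accept an explicit `attn_mask`, so restricting SDPA to flash alone while passing a mask is one plausible way to end up with no usable kernel. Keeping the math backend in the allowed list trades speed for a path that should always run.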
Darkbra commented 3 days ago

I was playing with block.py, but you beat me to it. The trouble is that it now reports a render time of 3 hours on my A5000. Hopefully this is just an incorrect estimate. The reference workflow is rather simple. I have built a high-res interpolated version, and if I get decent output I will post it in my repos.

bombax-xiaoice commented 2 days ago

Unfortunately, it DOES take THAT long unless you have an AXXX or HXXX card