TencentARC / PhotoMaker

PhotoMaker
https://photo-maker.github.io/

RuntimeError: cutlassF: no kernel found to launch! #102

Open · tsk42982 opened this issue 5 months ago

tsk42982 commented 5 months ago

I failed to run photomaker_demo.ipynb; the error message is shown below. What could be the cause?

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[8], line 13
     10 if start_merge_step > 30:
     11     start_merge_step = 30
---> 13 images = pipe(
     14     prompt=prompt,
     15     input_id_images=input_id_images,
     16     negative_prompt=negative_prompt,
     17     num_images_per_prompt=4,
     18     num_inference_steps=num_steps,
     19     start_merge_step=start_merge_step,
     20     generator=generator,
     21 ).images

File ~\.conda\envs\photomaker\lib\site-packages\torch\utils\_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File G:\github\PhotoMaker_main\photomaker\pipeline.py:442, in PhotoMakerStableDiffusionXLPipeline.__call__(self, prompt, prompt_2, height, width, num_inference_steps, denoising_end, guidance_scale, negative_prompt, negative_prompt_2, num_images_per_prompt, eta, generator, latents, prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds, output_type, return_dict, cross_attention_kwargs, guidance_rescale, original_size, crops_coords_top_left, target_size, callback, callback_steps, input_id_images, start_merge_step, class_tokens_mask, prompt_embeds_text_only, pooled_prompt_embeds_text_only)
    440 # predict the noise residual
    441 added_cond_kwargs = {"text_embeds": add_text_embeds, "time_ids": add_time_ids}
--> 442 noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=current_prompt_embeds, cross_attention_kwargs=cross_attention_kwargs, added_cond_kwargs=added_cond_kwargs, return_dict=False)[0]
    451 # perform guidance
    452 if do_classifier_free_guidance:

File ~\.conda\envs\photomaker\lib\site-packages\torch\nn\modules\module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
-> 1518     return self._call_impl(*args, **kwargs)
File ~\.conda\envs\photomaker\lib\site-packages\torch\nn\modules\module.py:1527, in Module._call_impl(self, *args, **kwargs)
-> 1527     return forward_call(*args, **kwargs)

File ~\.conda\envs\photomaker\lib\site-packages\diffusers\models\unet_2d_condition.py:1112, in UNet2DConditionModel.forward(self, sample, timestep, encoder_hidden_states, class_labels, timestep_cond, attention_mask, cross_attention_kwargs, added_cond_kwargs, down_block_additional_residuals, mid_block_additional_residual, down_intrablock_additional_residuals, encoder_attention_mask, return_dict)
-> 1112 sample, res_samples = downsample_block(hidden_states=sample, temb=emb, encoder_hidden_states=encoder_hidden_states, attention_mask=attention_mask, cross_attention_kwargs=cross_attention_kwargs, encoder_attention_mask=encoder_attention_mask, **additional_residuals)

File ~\.conda\envs\photomaker\lib\site-packages\torch\nn\modules\module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
-> 1518     return self._call_impl(*args, **kwargs)
File ~\.conda\envs\photomaker\lib\site-packages\torch\nn\modules\module.py:1527, in Module._call_impl(self, *args, **kwargs)
-> 1527     return forward_call(*args, **kwargs)

File ~\.conda\envs\photomaker\lib\site-packages\diffusers\models\unet_2d_blocks.py:1160, in CrossAttnDownBlock2D.forward(self, hidden_states, temb, encoder_hidden_states, attention_mask, cross_attention_kwargs, encoder_attention_mask, additional_residuals)
-> 1160 hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states, cross_attention_kwargs=cross_attention_kwargs, attention_mask=attention_mask, encoder_attention_mask=encoder_attention_mask, return_dict=False)[0]

File ~\.conda\envs\photomaker\lib\site-packages\torch\nn\modules\module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
-> 1518     return self._call_impl(*args, **kwargs)
File ~\.conda\envs\photomaker\lib\site-packages\torch\nn\modules\module.py:1527, in Module._call_impl(self, *args, **kwargs)
-> 1527     return forward_call(*args, **kwargs)

File ~\.conda\envs\photomaker\lib\site-packages\diffusers\models\transformer_2d.py:392, in Transformer2DModel.forward(self, hidden_states, encoder_hidden_states, timestep, added_cond_kwargs, class_labels, cross_attention_kwargs, attention_mask, encoder_attention_mask, return_dict)
--> 392 hidden_states = block(hidden_states, attention_mask=attention_mask, encoder_hidden_states=encoder_hidden_states, encoder_attention_mask=encoder_attention_mask, timestep=timestep, cross_attention_kwargs=cross_attention_kwargs, class_labels=class_labels)

File ~\.conda\envs\photomaker\lib\site-packages\torch\nn\modules\module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
-> 1518     return self._call_impl(*args, **kwargs)
File ~\.conda\envs\photomaker\lib\site-packages\torch\nn\modules\module.py:1527, in Module._call_impl(self, *args, **kwargs)
-> 1527     return forward_call(*args, **kwargs)

File ~\.conda\envs\photomaker\lib\site-packages\diffusers\models\attention.py:329, in BasicTransformerBlock.forward(self, hidden_states, attention_mask, encoder_hidden_states, encoder_attention_mask, timestep, cross_attention_kwargs, class_labels, added_cond_kwargs)
    326 cross_attention_kwargs = cross_attention_kwargs.copy() if cross_attention_kwargs is not None else {}
    327 gligen_kwargs = cross_attention_kwargs.pop("gligen", None)
--> 329 attn_output = self.attn1(norm_hidden_states, encoder_hidden_states=encoder_hidden_states if self.only_cross_attention else None, attention_mask=attention_mask, **cross_attention_kwargs)

File ~\.conda\envs\photomaker\lib\site-packages\torch\nn\modules\module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
-> 1518     return self._call_impl(*args, **kwargs)
File ~\.conda\envs\photomaker\lib\site-packages\torch\nn\modules\module.py:1527, in Module._call_impl(self, *args, **kwargs)
-> 1527     return forward_call(*args, **kwargs)

File ~\.conda\envs\photomaker\lib\site-packages\diffusers\models\attention_processor.py:527, in Attention.forward(self, hidden_states, encoder_hidden_states, attention_mask, **cross_attention_kwargs)
    525 # here we simply pass along all tensors to the selected processor class
    526 # For standard processors that are defined here, `cross_attention_kwargs` is empty
--> 527 return self.processor(self, hidden_states, encoder_hidden_states=encoder_hidden_states, attention_mask=attention_mask, **cross_attention_kwargs)

File ~\.conda\envs\photomaker\lib\site-packages\diffusers\models\attention_processor.py:1259, in AttnProcessor2_0.__call__(self, attn, hidden_states, encoder_hidden_states, attention_mask, temb, scale)
   1257 # the output of sdp = (batch, num_heads, seq_len, head_dim)
   1258 # TODO: add support for attn.scale when we move to Torch 2.1
-> 1259 hidden_states = F.scaled_dot_product_attention(
   1260     query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
   1261 )
   1263 hidden_states = hidden_states.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
   1264 hidden_states = hidden_states.to(query.dtype)

RuntimeError: cutlassF: no kernel found to launch!

ljzycmd commented 5 months ago

Hi @tsk42982, this looks like a problem with your running environment. Please check the dependencies against the instructions here: https://github.com/TencentARC/PhotoMaker?tab=readme-ov-file#-dependencies-and-installation. If you are on a Windows machine, you may refer to the Windows installation guide: https://github.com/TencentARC/PhotoMaker?tab=readme-ov-file#windows-version-of-photomaker.
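For reference, a quick way to see what the environment actually provides is to print the versions and GPU capabilities that PyTorch detects (plain torch calls only, nothing PhotoMaker-specific); this is a diagnostic sketch, not part of the official instructions:

```python
import torch

# Report what PyTorch sees; a cutlassF / scaled_dot_product_attention failure
# usually traces back to the torch build, CUDA version, or GPU rather than to
# the PhotoMaker code itself.
print("torch version:", torch.__version__)
print("CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("compute capability:", torch.cuda.get_device_capability(0))
    print("bf16 supported:", torch.cuda.is_bf16_supported())
```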

Vargol commented 5 months ago

Since you're getting RuntimeError: cutlassF: no kernel found to launch! inside F.scaled_dot_product_attention, try changing the references to bfloat16 into float16 (a sketch of that change is below). It seems that F.scaled_dot_product_attention relies on a kernel that doesn't support bfloat16 on many GPUs.
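A minimal sketch of the dtype change, assuming the notebook creates the pipeline roughly as in the official example; the checkpoint name below is a placeholder, so keep whatever base model and from_pretrained arguments your copy of photomaker_demo.ipynb already uses:

```python
import torch
from photomaker import PhotoMakerStableDiffusionXLPipeline

# Placeholder checkpoint; substitute the base model your notebook points at.
base_model_path = "SG161222/RealVisXL_V3.0"

pipe = PhotoMakerStableDiffusionXLPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,   # was torch.bfloat16 in the demo
    use_safetensors=True,
    variant="fp16",
).to("cuda")
```

Any other torch.bfloat16 references in the notebook would get the same float16 treatment.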

Paper99 commented 5 months ago

Good suggestion.