InstructPix2Pix/CosXL_Edit support

spacepxl commented 5 months ago

[AnimateDiffEvo] - INFO - Loading motion module mm_sd_v15_v2.safetensors via Gen2
!!! Exception during processing !!!
Traceback (most recent call last):
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-AnimateDiff-Evolved\animatediff\nodes_gen1.py", line 52, in load_mm_and_inject_params
    validate_model_compatibility_gen2(model=model, motion_model=motion_model)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-AnimateDiff-Evolved\animatediff\model_injection.py", line 499, in validate_model_compatibility_gen2
    raise MotionCompatibilityError(f"Motion module '{mm_info.mm_name}' is intended for {mm_info.sd_type} models, " \
ComfyUI-AnimateDiff-Evolved.animatediff.utils_motion.MotionCompatibilityError: Motion module 'mm_sd_v15_v2.safetensors' is intended for SD1.5 models, but the provided model is type SD15_instructpix2pix.

I bypassed the model compatibility check and confirmed that AnimateDiff does work with SD1.5 ip2p and SDXL edit/ip2p models, using AnimateDiff and HotShot motion modules. I'm not sure how you would want to change the compatibility check; maybe partial string matching or a hardcoded compatibility list. It makes sense that it works without any other modifications, since the only difference between a vanilla model and an ip2p model is the number of channels on the input layer (8 instead of 4).
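
To make the "hardcoded compatibility list" idea concrete, here is a minimal sketch. It is not the extension's actual code: the table, `model_family`, and `is_compatible` are hypothetical names, and the assumption that plain SD1.5 checkpoints report as `BaseModel` is mine. Only the two `*_instructpix2pix` type names come from this thread.

```python
# Hypothetical lookup table: model class name -> motion-module family.
# Only the two instructpix2pix names are taken from this thread; the
# other entries and all helper names are illustrative assumptions.
MODEL_FAMILY = {
    "BaseModel": "SD1.5",
    "SD15_instructpix2pix": "SD1.5",
    "SDXL": "SDXL",
    "SDXL_instructpix2pix": "SDXL",
}


def model_family(model_type_name: str) -> str:
    """Return the family for a model class name, or the name itself if unknown."""
    return MODEL_FAMILY.get(model_type_name, model_type_name)


def is_compatible(motion_module_family: str, model_type_name: str) -> bool:
    """True when the motion module's family matches the model's family."""
    return model_family(model_type_name) == motion_module_family


if __name__ == "__main__":
    print(is_compatible("SD1.5", "SD15_instructpix2pix"))
    print(is_compatible("SD1.5", "SDXL_instructpix2pix"))
```

Unknown model types fall through unchanged, so they still fail the check instead of being silently accepted, which partial string matching would not guarantee.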

example test workflow

Kosinkadink commented 5 months ago

Thanks for the heads up. Could you list every class of model that you were able to see AnimateDiff working with (separated by SD1.5/SDXL)? It would be a great help.

spacepxl commented 5 months ago

class SD15_instructpix2pix(IP2P, BaseModel) (works with sd1.5 motion modules)

class SDXL_instructpix2pix(IP2P, SDXL) (works with hotshot and xl motion modules)

https://github.com/comfyanonymous/ComfyUI/blob/133dc3351b3277f6ce41da7839ace9055329c64c/comfy/model_base.py#L498
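
Given that inheritance, a class-based check is another option besides name matching. The sketch below uses stand-in classes that only mirror the two signatures quoted above (plus the assumption that `SDXL` derives from `BaseModel`); `family_of` is a hypothetical helper, not something from ComfyUI or AnimateDiff-Evolved.

```python
# Stand-in classes that only mirror the inheritance quoted above.
# They are NOT the real ComfyUI classes.
class BaseModel:
    pass


class SDXL(BaseModel):  # assumption: SDXL derives from BaseModel
    pass


class IP2P:
    pass


class SD15_instructpix2pix(IP2P, BaseModel):
    pass


class SDXL_instructpix2pix(IP2P, SDXL):
    pass


def family_of(model: object) -> str:
    """Hypothetical rule: classify a model by its base classes."""
    if isinstance(model, SDXL):
        return "SDXL"
    cls = type(model)
    # Treat BaseModel itself, or a class deriving directly from it, as SD1.5.
    if cls is BaseModel or BaseModel in cls.__bases__:
        return "SD1.5"
    return "unknown"


if __name__ == "__main__":
    print(family_of(SD15_instructpix2pix()))
    print(family_of(SDXL_instructpix2pix()))
```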

Kosinkadink commented 5 months ago

I think the pix2pix models should be properly whitelisted as SD15/SDXL models now. Would you be able to check on your end with the most recent version of AnimateDiff-Evolved to verify?

spacepxl commented 4 months ago

I tested and can confirm that it is working with the normal KSampler now. However, with SamplerCustomAdvanced (useful for controlling the dual CFG for ip2p), it fails, usually with the following error:

```
Error occurred when executing SamplerCustomAdvanced:

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_group_norm)

  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy_extras\nodes_custom_sampler.py", line 529, in sample
    samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 644, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 623, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 534, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\k_diffusion\sampling.py", line 137, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 272, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 610, in __call__
    return self.predict_noise(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy_extras\nodes_custom_sampler.py", line 444, in predict_noise
    out = comfy.samplers.calc_cond_batch(self.inner_model, [negative_cond, middle_cond, self.conds.get("positive", None)], x, timestep, model_options)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 218, in calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\model_base.py", line 97, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py", line 850, in forward
    h = forward_timestep_embed(module, h, emb, context, transformer_options, time_context=time_context, num_video_frames=num_video_frames, image_only_indicator=image_only_indicator)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py", line 50, in forward_timestep_embed
    x = layer(x)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-AnimateDiff-Evolved\animatediff\motion_module_ad.py", line 673, in forward
    return self.temporal_transformer(input_tensor, encoder_hidden_states, attention_mask, self.view_options, mm_kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-AnimateDiff-Evolved\animatediff\motion_module_ad.py", line 882, in forward
    hidden_states = self.norm(hidden_states).to(hidden_states.dtype)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 94, in forward
    return super().forward(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\normalization.py", line 287, in forward
    return F.group_norm(
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\functional.py", line 2561, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
```

One time, though, I got this error instead:

```
Error occurred when executing SamplerCustomAdvanced:

Error while processing rearrange-reduction pattern "(b f) d c -> (b d) f c". Input tensor shape: torch.Size([54, 4096, 320]). Additional info: {'f': 16}. Shape mismatch, can't divide axis of length 54 in chunks of 16

  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy_extras\nodes_custom_sampler.py", line 529, in sample
    samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 644, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 623, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 534, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\k_diffusion\sampling.py", line 137, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 272, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 610, in __call__
    return self.predict_noise(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy_extras\nodes_custom_sampler.py", line 444, in predict_noise
    out = comfy.samplers.calc_cond_batch(self.inner_model, [negative_cond, middle_cond, self.conds.get("positive", None)], x, timestep, model_options)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 218, in calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\model_base.py", line 97, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py", line 850, in forward
    h = forward_timestep_embed(module, h, emb, context, transformer_options, time_context=time_context, num_video_frames=num_video_frames, image_only_indicator=image_only_indicator)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py", line 50, in forward_timestep_embed
    x = layer(x)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-AnimateDiff-Evolved\animatediff\motion_module_ad.py", line 673, in forward
    return self.temporal_transformer(input_tensor, encoder_hidden_states, attention_mask, self.view_options, mm_kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-AnimateDiff-Evolved\animatediff\motion_module_ad.py", line 891, in forward
    hidden_states = block(
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-AnimateDiff-Evolved\animatediff\motion_module_ad.py", line 1002, in forward
    attention_block(
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-AnimateDiff-Evolved\animatediff\motion_module_ad.py", line 1180, in forward
    hidden_states = rearrange(
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\einops\einops.py", line 591, in rearrange
    return reduce(tensor, pattern, reduction="rearrange", **axes_lengths)
  File "C:\Users\*****\Desktop\ComfyUI_windows_portable\python_embeded\lib\site-packages\einops\einops.py", line 533, in reduce
    raise EinopsError(message + "\n {}".format(e))
```
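
The shape error above boils down to a divisibility requirement: the pattern `(b f) d c` with `f=16` needs the first axis to be an exact multiple of 16, and 54 is not. A small stdlib-only sketch of that arithmetic follows; `split_batch_frames` is a hypothetical helper for illustration, not part of einops or AnimateDiff-Evolved, and it says nothing about why the batch ended up at 54.

```python
def split_batch_frames(total: int, frames: int) -> int:
    """Return b such that total == b * frames, as a "(b f)" axis requires."""
    b, remainder = divmod(total, frames)
    if remainder:
        raise ValueError(
            f"can't divide axis of length {total} in chunks of {frames} "
            f"(remainder {remainder})"
        )
    return b


if __name__ == "__main__":
    print(split_batch_frames(48, 16))  # 48 is a multiple of 16, so this succeeds
    try:
        split_batch_frames(54, 16)  # the values from the error above
    except ValueError as err:
        print(err)
```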

Here's a test workflow: ip2p_AD_SamplerCustomAdvanced.json

However, I think these are general issues between AnimateDiff and SamplerCustomAdvanced that occur with any model, not something specific to ip2p. I tested SamplerCustomAdvanced with AnimateDiff on a normal 1.5 checkpoint and got the same device mismatch error. The basic SamplerCustom doesn't have this issue; it works just fine.

Kosinkadink / ComfyUI-AnimateDiff-Evolved

InstructPix2Pix/CosXL_Edit support #349