Request for ComfyUI support for Hunyuan DIT

xuhuaren commented 2 months ago

Dear ComfyUI team,

I hope this email finds you well. My name is Richard, and I am one of the developers of Hunyuan DIT, an innovative and effective model that utilizes the DIT architecture. Our project has been gaining traction within the open-source community, and we believe that integrating our model with ComfyUI would greatly benefit users by providing them with enhanced support and a more seamless experience.

I am reaching out to inquire if there are any plans or opportunities for collaboration between our teams. We are more than willing to provide any necessary assistance, including sharing our source code, to ensure a smooth integration process. We believe that by working together, we can create a more robust and user-friendly platform for the open-source community.

If you are interested in discussing this further or require any additional information, please feel free to contact me at xuhuaren@tencent.com. I look forward to the possibility of collaborating with the ComfyUI team and eagerly await your response.

Thank you for your time and consideration.

Best regards,

Richard

GavChap commented 2 months ago

There are already nodes that support this: https://github.com/city96/ComfyUI_ExtraModels/tree/main

xuhuaren commented 2 months ago

> There are already nodes that support this: https://github.com/city96/ComfyUI_ExtraModels/tree/main

We want to merge our code into the ComfyUI main branch.

comfyanonymous commented 1 month ago

For the diffusion model code itself, there are a few things you can do to make it easier for me to implement it properly:

Have a minimal implementation of the model code that only depends on pytorch under a license compatible with the GPL license that ComfyUI uses.

Provide a reference image together with its sampling settings, seed, etc., so that I can make sure the ComfyUI implementation matches the reference one.

Replace all attention functions with the ComfyUI "optimized_attention" function. An example of the SDPA implementation is here:

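import torch

def optimized_attention(q, k, v, heads, mask=None, attn_precision=None, skip_reshape=False):
    # attn_precision is unused in this SDPA version.
    if skip_reshape:
        # q, k, v are already (batch, heads, tokens, dim_head).
        b, _, _, dim_head = q.shape
    else:
        # q, k, v are (batch, tokens, heads * dim_head); split out the heads.
        b, _, dim_head = q.shape
        dim_head //= heads
        q, k, v = map(
            lambda t: t.view(b, -1, heads, dim_head).transpose(1, 2),
            (q, k, v),
        )

    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
    # Merge the heads back into (batch, tokens, heads * dim_head).
    out = (
        out.transpose(1, 2).reshape(b, -1, heads * dim_head)
    )
    return out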
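
As a rough illustration of the third point, a model's attention layer might call the function like this. It is only a sketch: the `SelfAttention` class and its parameter names are hypothetical, and `optimized_attention` is redefined locally (same logic as the SDPA version above) so the snippet runs without ComfyUI. Inside ComfyUI it would be imported from `comfy.ldm.modules.attention` instead.

```python
import torch
import torch.nn as nn


def optimized_attention(q, k, v, heads, mask=None, attn_precision=None, skip_reshape=False):
    # Local stand-in mirroring the SDPA version quoted in this thread.
    if skip_reshape:
        b, _, _, dim_head = q.shape
    else:
        b, _, dim_head = q.shape
        dim_head //= heads
        q, k, v = (t.view(b, -1, heads, dim_head).transpose(1, 2) for t in (q, k, v))
    out = torch.nn.functional.scaled_dot_product_attention(
        q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False
    )
    return out.transpose(1, 2).reshape(b, -1, heads * dim_head)


class SelfAttention(nn.Module):
    """Hypothetical attention layer that delegates to optimized_attention."""

    def __init__(self, dim, heads):
        super().__init__()
        assert dim % heads == 0, "dim must be divisible by heads"
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, mask=None):
        # x is (batch, tokens, dim). q, k, v keep that layout; the head
        # split and merge happen inside optimized_attention.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        return self.proj(optimized_attention(q, k, v, self.heads, mask=mask))


torch.manual_seed(0)
layer = SelfAttention(dim=64, heads=8)
x = torch.randn(2, 16, 64)
out = layer(x)
print(out.shape)  # torch.Size([2, 16, 64])
```

Because the model code never calls a specific attention kernel directly, the backend can be swapped in one place without touching the model.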

You can see some examples:

https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/ldm/audio/dit.py#L369

https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/ldm/cascade/common.py#L47

https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/ldm/modules/diffusionmodules/mmdit.py#L293
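
The reference-image request above comes down to a reproducibility check: with the same seed and settings, a port should produce (nearly) the same output as the reference. Below is a minimal sketch of such a check, assuming both implementations can be called as plain PyTorch modules. `seeded_noise`, `max_abs_diff`, and the toy `nn.Linear` models are hypothetical stand-ins, not ComfyUI or HunyuanDiT code.

```python
import torch
import torch.nn as nn


def seeded_noise(seed, shape):
    # Generate the initial noise on a seeded CPU generator so the same
    # seed yields the same tensor regardless of which GPU runs the model.
    gen = torch.Generator(device="cpu").manual_seed(seed)
    return torch.randn(shape, generator=gen)


def max_abs_diff(a, b):
    # Largest per-element deviation between two outputs.
    return (a.float() - b.float()).abs().max().item()


# Toy stand-ins: the "port" loads exactly the reference weights.
reference = nn.Linear(8, 8)
port = nn.Linear(8, 8)
port.load_state_dict(reference.state_dict())

noise = seeded_noise(seed=42, shape=(1, 8))
with torch.no_grad():
    diff = max_abs_diff(reference(noise), port(noise))
print(f"max abs diff: {diff:.2e}")
```

With a real model, small nonzero differences are expected from different attention kernels or precisions, so a tolerance is probably more useful than an exact match.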

mcmonkey4eva commented 1 month ago

Might I encourage you to release copies of your primary models in the .safetensors file format (instead of the legacy pickle format)? If you're unfamiliar, here's the introduction page for the safetensors format: https://huggingface.co/docs/safetensors/en/index. It's considered the modern standard for AI/ML models.

Additionally, I encourage you to include a metadata header compatible with ModelSpec. If HunyuanDiT becomes popular and receives many finetunes in the way Stable Diffusion has, having an architecture ID in the header will be very helpful for software to automatically recognize the finetuned models as HunyuanDiT-based models.
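
To show why the header helps, here is a minimal sketch of how a tool could read the architecture ID without loading any weights. It relies only on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header, with string metadata under `__metadata__`). The helper names are hypothetical, and the architecture value is a placeholder, not an official ModelSpec ID.

```python
import json
import os
import struct
import tempfile


def read_safetensors_metadata(path):
    # Layout: 8-byte little-endian header length, then a JSON header.
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Free-form string metadata lives under the "__metadata__" key.
    return header.get("__metadata__", {})


def write_metadata_only_file(path, metadata):
    # Builds a tensor-less file purely to exercise the reader above.
    header = json.dumps({"__metadata__": metadata}).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))
        f.write(header)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "finetune.safetensors")
    # Placeholder value: not an official ModelSpec architecture ID.
    write_metadata_only_file(path, {"modelspec.architecture": "example-hunyuan-dit"})
    meta = read_safetensors_metadata(path)

arch = meta.get("modelspec.architecture")
print(arch)  # example-hunyuan-dit
```

Only the header is read, so a loader can check this on a multi-gigabyte finetune almost instantly and pick the right model class.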

comfyanonymous / ComfyUI

Request for ComfyUI support for Hunyuan DIT #3751