Tencent / HunyuanDiT

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
https://dit.hunyuan.tencent.com/

Supporting dialoggen robustly in ComfyUI with a few small changes to your config.json and preprocessor_config.json in Hugging Face #90

Open doctorpangloss opened 2 weeks ago

doctorpangloss commented 2 weeks ago

Describe the feature

If you adjust the Hugging Face metadata for DialogGen (config.json, ...), DialogGen can be supported easily inside ComfyUI.

[Image: llava_example_01]

Here is LLaVA 1.6 support, demonstrated robustly in https://github.com/hiddenswitch/ComfyUI.

Structure your Hugging Face repo to look like this one: https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf/tree/main

Observe the declaration of the "image processor" in https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf/blob/main/preprocessor_config.json .

Then AutoProcessor.from_... will Just Work and images and text can be processed idiomatically.
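As a rough sketch of what this enables: the snippet below checks that a preprocessor_config.json declares the two fields that, as far as I understand, the Auto classes in transformers consult to pick a processor (`processor_class` and `image_processor_type`), and then shows what loading might look like. The helper names, the example values (modeled on the llava-hf repo linked above), and the use of `from_pretrained` are assumptions for illustration, not DialogGen's actual configuration.

```python
from __future__ import annotations

import json

# Keys read from preprocessor_config.json when resolving the processor and
# image processor classes. Assumed to be sufficient for this sketch.
REQUIRED_KEYS = ("processor_class", "image_processor_type")


def missing_processor_keys(config_text: str) -> list[str]:
    """Return the required keys that are absent from a preprocessor_config.json string."""
    config = json.loads(config_text)
    return [key for key in REQUIRED_KEYS if key not in config]


def load_processor(repo_id: str):
    """Load the processor for a Hub repo. Needs `transformers` and network access."""
    # Imported lazily so the config check above runs without transformers installed.
    from transformers import AutoProcessor

    return AutoProcessor.from_pretrained(repo_id)


# Illustrative fragment modeled on the llava-v1.6 repo linked above (values assumed).
EXAMPLE_CONFIG = json.dumps(
    {
        "processor_class": "LlavaNextProcessor",
        "image_processor_type": "LlavaNextImageProcessor",
    }
)

print(missing_processor_keys(EXAMPLE_CONFIG))  # nothing missing for this fragment
print(missing_processor_keys("{}"))  # both keys missing for an empty config
```

With a repo laid out this way, something like `processor = load_processor(repo_id)` followed by `processor(images=image, text=prompt, return_tensors="pt")` should prepare both inputs in one call, where `repo_id`, `image`, and `prompt` are placeholders you supply.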

You should also clarify your chat template by authoring one. For example, this is llava-v1.5's, which may be what DialogGen uses, although I am unsure about that because DialogGen is based on llava-v1.6: https://github.com/AppMana/appmana-comfyui-chat-templates/commit/1ea0538059d541140f258725dd8d98449a7c043d In any case, add it to the Hugging Face configuration as well.
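To make the point about chat templates concrete, here is a minimal sketch of what a template has to pin down, written as plain Python rather than the Jinja string that Hugging Face stores. It assumes the single-turn `USER: <image>\n<prompt> ASSISTANT:` format commonly documented for llava-v1.5; whether DialogGen expects this format is exactly the open question above, and the function name is hypothetical.

```python
def render_llava15_prompt(user_text: str, has_image: bool = True) -> str:
    """Render a single-turn prompt in the LLaVA-1.5 style (format assumed, not confirmed for DialogGen)."""
    # The image placeholder token goes on its own line before the user's text.
    image_token = "<image>\n" if has_image else ""
    return f"USER: {image_token}{user_text} ASSISTANT:"


prompt = render_llava15_prompt("Describe this picture.")
print(repr(prompt))
```

Publishing the template alongside the model removes this kind of guesswork, since downstream tools can read the format instead of hard-coding it.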

Motivation

In my experience, DialogGen is an essential part of the HunyuanDiT model. By making these changes, you will improve the experience in both the Hugging Face and ComfyUI ecosystems. This will also significantly reduce peak VRAM usage, because ComfyUI (and others) correctly load and unload models serially.

Additional context

I maintain a fork of ComfyUI that has significant product development that may interest you. I have a lot of experience in developing the tooling, and I evaluate models often. HunyuanDiT is exceptionally good, so I hope its adoption will rise.

xuhuaren commented 2 weeks ago

Could you please provide a PR with the corresponding code so that we can review it and potentially merge it into the main repository? My email address is xuhuaren@tencent.com.