Motivation
DialogGen is an essential part of the HunyuanDiT model in my experience. By authoring these changes, you will improve the experience in both the Hugging Face ecosystem and ComfyUI. This will also significantly reduce peak VRAM usage, because ComfyUI (and other tools) correctly load and unload models one at a time.
Additional context
I maintain a fork of ComfyUI with significant product development that may interest you. I have a lot of experience developing tooling, and I evaluate models often. HunyuanDiT is exceptionally good, so I hope its adoption will rise.
Could you please provide a PR with the corresponding code, so that we can review it and potentially merge it into the main repository?
My email address: xuhuaren@tencent.com
Describe the feature
By adjusting your Hugging Face metadata (`config.json`, ...) for DialogGen, you can make it easy to support inside ComfyUI. LLaVA 1.6 support is demonstrated robustly in https://github.com/hiddenswitch/ComfyUI.
Author your repo to look like this: https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf/tree/main
Observe how the image processor is declared in https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf/blob/main/preprocessor_config.json.
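Before relying on `AutoProcessor`, it may help to confirm that a local checkout actually follows the llava-hf layout. The following is a minimal, stdlib-only sketch under stated assumptions: the minimum file set in `REQUIRED_FILES` is an illustrative choice modeled on the llava-hf repository, not an official requirement, and `find_problems` / `check_repo_dir` are hypothetical helpers, not part of transformers. The keys it checks (`model_type` in `config.json`, `image_processor_type` and `processor_class` in `preprocessor_config.json`) are ones the transformers `Auto*` classes consult, although `AutoProcessor` can also fall back to other files.

```python
import json
from pathlib import Path

# Assumed minimum file set, modeled loosely on the llava-hf repository layout.
# This is an illustrative choice, not an official transformers requirement.
REQUIRED_FILES = ("config.json", "preprocessor_config.json", "tokenizer_config.json")


def find_problems(present_files, config, preprocessor_config):
    """Pure check over already-loaded metadata; returns a list of problems."""
    problems = [f"missing {name}" for name in REQUIRED_FILES if name not in present_files]
    # AutoConfig resolves the concrete config class from "model_type".
    if config is not None and "model_type" not in config:
        problems.append("config.json lacks 'model_type'")
    # AutoImageProcessor reads "image_processor_type"; AutoProcessor can read
    # "processor_class" here (it may also fall back to other files).
    if preprocessor_config is not None:
        for key in ("image_processor_type", "processor_class"):
            if key not in preprocessor_config:
                problems.append(f"preprocessor_config.json lacks {key!r}")
    return problems


def check_repo_dir(repo_dir):
    """Read a locally cloned repo (hypothetical path) and run find_problems on it."""
    repo = Path(repo_dir)
    present = {p.name for p in repo.iterdir() if p.is_file()} if repo.is_dir() else set()

    def load(name):
        # Only parse JSON files that exist; missing ones are reported separately.
        if name not in present:
            return None
        return json.loads((repo / name).read_text(encoding="utf-8"))

    return find_problems(present, load("config.json"), load("preprocessor_config.json"))
```

For example, `check_repo_dir("./DialogGen")` on a hypothetical local clone returns an empty list when every check passes. Keeping the disk access in a thin wrapper lets the checks themselves be exercised without any files.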
Then `AutoProcessor.from_...` will Just Work, and images and text can be processed idiomatically.
You should also clarify your chat template by authoring one. For example, this is the `llava-v1.5` template: https://github.com/AppMana/appmana-comfyui-chat-templates/commit/1ea0538059d541140f258725dd8d98449a7c043d
It may be what DialogGen uses, but I am unsure, since DialogGen is based on `llava-v1.6`. In any case, add the template to the Hugging Face configurations too.
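A minimal sketch of adding a chat template to the Hugging Face configuration, assuming the template is stored under the `chat_template` key of `tokenizer_config.json` (the field transformers tokenizers read for `apply_chat_template`). The Jinja string below only approximates the llava-v1.5 `USER: ... ASSISTANT:` style. Its exact system prompt, whitespace, and `<image>` placement are assumptions and should be copied from the linked template instead. `with_chat_template` and `write_chat_template` are hypothetical helper names.

```python
import json
from pathlib import Path

# Approximation of a llava-v1.5-style prompt (assumption; verify against the
# linked template). Callers are expected to put "<image>\n" in the user text.
LLAVA_V15_STYLE_TEMPLATE = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}USER: {{ message['content'] }} "
    "{% elif message['role'] == 'assistant' %}ASSISTANT: {{ message['content'] }}</s>"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}ASSISTANT:{% endif %}"
)


def with_chat_template(tokenizer_config, template):
    """Return a copy of a tokenizer_config dict with its chat_template set."""
    updated = dict(tokenizer_config)  # shallow copy; the input dict is left untouched
    updated["chat_template"] = template
    return updated


def write_chat_template(repo_dir, template=LLAVA_V15_STYLE_TEMPLATE):
    """Update <repo_dir>/tokenizer_config.json in place (hypothetical local clone)."""
    path = Path(repo_dir) / "tokenizer_config.json"
    config = json.loads(path.read_text(encoding="utf-8"))
    updated = with_chat_template(config, template)
    path.write_text(json.dumps(updated, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return updated
```

Once the file is updated, `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` should render a prompt such as `USER: <image>\n... ASSISTANT:`. Depending on the transformers version, processors may also read a separate chat template file, so check the version you target.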