This repository contains custom nodes for ComfyUI, designed to enhance and infer prompts using the GLM-4 model on local hardware. The nodes leverage the GLM-4 model to generate detailed and descriptive image/video captions or to enhance user-provided prompts, in addition to regular inference. Prompts and inference can be combined with an image input if the `THUDM/glm-4v-9b` model is used.
All models are downloaded automatically from HuggingFace.co. `THUDM/glm-4v-9b` takes up ~26 GB of disk space and `THUDM/glm-4-9b` takes up ~18 GB.
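If you would rather fetch a model up front than wait for the automatic download on first use, the standard `huggingface_hub` API can pre-populate the local cache. A minimal sketch, assuming the default Hugging Face cache location:

```python
# Pre-download a model into the local Hugging Face cache so the
# first node run doesn't block on a ~18-26 GB download.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="THUDM/glm-4v-9b")  # ~26 GB
snapshot_download(repo_id="THUDM/glm-4-9b")   # ~18 GB
```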
The nodes contain an `unload_model` option which frees up VRAM and makes them suitable for workflows that require a larger VRAM budget, like FLUX.1-dev and CogVideoX-5b(-I2V).
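For reference, unloading a model to reclaim VRAM in PyTorch typically boils down to the steps below. This is an illustrative sketch, not the node's exact implementation; `pipeline.model` is a hypothetical attribute name:

```python
# Illustrative sketch of freeing VRAM after generation; the node's
# actual unload_model logic may differ.
import gc
import torch

def unload(pipeline):
    del pipeline.model            # hypothetical attribute: drop the weights
    gc.collect()                  # reclaim the Python-side objects
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached VRAM to the driver
```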
The prompt enhancer is based on this example from THUDM's convert_demo.py. Their demo only works through the OpenAI API, and I wanted to build something local.
Hope you will enjoy the enhanced prompts and the inference capabilities of these models. They are great!
Added support for quantized models. They perform exceptionally well; check the metrics below. The `alexwww94/glm-4v-9b-gptq-4bit` model is significantly more lightweight than the original and takes up ~8.5 GB of disk space. The `alexwww94/glm-4v-9b-gptq-3bit` model is even more lightweight and takes up ~7.6 GB.
```bash
cd /your/path/to/ComfyUI/ComfyUI/custom_nodes/
git clone https://github.com/Nojahhh/ComfyUI_GLM4_Wrapper.git
cd ComfyUI_GLM4_Wrapper
../../../python_embeded/python.exe -m pip install -r requirements.txt
```

If you run ComfyUI with a system-wide Python instead of the portable build, a plain `python -m pip install -r requirements.txt` works as well.
The `GLM4ModelLoader` node is responsible for loading GLM-4 models. It supports various models and precision settings (`fp16`, `fp32`, `bf16`):

- `THUDM/glm-4v-9b` requires `bf16` and is set to run in 4-bit by default based on its size.
- `alexwww94/glm-4v-9b-gptq-4bit` requires `bf16` and is set to run in 4-bit by default.
- `alexwww94/glm-4v-9b-gptq-3bit` requires `bf16` and is set to run in 3-bit by default.

Quantization can be set to `4`, `8`, or `16` bits, with a default value of `4`. (This option is bypassed when using the GPTQ models; see the loading sketch below.)
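In Transformers, this kind of 4/8-bit loading is normally expressed with a `bitsandbytes` quantization config. Below is a minimal sketch of how a loader along these lines could pass the quantization choice through; `load_glm4` is a hypothetical helper, not the node's actual code:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def load_glm4(model_id: str, bits: int):
    # 4- or 8-bit loading via bitsandbytes; 16 means plain bf16, no quantization.
    quant_config = None
    if bits == 4:
        quant_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
    elif bits == 8:
        quant_config = BitsAndBytesConfig(load_in_8bit=True)
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        quantization_config=quant_config,
        trust_remote_code=True,  # GLM-4 repos ship custom modeling code
        device_map="auto",
    )
```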
The `GLM4PromptEnhancer` node enhances a given prompt using the GLM-4 model. Image input is supported by `THUDM/glm-4v-9b`, `alexwww94/glm-4v-9b-gptq-4bit` and `alexwww94/glm-4v-9b-gptq-3bit`.

The `GLM4Inference` node performs inference using the GLM-4 model. Here too, image input is supported by `THUDM/glm-4v-9b`, `alexwww94/glm-4v-9b-gptq-4bit` and `alexwww94/glm-4v-9b-gptq-3bit`.
In short, the wrapper exposes three nodes: `GLM4ModelLoader`, `GLM4PromptEnhancer` and `GLM4Inference`.
The following GLM-4 models are supported by this wrapper:
| Model Name | Size | Recommended Precision |
|---|---|---|
| `alexwww94/glm-4v-9b-gptq-4bit` | 9B | `bf16` (4-bit quant) |
| `alexwww94/glm-4v-9b-gptq-3bit` | 9B | `bf16` (3-bit quant) |
| `THUDM/glm-4v-9b` | 9B | `bf16` (4/8-bit quant) |
| `THUDM/glm-4-9b` | 9B | `fp16`, `fp32`, `bf16` |
| `THUDM/glm-4-9b-chat` | 9B | `fp16`, `fp32`, `bf16` |
| `THUDM/glm-4-9b-chat-1m` | 9B | `fp16`, `fp32`, `bf16` |
| `THUDM/LongCite-glm4-9b` | 9B | `fp16`, `fp32`, `bf16` |
| `THUDM/LongWriter-glm4-9b` | 9B | `fp16`, `fp32`, `bf16` |
`THUDM/glm-4v-9b` requires `bf16` precision and defaults to 4-bit quantization due to its size and the typical VRAM limitations of consumer-grade GPUs (often 24 GB or less). `alexwww94/glm-4v-9b-gptq-4bit` requires `bf16` and defaults to 4-bit, while `alexwww94/glm-4v-9b-gptq-3bit` requires `bf16` and defaults to 3-bit. The `THUDM/glm-4v-9b`, `alexwww94/glm-4v-9b-gptq-4bit` and `alexwww94/glm-4v-9b-gptq-3bit` models are able to handle image input.

Below is an example of how to use the GLM-4 Prompt Enhancer and GLM-4 Inference nodes in your code:
```python
from comfyui_glm4_wrapper import GLM4ModelLoader, GLM4PromptEnhancer, GLM4Inference

# Load the model
model_loader = GLM4ModelLoader()
pipeline = model_loader.gen(model="THUDM/glm-4v-9b", precision="bf16", quantization="8")[0]

# Enhance the prompt
enhancer = GLM4PromptEnhancer()
enhanced_prompt = enhancer.enhance_prompt(
    GLMPipeline=pipeline,
    prompt="A beautiful sunrise over the mountains",
    max_tokens=200,
    temperature=0.1,
    top_k=40,
    top_p=0.7,
    repetition_penalty=1.1,
    image=None,  # optional PIL Image
    unload_model=True
)
print(enhanced_prompt)
```
```python
from comfyui_glm4_wrapper import GLM4ModelLoader, GLM4PromptEnhancer, GLM4Inference

# Load the model
model_loader = GLM4ModelLoader()
pipeline = model_loader.gen(model="THUDM/glm-4v-9b", precision="bf16", quantization="8")[0]

# Perform inference
inference = GLM4Inference()
output_text = inference.infer(
    GLMPipeline=pipeline,
    system_prompt="Describe the scene in detail:",
    user_prompt="A bustling city street at night",
    max_tokens=250,
    temperature=0.7,
    top_k=50,
    top_p=1,
    repetition_penalty=1.0,
    image=None,
    unload_model=True
)
print(output_text)
```
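Both examples above pass `image=None`. Since the `glm-4v` variants accept image input, passing a picture is simply a matter of handing in a PIL image. A sketch, where `street_photo.jpg` is just a placeholder file name:

```python
from PIL import Image

# Any local image file; "street_photo.jpg" is a placeholder path.
image = Image.open("street_photo.jpg").convert("RGB")

output_text = inference.infer(
    GLMPipeline=pipeline,
    system_prompt="Describe the scene in detail:",
    user_prompt="What is happening in this picture?",
    max_tokens=250,
    temperature=0.7,
    top_k=50,
    top_p=1,
    repetition_penalty=1.0,
    image=image,  # requires an image-capable model such as THUDM/glm-4v-9b
    unload_model=True
)
print(output_text)
```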
For more detailed examples and advanced usage, please refer to the documentation or the example scripts provided in the repository.
This project is licensed under the MIT License. See the LICENSE file for details.
For any questions or feedback, please open an issue on GitHub or contact me at mellin.johan@gmail.com.