BennyKok / comfyui-deploy

An open-source Vercel-like deployment platform for ComfyUI
https://comfydeploy.ing
GNU Affero General Public License v3.0

Implement Modal's Keep Warm feature for faster inference #59

Open · slavakurilyak opened this issue 1 month ago

slavakurilyak commented 1 month ago

Description

It would be great to support Modal's keep_warm option when creating containers through ComfyDeploy. This feature lets Modal maintain a pool of pre-warmed instances, which can reduce cold start times and improve the responsiveness of serverless GPU inference for ComfyUI.

Current Situation

Currently, ComfyDeploy scales from zero, so any inference request that arrives with no running container pays a noticeable delay while a new container spins up.

Proposed Solution

Pass Modal's keep_warm parameter in the Modal function decorators used by ComfyDeploy. This can be done by modifying comfyui-deploy/builder/modal-builder/src/template/app.py.

Example Implementation

import modal
from modal import asgi_app  # needed for the @asgi_app() decorator below

# ... existing imports ...
# (stub, target_image, and config are defined earlier in app.py)

@stub.function(
    image=target_image,
    gpu=config["gpu"],
    allow_concurrent_inputs=100,
    concurrency_limit=1,
    timeout=10 * 60,
    keep_warm=3,  # add this line to keep 3 containers warm (suggested default)
)
@asgi_app()
def comfyui_app():
    # ... existing function body ...
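Rather than hard-coding the value, keep_warm could be read from the same config dict that already supplies the GPU setting, so each deployment can choose its own pool size. A minimal sketch, assuming a hypothetical keep_warm key in that config (not currently part of the builder's schema):

@stub.function(
    image=target_image,
    gpu=config["gpu"],
    allow_concurrent_inputs=100,
    concurrency_limit=1,
    timeout=10 * 60,
    # Hypothetical config key; defaulting to 0 preserves today's
    # scale-from-zero behavior unless a deployment opts in.
    keep_warm=config.get("keep_warm", 0),
)
@asgi_app()
def comfyui_app():
    # ... existing function body ...

Defaulting to 0 keeps the warm pool strictly opt-in.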

Benefits

  1. Reduced cold start latency (see the timing sketch after this list)
  2. Faster response times for inference requests
  3. Improved user experience, especially for applications requiring quick responses
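To quantify the first two benefits, one could time a request that hits a cold container against one that hits a warm one. A rough sketch using the requests library; the URL below is a placeholder for a deployed app's actual endpoint:

import time

import requests

# Placeholder URL; substitute the real *.modal.run endpoint of a deployment.
URL = "https://example--comfyui-app.modal.run/"

def time_request() -> float:
    """Return wall-clock seconds for one GET against the app."""
    start = time.perf_counter()
    requests.get(URL, timeout=600)
    return time.perf_counter() - start

# The first call after an idle period pays the cold start; with keep_warm
# set, repeat calls should land on a pre-warmed container.
print(f"first call:  {time_request():.2f}s")
print(f"second call: {time_request():.2f}s")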

Considerations

Warm containers are billed while idle, so keep_warm trades a standing cost for lower latency. Making the value configurable per deployment (defaulting to 0) would let users opt into that tradeoff explicitly.

omarei-omoto commented 3 weeks ago

yes please :)