Open maxjiang153 opened 10 months ago
I'm currently trying to implement a Matrix/Synapse bot that works with the ComfyUI API. I'm probably facing the same or similar issues, and in the process I'll probably tinker with the API implementation. I can't say, let alone promise, whether I'll end up with an improvement PR for the API. Just wanted to let you know you're not alone; I'd like to see the API improved too.
Or in short: +1. And I'm "kind of on it". If someone else can come up with such PRs faster than me, please go for it!!!
In https://github.com/rvion/CushyStudio, this is pretty much solved. I can run multiple prompts in parallel on multiple hosts and track their relative completion, or dispatch missing images to various nodes. The only missing part, IMO, is a better way to handle image uploads without trashing the upload folder, but ComfyUI has already agreed to a proposal for that. I'll open an issue / draft PR soon to track its completion.
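For anyone who wants to do the same against the raw API, here is a rough Python sketch (mine, not CushyStudio code) of tracking prompt completion on several hosts through the stock /prompt and /ws endpoints. The host list and the workflow dict are placeholders for your own setup.

```python
# Rough sketch (not CushyStudio code): track prompt completion on several ComfyUI hosts
# via the stock /prompt and /ws endpoints. Hosts and the workflow dict are placeholders.
import json
import uuid
import requests
import websocket  # pip install websocket-client

HOSTS = ["127.0.0.1:8188", "127.0.0.1:8189"]  # placeholder: one ComfyUI instance per host

def run_on_host(host: str, workflow: dict) -> str:
    """Submit a workflow to one host and block until that host reports it finished."""
    client_id = str(uuid.uuid4())
    ws = websocket.create_connection(f"ws://{host}/ws?clientId={client_id}")
    resp = requests.post(f"http://{host}/prompt",
                         json={"prompt": workflow, "client_id": client_id})
    prompt_id = resp.json()["prompt_id"]
    while True:
        frame = ws.recv()
        if isinstance(frame, bytes):      # binary frames are preview images; skip them here
            continue
        msg = json.loads(frame)
        # ComfyUI sends an "executing" message with node == None when a prompt has finished.
        if (msg.get("type") == "executing"
                and msg["data"].get("node") is None
                and msg["data"].get("prompt_id") == prompt_id):
            ws.close()
            return prompt_id
```

Run `run_on_host` per host in a thread pool and you get crude parallelism, but you still hit the reliability problems described further down (no instance ID, no queue limit).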
With ComfyUI Nodes for External Tooling, you can load images encoded as Base64 and also send result images back over WebSocket.
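A sketch of what that looks like from the client side: the input image goes in as Base64 inside the workflow, and the result comes back as binary WebSocket frames, so nothing touches the upload or output folders. The node class names below ("ETN_LoadImageBase64", "ETN_SendImageWebSocket") are from memory and may differ between versions of the node pack.

```python
# Sketch: feed an input image as Base64 and collect result images from binary
# WebSocket frames. Node class names are assumptions and may differ per version.
import base64
import json
import uuid
import requests
import websocket

HOST = "127.0.0.1:8188"

with open("input.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

client_id = str(uuid.uuid4())
ws = websocket.create_connection(f"ws://{HOST}/ws?clientId={client_id}")

workflow = {
    # Node "1" decodes the Base64 payload straight from the request.
    "1": {"class_type": "ETN_LoadImageBase64", "inputs": {"image": image_b64}},
    # ... your actual processing nodes would go here ...
    # Node "9" streams the result image back over the WebSocket.
    "9": {"class_type": "ETN_SendImageWebSocket", "inputs": {"images": ["1", 0]}},
}
requests.post(f"http://{HOST}/prompt", json={"prompt": workflow, "client_id": client_id})

images = []
while True:
    frame = ws.recv()
    if isinstance(frame, bytes):
        images.append(frame[8:])  # ComfyUI prefixes binary image frames with an 8-byte header
    else:
        msg = json.loads(frame)
        if msg.get("type") == "executing" and msg["data"].get("node") is None:
            break  # prompt finished; `images` now holds the raw image bytes
```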
Our approach for running large-scale inference jobs is to use a third-party queue (like Amazon SQS) instead of ComfyUI's built-in queue, and to run multiple ComfyUI instances in parallel on Kubernetes. You can check our approach at https://github.com/aws-samples/stable-diffusion-on-eks. We rely heavily on AWS managed services for management and request routing.
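Conceptually, each worker pod runs a loop like the very simplified sketch below (not the actual sample code; the queue URL and message format are placeholders): it long-polls SQS and forwards the prompt to the ComfyUI instance in the same pod.

```python
# Very simplified sketch of the worker pattern (not the actual sample code):
# each pod long-polls an SQS queue and forwards prompts to its local ComfyUI.
# Queue URL and message format are placeholders.
import json
import requests
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/comfyui-jobs"  # placeholder
COMFY = "http://127.0.0.1:8188"  # ComfyUI running in the same pod

while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        job = json.loads(msg["Body"])  # assumed to contain a ready-to-run workflow
        r = requests.post(f"{COMFY}/prompt", json={"prompt": job["workflow"]})
        r.raise_for_status()
        # Delete only after ComfyUI accepted the prompt, so a crashed pod lets the
        # message become visible again and another pod can retry it.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

The point is that retries, visibility timeouts, and backpressure live in SQS rather than in ComfyUI's in-memory queue.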
When using the ComfyUI API to process many images across multiple ComfyUI servers (imagine processing 100k images with 100 ComfyUI instances), you run into a lot of challenges with the API.
Based on the obstacles I'm facing, I suggest the following enhancements to the ComfyUI API.
Add a unique ComfyUI instance ID: After you call the API to submit a new prompt job, you query the history to check whether the job has finished. If you can't find the job in the history, there is currently no way to tell whether it is still pending or whether the server was rebooted or replaced by another instance. If the /history API reported an instance ID, clients would have an extra flag to check whether they are still talking to the same instance.
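To make it concrete, here is a hypothetical client-side check, assuming the API reported such an instance ID (the `instance_id` field does not exist today; it is the proposed addition):

```python
# Hypothetical: "instance_id" does NOT exist in the current API; it is the field
# proposed above. Assume the client recorded it when submitting the prompt.
import requests

HOST = "http://127.0.0.1:8188"

def check_job(prompt_id: str, instance_at_submit: str) -> str:
    history = requests.get(f"{HOST}/history/{prompt_id}").json()
    if prompt_id in history:
        return "finished"
    # Proposed: the server also reports its current instance ID (e.g. via /system_stats,
    # see the last suggestion), so an empty history can be disambiguated.
    current = requests.get(f"{HOST}/system_stats").json().get("instance_id")
    if current != instance_at_submit:
        return "lost"      # instance was rebooted or replaced: safe to resubmit elsewhere
    return "pending"       # same instance, the job just hasn't finished yet
```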
Add a ComfyUI job queue size limit: If clients submit a huge number of prompt jobs, the in-memory job queue can grow until the server runs out of memory. ComfyUI should be able to reject new prompt jobs once the queue reaches a limit, and that limit should be configurable.
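Until something like that exists server-side, a client can approximate backpressure with the existing GET /queue endpoint; a rough sketch (the threshold is arbitrary, the proposal is to make it a server setting):

```python
# Client-side workaround: check the current queue depth via the existing GET /queue
# endpoint and hold back submissions past a threshold.
import time
import requests

HOST = "http://127.0.0.1:8188"
MAX_PENDING = 50   # arbitrary client-side threshold

def submit_with_backpressure(workflow: dict) -> str:
    while True:
        q = requests.get(f"{HOST}/queue").json()
        depth = len(q.get("queue_pending", [])) + len(q.get("queue_running", []))
        if depth < MAX_PENDING:
            resp = requests.post(f"{HOST}/prompt", json={"prompt": workflow})
            return resp.json()["prompt_id"]
        time.sleep(1.0)   # queue looks full; wait and retry
```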
Add the instance ID and job queue status to the /system_stats API: That way a client can see how many jobs are waiting in the queue, avoid submitting new jobs to a busy instance, and identify which instance it is talking to.
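For illustration, a hypothetical least-busy scheduler built on such fields; `instance_id` and `queue_pending` are the proposed additions, not part of today's /system_stats response, and the host list is a placeholder:

```python
# Hypothetical: pick the instance with the shortest queue before submitting.
# "instance_id" and "queue_pending" are the proposed /system_stats additions.
import requests

HOSTS = ["http://10.0.0.1:8188", "http://10.0.0.2:8188"]  # placeholder fleet

def pick_instance():
    """Return (base_url, instance_id) of the instance with the shortest queue."""
    best = None
    for host in HOSTS:
        stats = requests.get(f"{host}/system_stats").json()
        depth = stats.get("queue_pending", 0)      # proposed field
        instance_id = stats.get("instance_id")     # proposed field
        if best is None or depth < best[2]:
            best = (host, instance_id, depth)
    return best[0], best[1]   # keep instance_id to correlate with /history results later

host, instance_id = pick_instance()
```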