Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App demonstrates load balancing, orchestration, pre-provisioning, dynamic batching, GPU inference, and microservices working together through the Lightning Apps framework.
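To make the dynamic batching idea concrete, here is a minimal sketch in plain Python. It is not taken from this app's code: the function name `collect_batch` and the parameters `max_batch_size` and `timeout_s` are assumptions chosen for illustration. The idea is to wait for one request, then keep collecting until the batch is full or a short timeout expires, so a single GPU call can serve several prompts.

```python
import queue
import time
from typing import Any, List


def collect_batch(requests: "queue.Queue[Any]", max_batch_size: int, timeout_s: float) -> List[Any]:
    """Gather up to max_batch_size requests, waiting at most timeout_s after the first one.

    Hypothetical helper for illustration; not part of the Lightning API.
    """
    # Block until at least one request is available.
    batch = [requests.get()]
    deadline = time.monotonic() + timeout_s

    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout reached: send a partial batch
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived in time

    return batch


if __name__ == "__main__":
    q: "queue.Queue[str]" = queue.Queue()
    for prompt in ["a cat", "a dog", "a tree", "a house", "a car"]:
        q.put(prompt)
    print(collect_batch(q, max_batch_size=4, timeout_s=0.05))  # full batch
    print(collect_batch(q, max_batch_size=4, timeout_s=0.05))  # partial batch after timeout
```

A larger `max_batch_size` improves GPU utilization, while a smaller `timeout_s` bounds the extra latency a request can pick up while waiting for the batch to fill.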
We need to register new works when traffic increases, but each newly spawned work takes a long time to provision. Most of that time is spent installing the requirements, even though they are identical across all works. The requirements should be cached.
Once a work is stopped, it can't be restarted, so we used a workaround based on UUIDs. When the app has been running for a long time, this creates a mess in the UI.
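The following sketch illustrates one way such a UUID workaround could look, and why it clutters the UI over time. It is plain Python under stated assumptions, not the app's actual code: `Worker` and `WorkerPool` are hypothetical stand-ins for Lightning works and their parent component, and the assumption is that each replacement is registered under a fresh UUID-based name instead of restarting the stopped one.

```python
import uuid
from typing import Dict, List


class Worker:
    """Hypothetical stand-in for a work: once stopped, it cannot be started again."""

    def __init__(self) -> None:
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True


class WorkerPool:
    """Hypothetical registry that replaces stopped workers instead of restarting them."""

    def __init__(self) -> None:
        self.workers: Dict[str, Worker] = {}

    def add_worker(self) -> str:
        # A fresh UUID guarantees the new name never collides with a stopped worker.
        name = f"worker_{uuid.uuid4().hex}"
        self.workers[name] = Worker()
        return name

    def replace_worker(self, name: str) -> str:
        # The old entry stays in the registry, which is what piles up over time.
        self.workers[name].stop()
        return self.add_worker()

    def active(self) -> List[str]:
        return [n for n, w in self.workers.items() if not w.stopped]


if __name__ == "__main__":
    pool = WorkerPool()
    first = pool.add_worker()
    second = pool.replace_worker(first)
    print(len(pool.workers), "registered,", len(pool.active()), "active")
```

Every replacement leaves a stopped entry behind under its old name, so the list of registered workers keeps growing even when the number of active ones stays constant.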