MountaintopLotus / braintrust

A Dockerized platform for running Stable Diffusion on AWS (for now)
Apache License 2.0

Model switching crashing #72

Open JohnTigue opened 1 year ago

JohnTigue commented 1 year ago

We are constantly crashing the dockerized Invoke server while switching models. For example:

webui-docker-invoke-1  | >> Usage stats:
webui-docker-invoke-1  | >>   1 image(s) generated in 5.34s
webui-docker-invoke-1  | >>   Max VRAM used for this generation: 3.32G. Current VRAM utilization: 2.17G
webui-docker-invoke-1  | >>   Max VRAM used since script start:  3.32G
webui-docker-invoke-1  | >> Model change requested: dream_shaper_3_32
webui-docker-invoke-1  | >> Current VRAM usage:  2.17G
webui-docker-invoke-1  | >> Cache limit (max=2) reached. Purging stable-diffusion-1.5
webui-docker-invoke-1  | >> Offloading protogen_x3_4 to CPU
webui-docker-invoke-1  | >> Loading dream_shaper_3_32 from /data/StableDiffusion/dream_shaper_3_32.ckpt
webui-docker-invoke-1  | >> Scanning Model: dream_shaper_3_32
webui-docker-invoke-1  | >> Model scanned ok!
webui-docker-invoke-1  | >> Loading dream_shaper_3_32 from /data/StableDiffusion/dream_shaper_3_32.ckpt
webui-docker-invoke-1  | ** model dream_shaper_3_32 could not be loaded: 
webui-docker-invoke-1  | Traceback (most recent call last):
webui-docker-invoke-1  |   File "/stable-diffusion/ldm/generate.py", line 861, in set_model
webui-docker-invoke-1  |     model_data = cache.get_model(model_name)
webui-docker-invoke-1  |   File "/stable-diffusion/ldm/invoke/model_manager.py", line 97, in get_model
webui-docker-invoke-1  |     requested_model, width, height, hash = self._load_model(model_name)
webui-docker-invoke-1  |   File "/stable-diffusion/ldm/invoke/model_manager.py", line 312, in _load_model
webui-docker-invoke-1  |     model, width, height, model_hash = self._load_ckpt_model(model_name, mconfig)
webui-docker-invoke-1  |   File "/stable-diffusion/ldm/invoke/model_manager.py", line 360, in _load_ckpt_model
webui-docker-invoke-1  |     weight_bytes = f.read()
webui-docker-invoke-1  | MemoryError
webui-docker-invoke-1  | 
webui-docker-invoke-1  | ** trying to reload previous model
webui-docker-invoke-1  | >> Retrieving model protogen_x3_4 from system RAM cache
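The `MemoryError` above comes from `_load_ckpt_model` calling `f.read()`, which pulls the entire multi-GB `.ckpt` into system RAM in one go. A quick way to check host headroom before a model switch (Linux only; reads `/proc/meminfo` directly, so it works inside the container too):

```shell
# Print available system memory in MiB (what f.read() has to fit into,
# on top of the models already cached in RAM).
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
echo "available: $((avail_kb / 1024)) MiB"
```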
JohnTigue commented 1 year ago

Seems exit code 137 is out-of-memory related: 137 = 128 + 9 (SIGKILL), which is what Docker reports when the kernel OOM killer kills the container's main process:
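The 128 + signal convention is easy to verify: any process killed with SIGKILL reports exit status 137, the same code Docker surfaces when the OOM killer strikes.

```shell
# Start a background process, SIGKILL it, and read back its exit status.
sleep 30 &
pid=$!
kill -9 "$pid"
wait "$pid"
echo "exit status: $?"   # 128 + 9 = 137
```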

webui-docker-invoke-1  | >> Scanning Model: protogen_x3_4
webui-docker-invoke-1  | >> Model scanned ok!
webui-docker-invoke-1  | >> Loading protogen_x3_4 from /data/StableDiffusion/protogen_x3_4.ckpt
webui-docker-invoke-1 exited with code 137
JohnTigue commented 1 year ago
webui-docker-invoke-1  | >> Loading protogen_x3_4 from /data/StableDiffusion/protogen_x3_4.ckpt
webui-docker-invoke-1 exited with code 137
[root@ip-172-31-6-103 stable-diffusion-webui-docker]# 
JohnTigue commented 1 year ago

Wait a sec. If 137 is out of memory, maybe that's different from the caching bug. Maybe we just need to configure Docker to give the container ALL the memory on the machine, because that's all the machine will be doing: one task per container instance.

JohnTigue commented 1 year ago

Thing already has a ~15.5GiB limit out of the machine's 24GB total:

CONTAINER ID   NAME                    CPU %     MEM USAGE / LIMIT     MEM %     NET I/O         BLOCK I/O     PIDS
3b7d2b78cdf0   webui-docker-invoke-1   0.18%     4.074GiB / 15.48GiB   26.31%    1.07MB / 24MB   11.1GB / 0B   11
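One way to raise that ceiling would be a compose override along these lines (a sketch: the service name and the v2.x-style `mem_limit` key are assumptions about how this repo's compose file is laid out; v3 files put limits under `deploy.resources` instead):

```yaml
# Hypothetical docker-compose override: let the invoke service use most
# of the machine's 24GB instead of the current ~15.5GiB cap.
services:
  invoke:
    mem_limit: 22g       # leave ~2GB of headroom for the host
    memswap_limit: 22g   # same value: no extra swap beyond the limit
```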
JohnTigue commented 1 year ago

This might help with running out of memory and/or cache failures, Invoke 2.3.0 release notes:

Support for the XFormers Memory-Efficient Crossattention Package: On CUDA (Nvidia) systems, version 2.3.0 supports the XFormers library. Once installed, the xformers package dramatically reduces the memory footprint of loaded Stable Diffusion model files and modestly increases image generation speed. xformers will be installed and activated automatically if you specify a CUDA system at install time.

The caveat with using xformers is that it introduces slightly non-deterministic behavior, and images generated using the same seed and other settings will be subtly different between invocations. Generally the changes are unnoticeable unless you rapidly shift back and forth between images, but to disable xformers and restore fully deterministic behavior, you may launch InvokeAI using the --no-xformers option. This is most conveniently done by opening the file invokeai/invokeai.init with a text editor, and adding the line --no-xformers at the bottom.
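The edit the release notes describe boils down to appending one flag to the init file. A sketch (the `invokeai/invokeai.init` path is the one the notes give; inside this container it may live elsewhere):

```shell
# Disable xformers for fully deterministic output by appending the flag
# to the end of invokeai.init, as the 2.3.0 release notes suggest.
mkdir -p invokeai
echo "--no-xformers" >> invokeai/invokeai.init
tail -n 1 invokeai/invokeai.init
```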

JohnTigue commented 1 year ago

For both stand-alone and in-cluster mode, might as well break the config script into two:

  1. Instantiation config (runs once, when the instance is created)
  2. Start-up config (runs on every reboot: puts the box back into SD service mode)
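The split above could be sketched as a single entry script with a sentinel file (all names and the sentinel path here are hypothetical, not from the repo):

```shell
#!/bin/sh
# Sketch of the proposed two-phase config split.

instantiation_config() {
  # One-time setup at instance creation: pull images, fetch models, etc.
  echo "instantiation config: done"
}

startup_config() {
  # Runs on every reboot: put the box back into SD service mode.
  echo "startup config: entering SD service mode"
}

# A sentinel file ensures instantiation runs only on first boot.
SENTINEL=/tmp/.sd_instantiated
if [ ! -f "$SENTINEL" ]; then
  instantiation_config
  touch "$SENTINEL"
fi
startup_config
```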