AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Cannot change checkpoint on Colab #7793

Open mykeehu opened 1 year ago

mykeehu commented 1 year ago

Is there an existing issue for this?

What happened?

I started the system on Colab, and when I try to switch to another model (e.g. Deliberate), memory overflows: usage then drops back below 1 GB and I get a CTRL+C message in the console. Something about the memory usage needs to be fixed, and there is nothing in the settings to control RAM caching. Maybe the hash calculation is causing the memory usage to jump and should be optimized; I don't know.

Steps to reproduce the problem

  1. Start SD on Colab
  2. Change to another model (4 GB or larger)
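To quantify the spike during step 2, a minimal stdlib-only sketch (assuming Linux/Colab, where `ru_maxrss` is reported in kilobytes) that logs peak resident memory around the checkpoint switch:

```python
# Minimal sketch (assumption: Linux/Colab, ru_maxrss in kilobytes).
# Call before and after the model switch to see how far peak RAM climbs.
import resource

def peak_rss_mb() -> int:
    """Peak resident set size of this process, in megabytes."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss // 1024

before = peak_rss_mb()
# ... trigger the checkpoint change in the UI here ...
after = peak_rss_mb()
print(f"peak RSS: {before} MB -> {after} MB")
```

If the "after" value approaches the Colab free-tier cap (~12 GB), the runtime's OOM killer terminating the process would explain the spontaneous `^C` in the console.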

What should have happened?

Memory usage should remain below 10-12 GB of RAM even when switching models.

Commit where the problem happens

3715ece0adce7bf7c5e9c5ab3710b2fdc3848f39

What platforms do you use to access the UI ?

Other/Cloud

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

!COMMANDLINE_ARGS="--xformers --share --enable-insecure-extension-access --gradio-auth mykee:diffusion --ckpt-dir /content/drive/MyDrive/SDmodels --vae-dir /content/drive/MyDrive/VAE --lora-dir /content/drive/MyDrive/Lora --esrgan-models-path /content/drive/MyDrive/ESRGAN --ui-config-file /content/drive/MyDrive/1my/ui-config-my.json --ui-settings-file /content/drive/MyDrive/1my/config-my.json --styles-file /content/drive/MyDrive/1my/styles.csv --embeddings-dir /content/drive/MyDrive/TI-Embeddings --hypernetwork-dir /content/drive/MyDrive/Hypernetworks --gradio-img2img-tool color-sketch" REQS_FILE="requirements.txt" python launch.py

List of extensions

None

Console logs

Python 3.8.10 (default, Nov 14 2022, 12:59:47) 
[GCC 9.4.0]
Commit hash: 3715ece0adce7bf7c5e9c5ab3710b2fdc3848f39
Installing requirements for Web UI
Launching Web UI with arguments: --xformers --share --enable-insecure-extension-access --gradio-auth mykee:diffusion --ckpt-dir /content/drive/MyDrive/SDmodels --vae-dir /content/drive/MyDrive/VAE --lora-dir /content/drive/MyDrive/Lora --esrgan-models-path /content/drive/MyDrive/ESRGAN --ui-config-file /content/drive/MyDrive/1my/ui-config-my.json --ui-settings-file /content/drive/MyDrive/1my/config-my.json --styles-file /content/drive/MyDrive/1my/styles.csv --embeddings-dir /content/drive/MyDrive/TI-Embeddings --hypernetwork-dir /content/drive/MyDrive/Hypernetworks --gradio-img2img-tool color-sketch
2023-02-13 13:15:52.034489: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-13 13:15:55.313332: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-02-13 13:15:55.313977: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2023-02-13 13:15:55.314007: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Checkpoint !MyMixed/oof85a_11050-theAllySMixIIChurned-mix90-fixed.safetensors [329141fd49] not found; loading fallback model.ckpt [fe4efff1e1]
Loading weights [fe4efff1e1] from /content/drive/MyDrive/SDmodels/model.ckpt
Creating model from config: /content/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: /content/drive/MyDrive/VAE/SD-1.5-vae-ft-mse-840000-ema-pruned.pt
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(0): 
Model loaded in 36.9s (load weights from disk: 23.3s, create model: 0.9s, apply weights to model: 1.9s, apply half(): 1.2s, load VAE: 8.3s, move model to device: 0.7s, load textual inversion embeddings: 0.6s).
/usr/local/lib/python3.8/dist-packages/gradio/deprecation.py:43: UserWarning: You have unused kwarg parameters in Slider, please remove them: {'min': 0, 'max': 500}
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://380a855f-07f4-4cf3.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
WARNING:multipart.multipart:Consuming a byte in the end state
WARNING:multipart.multipart:Consuming a byte in the end state
Loading weights [10a699c0f3] from /content/drive/MyDrive/SDmodels/deliberate_v11.ckpt
^C

Additional information

image

mykeehu commented 1 year ago

It's getting more and more interesting, because Gradio drops out by itself after generating a few images, so it's increasingly likely that this is not just a Colab problem but rather a Gradio connection issue. For some reason it seems to over-buffer and drop the connection to Colab. Maybe there is some RAM limit on the Gradio side?

FTC55 commented 1 year ago

Merging models is also affected. If one of the models is too large, the webui crashes silently and RAM usage instantly drops back to about 800 MB. Isn't --lowram supposed to prevent this scenario by loading to VRAM? Because it doesn't seem to work.
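Since the crash is silent, one workaround is to check headroom before attempting a switch or merge. This is a hypothetical helper, not part of the webui; it assumes Linux's /proc/meminfo (as on Colab) and that loading can transiently need roughly twice the checkpoint's file size:

```python
# Hypothetical guard (not webui code): refuse to load a checkpoint when
# available RAM is less than `headroom` times its file size.
import os

def mem_available_bytes() -> int:
    """MemAvailable from /proc/meminfo (Linux-only assumption)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024  # value is in kB
    raise RuntimeError("MemAvailable not found")

def can_load(ckpt_path: str, headroom: float = 2.0) -> bool:
    """True if free RAM exceeds `headroom` x the checkpoint size."""
    return mem_available_bytes() > headroom * os.path.getsize(ckpt_path)
```

The 2x headroom factor is an assumption based on the observed near-doubling during a load, not a documented requirement.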

mykeehu commented 1 year ago

Today I can't switch models on Colab at all because it hits the RAM cap. I wanted to switch from realisticVisionV13 to the base SD 1.5 (pruned-emaonly) and couldn't. Memory optimization is needed when switching models.
image

I restarted Colab and set the default model at startup with the --ckpt argument. This is all it uses by default:
image

At the "DiffusionWrapper has 859.52 M params." step, memory usage almost doubles, as if the model were loaded twice.
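That near-doubling is consistent with the new state dict being read into RAM while the old model is still referenced, so both copies coexist until the swap completes. A stdlib-only illustration of the effect (not webui code), using tracemalloc:

```python
# Illustration (not webui code): allocating a replacement object before the
# old one is released roughly doubles peak memory; freeing first keeps it flat.
import tracemalloc

SIZE = 10_000_000  # ~10 MB stand-in for a checkpoint's weights

def peak_bytes(free_old_first: bool) -> int:
    tracemalloc.start()
    old = bytearray(SIZE)      # "model currently loaded"
    if free_old_first:
        del old                # release before loading the replacement
    new = bytearray(SIZE)      # "replacement checkpoint" being loaded
    peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    return peak

# Ratio is close to 2 when both copies coexist during the switch.
print(peak_bytes(False) / peak_bytes(True))
```

On a 12 GB Colab instance this would mean a ~4 GB checkpoint needs ~8 GB of free RAM just for the switch, which matches the crashes reported above.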