deepseek-ai / DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding
https://huggingface.co/spaces/deepseek-ai/DeepSeek-VL-7B

tips for running the model in FP16 on 24GB GPU #42

Closed adamo1139 closed 7 months ago

adamo1139 commented 7 months ago

I tried to run this model in the Gradio GUI on Windows 10, but I ran into a few issues:

  1. Weights were being loaded onto the CPU instead of the GPU.
  2. Weights were seemingly loaded in FP32, which overflowed my 24 GB of VRAM and made inference extremely slow.
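
A rough back-of-the-envelope estimate (my own numbers, not from the repo) shows why: FP32 weights take 4 bytes per parameter, which for a ~7B model already exceeds 24 GB, while FP16 halves that.

# rough weight-memory estimate for a ~7B-parameter model
# (weights only, ignoring activations and KV cache)
params = 7e9
print(f"FP32: ~{params * 4 / 1e9:.0f} GB")  # ~28 GB, does not fit in 24 GB of VRAM
print(f"FP16: ~{params * 2 / 1e9:.0f} GB")  # ~14 GB, fits comfortably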

I modified inference.py (the one in deepseek_vl\serve) a bit to fix both issues. I also made sure that my torch was installed with CUDA 11.8 support rather than the CPU-only wheel.
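
For context, the fix boils down to loading the weights in half precision and moving them to the GPU. A minimal sketch of the idea, following the loading pattern from the repo's README (the actual code in deepseek_vl/serve/inference.py is structured differently; see my gist below for the full file):

import torch
from transformers import AutoModelForCausalLM
from deepseek_vl.models import VLChatProcessor

model_path = "deepseek-ai/deepseek-vl-7b-chat"
vl_chat_processor = VLChatProcessor.from_pretrained(model_path)

# load the weights, then cast to FP16 and move them to the GPU
# instead of leaving them in the FP32/CPU default
vl_gpt = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.float16).cuda().eval()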

So, if you run into problems running this model on a 24 GB GPU, this issue might help you.

Installation instructions (assuming you're already in a virtual environment, which you should be using):

git clone https://github.com/deepseek-ai/DeepSeek-VL
cd DeepSeek-VL
pip install torch==2.0.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -e .[gradio]
## replace the inference.py file in deepseek_vl/serve with the one I provide below
## if you have the model downloaded locally, you may want to change the model path in app_deepseek.py to a local one
python deepseek_vl/serve/app_deepseek.py
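
Before launching the app, it's worth a quick sanity check that the CUDA build of torch was actually installed (my own addition, not part of the repo's instructions):

import torch
print(torch.__version__)          # should end in +cu118, not +cpu
print(torch.cuda.is_available())  # should print True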

Here's the inference.py that works for me: https://gist.github.com/adamo1139/511f63c01c6088d7747f47628ffc970c

I'll close this issue now; I just want to leave a trace that will hopefully save others some time.