awslabs / multi-model-server

Multi Model Server is a tool for serving neural net models for inference
Apache License 2.0

Serve multiple models: PyTorch GPU memory problem #880

Closed netpcvnn closed 4 years ago

netpcvnn commented 4 years ago

Hi all,

I would like to serve multiple models, but it looks like each model (service) consumes a lot of GPU memory. For example, a model that is 70 MB on disk consumes about 1 GB of GPU memory. Is there any way to reduce the GPU memory usage?

vdantu commented 4 years ago

@netpcvnn There is a feature called preload_model. This enables the model server to keep a single copy of the model on the host. You could try that and see how it works for you.

preload_model=true
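
For context, a minimal sketch of the general preload pattern (an assumption about how such a feature typically works, not MMS internals): load the weights once in the parent process before forking workers, so on CPU the weight pages are shared copy-on-write rather than duplicated per worker.

```python
import os
import torch

# Load (or build) the model ONCE in the parent, before forking workers.
model = torch.nn.Linear(1024, 1024)  # stand-in for a real preloaded model
model.eval()

for _ in range(4):                   # e.g. four worker processes
    if os.fork() == 0:
        # Child worker: the weight pages are shared copy-on-write on CPU,
        # so the four workers do not hold four separate copies.
        with torch.no_grad():
            _ = model(torch.zeros(1, 1024))
        os._exit(0)

for _ in range(4):
    os.wait()                        # reap the workers
```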
netpcvnn commented 4 years ago

Thanks @vdantu. I have tried using preload_model but I haven't seen any difference. Each model still occupies about 1 GB of GPU memory, independent of the model size. Actually, I don't understand much about the preload_model feature and how it works.

vdantu commented 4 years ago

Did you try with preload_model=true? What version of MMS are you running?

netpcvnn commented 4 years ago

Yes, I have tried config.properties with preload_model=true. The MMS version I used is 1.0.8, installed with pip. My config file and the GPU memory usage with 3 models are shown below:

[Screenshots from 2019-11-28: config.properties contents and GPU memory usage with 3 models]
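
The screenshots are not preserved, but a config.properties for three preloaded models might look like the following sketch (the store path and archive names here are illustrative, not taken from the screenshots):

```
preload_model=true
model_store=/models
load_models=model_a.mar,model_b.mar,model_c.mar
```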

vdantu commented 4 years ago

@netpcvnn : I don't think this feature was in 1.0.8. It must have gone in with the latest version of MMS. I think it is packaged as multi-model-server:

pip install multi-model-server

@alexwong : Please feel free to correct me :)

netpcvnn commented 4 years ago

Thanks for your help. I have checked the MMS version again; it is 1.0.8.1.

vdantu commented 4 years ago

@netpcvnn : If it's not working with 1.0.8.1, could you try once with the pre-release?

pip install --pre mxnet-model-server

netpcvnn commented 4 years ago

Hi, I tried installing the pre-release version; it is now 1.0.9b20191115. But I haven't seen any difference. The GPU memory usage is still the same.

vdantu commented 4 years ago

Ohh, I might have missed the bigger question: this feature only works for CPUs, not GPUs. For GPUs I don't know of any way to optimize memory usage.
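
For context (an observation about general PyTorch/CUDA behavior, not something stated in this thread): much of the roughly 1 GB per model reported by nvidia-smi is typically the fixed CUDA context that each worker process creates, plus PyTorch's caching allocator, rather than the weights themselves. A minimal sketch, assuming a recent PyTorch and a CUDA device:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()  # ~4 MB of weights

# Memory PyTorch has actually allocated for tensors (small).
print(torch.cuda.memory_allocated() // 2**20, "MiB allocated by tensors")
# Memory PyTorch has reserved from the driver via its caching allocator.
print(torch.cuda.memory_reserved() // 2**20, "MiB reserved by the allocator")
# nvidia-smi additionally shows the per-process CUDA context (often several
# hundred MB), which neither counter above includes and which dominates
# for small models.
```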

netpcvnn commented 4 years ago

Thanks. But it looks like this is a PyTorch problem when loading models onto the GPU.
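
One hedged workaround sketch (not MMS functionality; the model names and shapes below are hypothetical): host several models inside a single process so they share one CUDA context, paying the fixed per-process GPU overhead once rather than once per model.

```python
import torch

# Two stand-in models; in practice these would be the user's real networks.
models = {
    "model_a": torch.nn.Linear(1024, 1024).cuda().eval(),
    "model_b": torch.nn.Linear(1024, 10).cuda().eval(),
}

@torch.no_grad()
def infer(name: str, x: torch.Tensor) -> torch.Tensor:
    # All models live in one process and share a single CUDA context.
    return models[name](x.cuda())

print(infer("model_b", torch.zeros(1, 1024)).shape)
```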