Closed netpcvnn closed 4 years ago
@netpcvnn : There is a feature called preload_model, which lets the model server keep a single copy of the model on the host. You could try setting

preload_model=true

and see how that works for you.
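For reference, a minimal config.properties sketch with this setting enabled. The model store path and .mar file names below are hypothetical, not taken from this thread:

```properties
# config.properties -- hypothetical example, not the poster's actual file
preload_model=true
model_store=/models
load_models=model_a.mar,model_b.mar
default_workers_per_model=1
```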
Thanks @vdantu. I have tried using preload_model, but I haven't seen any difference: each model still occupies about 1 GB of GPU memory, regardless of model size. Actually, I don't understand much about the preload_model feature and how it works.
You tried with preload_model=true? What version of MMS are you running?
Yes, I have tried the config.properties with preload_model=true.
The version of MMS I used is 1.0.8, installed with pip.
My config file is as follows, along with the GPU memory usage for 3 models:
@netpcvnn : I don't think this feature was in 1.0.8. It must have gone in with the latest version of MMS, which I think was packaged as multi-model-server:

pip install multi-model-server

@alexwong : Please feel free to correct me :)
Thanks for your help. I have checked the MMS version again; it is 1.0.8.1.
@netpcvnn : If it's not working with 1.0.8.1, could you try once with the pre-release?
pip install --pre mxnet-model-server
Hi, I tried installing the pre-release version; it is now 1.0.9b20191115, but I haven't seen any difference. The GPU memory usage is still the same.
Ohh, I might have missed the bigger question. This feature only works for CPUs, not GPUs. For GPUs I don't know of any way to optimize memory usage.
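A sketch of why that distinction plausibly exists (an assumption about the mechanism, not confirmed in this thread): preloading typically means the model is loaded once in the parent process before worker processes are forked, so CPU memory pages are shared copy-on-write; each GPU worker, by contrast, must create its own CUDA context and its own device-side copy, which cannot be shared this way. The stdlib-only sketch below illustrates the fork-based sharing on the CPU side:

```python
# Sketch: a large object loaded before fork() is visible in the child
# without being re-loaded or re-sent -- the copy-on-write pattern that
# preload-style features rely on. BIG_MODEL is a stand-in for model weights.
import multiprocessing as mp

BIG_MODEL = bytes(50 * 1024 * 1024)  # stand-in for ~50 MB of weights

def worker(q):
    # The forked child sees the preloaded object directly; no copy is
    # made until a process writes to those pages.
    q.put(len(BIG_MODEL))

if __name__ == "__main__":
    ctx = mp.get_context("fork")  # fork is what enables the sharing
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # 52428800 bytes, reported from the child
    p.join()
```

On a GPU, the equivalent sharing is not available: each worker process initializes its own CUDA context (often several hundred MB on its own), which would explain a roughly fixed per-model memory cost independent of model size.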
Thanks. But it looks like it's a PyTorch problem when loading models onto the GPU.
Hi all,
I would like to serve multiple models, but it looks like each model (service) consumes a lot of GPU memory: for example, a 70 MB model consumes about 1 GB of GPU memory. Is there any way to reduce the GPU memory usage?