SeldonIO / MLServer

An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
https://mlserver.readthedocs.io/en/latest/
Apache License 2.0
724 stars, 183 forks

Heterogeneous pool of workers #975

Open adriangonz opened 1 year ago

adriangonz commented 1 year ago

Support a heterogeneous pool of workers, with a variable number of model replicas per worker.

ascillitoe commented 1 year ago

Heterogeneous workers would also be beneficial for the recently added support for online drift detectors (https://github.com/SeldonIO/MLServer/pull/1108), since these detectors must currently be run with parallel_workers = 0.

cristiancl25 commented 10 months ago

Hello, is there any estimate of when this issue will be addressed? Is there any intention to resolve it in version 1.4? Thanks.

sakoush commented 10 months ago

It is unlikely that this will be addressed in the next release as it stands. Do you have a particular use case that requires it that you could share?

cristiancl25 commented 10 months ago

Hello, the main problem with the current parallel workers is the memory consumption when loading all models. If we could redistribute the models across the workers, this could be improved.
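
To make the memory argument concrete, here is a rough sketch of the difference, not MLServer code. It assumes that in a homogeneous pool every worker holds a copy of every model, and it compares that with a simple greedy placement where each model gets its own replica count. The function names, model names, and sizes are all hypothetical.

```python
def homogeneous_memory(model_sizes_mb, n_workers):
    """Total memory if every worker loads every model (assumed current behaviour)."""
    return n_workers * sum(model_sizes_mb.values())


def assign_models(model_sizes_mb, replicas, n_workers):
    """Greedy placement for a heterogeneous pool (illustrative only).

    Models are placed largest first. Each model goes to the least-loaded
    workers, one replica per worker, capped at the number of workers.
    Returns (placement, loads), where placement maps worker index to model
    names and loads holds the memory used per worker in MB.
    """
    loads = [0] * n_workers
    placement = {w: [] for w in range(n_workers)}
    for name in sorted(model_sizes_mb, key=model_sizes_mb.get, reverse=True):
        size = model_sizes_mb[name]
        count = min(replicas.get(name, 1), n_workers)
        # Pick the `count` least-loaded workers, breaking ties by worker index.
        targets = sorted(range(n_workers), key=lambda w: (loads[w], w))[:count]
        for w in targets:
            loads[w] += size
            placement[w].append(name)
    return placement, loads


if __name__ == "__main__":
    # Hypothetical model sizes (MB) and desired replica counts.
    sizes = {"large": 400, "medium": 200, "small": 100}
    replicas = {"large": 2, "medium": 1, "small": 1}
    placement, loads = assign_models(sizes, replicas, n_workers=2)
    print("homogeneous total MB:", homogeneous_memory(sizes, 2))
    print("heterogeneous total MB:", sum(loads), placement)
```

With these made-up numbers, only the heavily used model is replicated on both workers, so the total footprint drops while that model still gets parallelism.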