cortexlabs / cortex

Production infrastructure for machine learning at scale
https://cortexlabs.com/
Apache License 2.0

Serve a collection of custom models based on LRU #619

Closed: vishalbollu closed this issue 3 years ago

vishalbollu commented 4 years ago

Description

Add support to serve many different models where each model fulfills a subset of the possible inputs (e.g. city-based models). Because each model is designed for only a subset of input queries, certain models may be queried more often than others. Serve the top N most queried models, loading and unloading models based on an LRU policy.
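The LRU behavior described above can be sketched with a minimal cache built on `collections.OrderedDict`. The `load_fn`/`unload_fn` callbacks below are hypothetical placeholders for the predictor-specific load/unload logic, not part of Cortex's API:

```python
from collections import OrderedDict


class LRUModelCache:
    """Minimal sketch: hold at most `capacity` loaded models, evicting the
    least recently used one when a new model must be loaded."""

    def __init__(self, capacity, load_fn, unload_fn):
        self.capacity = capacity
        self.load_fn = load_fn      # hypothetical: loads a model by name
        self.unload_fn = unload_fn  # hypothetical: frees a loaded model
        self.models = OrderedDict()  # name -> model, ordered LRU -> MRU

    def get(self, name):
        if name in self.models:
            self.models.move_to_end(name)  # mark as most recently used
            return self.models[name]
        if len(self.models) >= self.capacity:
            # evict the least recently used model to make room
            evicted_name, evicted_model = self.models.popitem(last=False)
            self.unload_fn(evicted_name, evicted_model)
        model = self.load_fn(name)
        self.models[name] = model
        return model
```

For example, with `capacity=2`, requesting models `a`, `b`, `a`, `c` in order evicts `b` (the least recently used) when `c` is loaded.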

Here are the different use cases that could be handled:

Implementation

cron:

  1. update tree
  2. for each model in memory, unload it if it is not in the tree
  3. for each model in memory which has a latest timestamp: if there is a new version && (the timestamp on the latest version is newer than the oldest timestamp currently in the cache, or the cache has space): download it and load it into memory
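One pass of the cron above can be sketched as pure bookkeeping. Here `in_memory` maps model name to the timestamp of the loaded version, `tree` maps model name to the newest timestamp available in storage, and the return values are the names to unload and to (re)download; these are placeholder structures for illustration, not Cortex's actual data model:

```python
def cron_pass(in_memory, tree, capacity):
    """Return (to_unload, to_reload) for one cron pass."""
    # step 2: unload any in-memory model that is no longer in the tree
    to_unload = [name for name in in_memory if name not in tree]

    # step 3: reload a model when storage has a newer version and either the
    # new timestamp beats the oldest one currently cached, or there is space
    remaining = {n: ts for n, ts in in_memory.items() if n in tree}
    oldest = min(remaining.values()) if remaining else float("-inf")
    has_space = len(remaining) < capacity
    to_reload = [
        name
        for name, ts in remaining.items()
        if tree[name] > ts and (tree[name] > oldest or has_space)
    ]
    return to_unload, to_reload
```

For example, with `in_memory={"a": 1, "b": 2}`, `tree={"a": 5, "c": 3}`, and `capacity=2`, the pass unloads `b` (gone from the tree) and reloads `a` (version 5 is newer than the loaded version 1).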

request:

python:

Open Questions

config questions

Notes

Additional Context

RobertLucian commented 4 years ago

Grabbing this one. Part of the trick in making this work will be reloading and unloading the model configs for the TensorFlow Predictor on the fly and reliably. I reckon things will be simpler for the ONNX and Python Predictors. This one goes hand-in-hand with #890.