Lightning-AI / LitServe

Lightning-fast serving engine for any AI model of any size. Flexible. Easy. Enterprise-scale.
https://lightning.ai/docs/litserve
Apache License 2.0

How to load models from S3 #329

Closed · AbhishekBose closed 1 month ago

AbhishekBose commented 1 month ago

Hello team, we want to run LitServe on a single machine, with the catch that we want to load models from S3. The model file path and the model name need to be read from a config file. Is there a way to enable such a model registry, so that a model can be registered for inference without explicitly hardcoding its path in the inference class itself?

aniketmaurya commented 1 month ago

hi @AbhishekBose, you can read the config file and load the model in the setup method.

import litserve as ls

class CustomAPI(ls.LitAPI):
    def setup(self, device):
        # read the model path (e.g. an S3 URI) from your config file
        model_path = read_config()
        # load the weights however your framework requires
        self.model = load_model(model_path)

    ...
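To make that concrete, here is a minimal sketch assuming the config is a YAML file and the weights live in an S3 bucket; the config keys, file names, and the use of boto3/torch are illustrative, not part of LitServe:

import boto3
import torch
import yaml

import litserve as ls

class S3ModelAPI(ls.LitAPI):
    def setup(self, device):
        # hypothetical config with keys: bucket, key, local_path
        with open("config.yaml") as f:
            cfg = yaml.safe_load(f)

        # pull the weights from S3 down to local disk
        s3 = boto3.client("s3")
        s3.download_file(cfg["bucket"], cfg["key"], cfg["local_path"])

        self.model = torch.load(cfg["local_path"], map_location=device)
        self.model.eval()

    def decode_request(self, request):
        return torch.tensor(request["input"])

    def predict(self, x):
        with torch.no_grad():
            return self.model(x)

    def encode_response(self, output):
        return {"output": output.tolist()}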

Please let me know if this answers your question.

AbhishekBose commented 1 month ago

@aniketmaurya What if I have to register a new model at runtime? Is that possible, or do I have to restart the server every time I onboard a new model or a new version of an existing model? I am asking from the context of TorchServe, where we can register models and their versions, with a specified number of workers, through a register-model API at runtime.
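For reference, the TorchServe flow described above goes through its management API (default port 8081); the model archive name and worker count here are illustrative:

import requests

# TorchServe management API: register a model archive at runtime
# with an initial number of workers
resp = requests.post(
    "http://localhost:8081/models",
    params={"url": "my_model.mar", "initial_workers": 2},
)
print(resp.json())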

aniketmaurya commented 1 month ago

@AbhishekBose LitServe currently doesn't support updating a model without interrupting the runtime.

Trying to understand the use case in a real-world production scenario: to update a deployed model, people generally use an orchestrator like Kubernetes, or you can serve on Lightning, which takes care of this for you.

It would be really helpful if you could elaborate on how you serve the model.

AbhishekBose commented 1 month ago

@aniketmaurya Currently we serve Pythonic workflows as a sidecar application to our main ML service platform application. In that setup, it becomes difficult to redeploy every time there's a change. We were wondering whether it would be possible to register the serving class in some manner at runtime itself. That would make model deployment completely self-serve for the data scientists concerned: they could test the model on their local machine and then push it to the server.

cyberluke commented 1 month ago

@aniketmaurya it is quite simple in PyTorch to unload a model: https://github.com/oobabooga/text-generation-webui/blob/d1af7a41ade7bd3c3a463bfa640725edb818ebaf/modules/models.py#L391
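The linked code boils down to roughly this pattern (sketched here with a stand-in model; it only frees memory if no other references to the model survive):

import gc
import torch

model = torch.nn.Linear(8, 8)  # stand-in for a loaded model

# unload: drop the reference, collect it, and release cached GPU memory
model = None
gc.collect()
torch.cuda.empty_cache()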

aniketmaurya commented 1 month ago

@AbhishekBose you can use a callback to detect file changes and reload the model.
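As a minimal sketch of that idea without committing to a specific callback API, you can poll the weights file's mtime inside the LitAPI and reload when it changes (load_model and the file path are placeholders):

import os

import litserve as ls

class ReloadingAPI(ls.LitAPI):
    def setup(self, device):
        self.model_path = "model.pt"  # hypothetical local weights file
        self._reload()

    def _reload(self):
        self.mtime = os.path.getmtime(self.model_path)
        self.model = load_model(self.model_path)  # placeholder loader

    def predict(self, x):
        # reload if the weights file changed on disk since the last load
        if os.path.getmtime(self.model_path) != self.mtime:
            self._reload()
        return self.model(x)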

@cyberluke LitServe is a generic serving framework and it is not limited to a particular ML library like PyTorch.