BlueBrain / Search

Blue Brain text mining toolbox for semantic search and structured information extraction
https://blue-brain-search.readthedocs.io
GNU Lesser General Public License v3.0
40 stars · 10 forks

Deploy embedding model on Kubernetes using Seldon #623

Open FrancescoCasalegno opened 1 year ago

FrancescoCasalegno commented 1 year ago

Context

Actions

drsantos89 commented 1 year ago

Seldon Core seems to be the most recommended tool for deploying ML models on Kubernetes (first Google result and 3.3k+ stars on GitHub). https://www.datarevenue.com/en-blog/why-you-need-a-model-serving-tool-such-as-seldon

Other options are available: https://medium.com/everything-full-stack/machine-learning-model-serving-overview-c01a6aa3e823

Deploy the model as a Flask app: https://opensource.com/article/20/9/deep-learning-model-kubernetes Or use FastAPI (better than Flask!?): https://betterprogramming.pub/3-reasons-to-switch-to-fastapi-f9c788d017e5

BentoML / Yatai: https://github.com/bentoml/BentoML (3.9k+ stars) https://github.com/bentoml/Yatai (300+ stars)

Flask and FastAPI might not be a good solution, as they do not scale well and might have performance issues. I'm currently testing Seldon and Yatai.
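For reference, the "deploy the model as a web app" approach boils down to wrapping the embedding call in an HTTP endpoint. A minimal conceptual sketch (using only Python's stdlib http.server instead of Flask/FastAPI to stay dependency-free; the embed function is a stand-in for a real sentence-embedding model):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def embed(text):
    # Stand-in for a real model call, e.g. a sentence-transformers encode().
    return [float(len(text)), 0.0, 1.0]


class EmbeddingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body: {"text": "..."}
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"embedding": embed(payload["text"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def serve(port=8000):
    """Start the embedding server in a background thread and return it."""
    server = HTTPServer(("127.0.0.1", port), EmbeddingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

This illustrates the scaling concern: each replica is a single Python process holding its own copy of the model, so horizontal scaling, batching, and GPU sharing all have to be built by hand, which is exactly what Seldon/BentoML provide out of the box.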

drsantos89 commented 1 year ago

The default model for sentence embedding was deployed on a local Seldon server using the configuration below:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: minilm
  namespace: seldon
spec:
  protocol: v2
  predictors:
  - graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER
      parameters:
      - name: task
        type: STRING
        value: feature-extraction
      - name: pretrained_model
        type: STRING
        value: sentence-transformers/multi-qa-MiniLM-L6-cos-v1
    name: default
    replicas: 1

A request to the model can be sent using the bluesearch.k8s.embeddings.embed_seldon function.
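For readers without access to bluesearch, the deployment above speaks the Open Inference Protocol (protocol: v2), so a request can also be sent directly over HTTP. A hedged sketch (the input name "args" and BYTES datatype follow the convention I believe MLServer's HuggingFace runtime uses; the host and model name are placeholders):

```python
import json
import urllib.request


def build_v2_request(host, model_name, texts):
    """Build an Open Inference Protocol (v2) inference request for a text model."""
    url = f"{host}/v2/models/{model_name}/infer"
    payload = {
        "inputs": [
            {
                # "args"/BYTES is the input convention assumed for the
                # HUGGINGFACE_SERVER runtime; adjust if your server differs.
                "name": "args",
                "shape": [len(texts)],
                "datatype": "BYTES",
                "data": texts,
            }
        ]
    }
    return url, payload


def infer(host, model_name, texts):
    """POST the v2 request and return the decoded JSON response."""
    url, payload = build_v2_request(host, model_name, texts)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Usage against the deployment above would look like infer("http://<seldon-host>", "minilm", ["some sentence"]), with the embedding vectors in the response's "outputs" field.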

drsantos89 commented 1 year ago

(Screenshot from 2022-09-30 omitted.) The average response time is 74 ± 70 ms.
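A mean ± standard deviation latency figure like this can be reproduced with a simple timing loop (send_request here is a stand-in for the actual call to the deployed model, e.g. the request helper used above):

```python
import statistics
import time


def time_requests(send_request, n=100):
    """Call send_request n times and return (mean, stdev) latency in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        send_request()
        latencies.append(time.perf_counter() - start)
    return statistics.mean(latencies), statistics.stdev(latencies)
```

A standard deviation almost as large as the mean (70 ms vs 74 ms) suggests a skewed latency distribution, likely from cold starts or batching, so percentiles (p50/p95) may be more informative than the mean here.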