SeldonIO / seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
https://www.seldon.io/tech/products/core/

Pre-trained models like universal-sentence-encoder from tensorflow hub having issue while serving #4400

Closed ashwini-tgam closed 2 years ago

ashwini-tgam commented 2 years ago

We have many pre-trained models from TF Hub that need nothing beyond being served directly through a serving layer. To try this, I used Seldon Core to serve one of them and called the API with:

curl --location --request POST 'https://{serving_url}/seldon/seldon/embedding/api/v1.0/predictions' \
--header 'Content-Type: application/json' \
--data-raw '{
    "data": {
        "ndarray": [
            "testing service"
        ]
    }
}'

However, it errors out with:

{"status":{"code":-1,"info":"HTTPConnectionPool(host='0.0.0.0', port=2001): Max retries exceeded with url: /v1/models/embedding:predict (Caused by NewConnectionError('\u003curllib3.connection.HTTPConnection object at 0x7f69aeff2b10\u003e: Failed to establish a new connection: [Errno 111] Connection refused'))","reason":"MICROSERVICE_INTERNAL_ERROR","status":1}}

To reproduce

I tried multiple approaches; the details are below.

Option 1: I downloaded the model from TF Hub and served it with the YAML below:

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: embedding
  namespace: seldon
spec:
  name: embedding
  predictors:
  - graph:
      serviceAccountName: poc-seldon-sa
      name: embedding
      type: MODEL
      implementation: TENSORFLOW_SERVER
      modelUri: gs://tf_models_test/embedding/dan/4
      parameters:
      - name: model_input
        type: STRING
        value: text
      - name: model_output
        type: STRING
        value: embedding
      endpoint:
        type: REST
    name: embedding
    replicas: 1
    labels:
      nodepool: general
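For reference, the TENSORFLOW_SERVER implementation proxies requests to TF Serving, whose native REST predict API wraps raw strings in an "instances" list rather than the Seldon ndarray payload. A sketch of that call shape, assuming direct access to the TF Serving container on the port from the error above:

# tf_serving_check.py -- call TF Serving's native REST predict route (illustrative)
import requests

# USE/4 takes raw strings; TF Serving's REST API expects them under "instances".
# localhost:2001 assumes the serving container's port is forwarded locally.
payload = {"instances": ["testing service"]}
resp = requests.post("http://localhost:2001/v1/models/embedding:predict", json=payload)
print(resp.json())  # {"predictions": [[...512 floats...]]} once the model is loaded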
Option 2: I served it with a custom model, Embedding.py:
    
from seldon_core.user_model import SeldonResponse
import tensorflow as tf
import tensorflow_hub as hub
import logging

DAN_MODEL_URI = "https://tfhub.dev/google/universal-sentence-encoder/4"


class Embedding(object):
    """
    Model template. You can load your model parameters in __init__
    from a location accessible at runtime.
    """

    def __init__(self):
        """
        Add any initialization parameters. These will be passed at runtime
        from the graph definition parameters defined in your SeldonDeployment
        Kubernetes resource manifest.
        """
        self._model = hub.load(DAN_MODEL_URI)

    def predict(self, X, features_names=None, meta={}):
        logging.info(f"model meta: {meta}")
        embedding = self._model([X]).numpy().tolist()[0]
        return SeldonResponse(data=embedding)

    def init_metadata(self):
        meta = {
            "name": "embedding",
            "versions": ["dan4"],
            "platform": "seldon",
            "inputs": [
                {
                    "messagetype": "text",
                }
            ],
            "outputs": [{"messagetype": "tensor", "schema": {"shape": [512]}}],
            "custom": {"author": "sophi-dev"},
        }
        return meta
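To rule out the model code itself, the class can be exercised directly outside the Seldon wrapper; a minimal sketch, reusing the Embedding.py above and the same test sentence as the curl request (the file name smoke_test.py is hypothetical):

# smoke_test.py -- run the custom model locally, outside seldon-core-microservice
from Embedding import Embedding

model = Embedding()                      # loads USE/4 from the TF Hub cache
response = model.predict("testing service")
print(len(response.data))                # expected: 512, matching init_metadata()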

requirements.txt:

seldon_core==1.14.1
numpy==1.23.4
tensorflow==2.8.3
tensorflow-hub==0.12.0


Dockerfile:

FROM python:3.9-slim

ARG TF_CACHE_DIR="/var/tmp/tfhub_modules"

WORKDIR /app

# Install python packages
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

RUN apt-get -y update && \
    apt-get install -y screen && \
    apt-get install -y curl && \
    apt-get install -y wget && \
    apt-get install -y tar

# Copy source code
COPY . .

RUN mkdir -p /app${TF_CACHE_DIR} && \
    chmod -R 777 /app

# Define environment variables
ENV MODEL_NAME=Embedding
ENV SERVICE_TYPE=MODEL
ENV CUDA_VISIBLE_DEVICES=-1
ENV TFHUB_CACHE_DIR=${TF_CACHE_DIR}

# Pre-download the TF Hub model into the image, then verify the cache
RUN python -c "import tensorflow as tf; import tensorflow_hub as hub; hub.load('https://tfhub.dev/google/universal-sentence-encoder/4')"
RUN ls -l $TFHUB_CACHE_DIR

# Port for GRPC
EXPOSE 5001

# Port for REST
EXPOSE 8000

# Change folder ownership to the default user
RUN chown -R 8888 /app

CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE
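Before deploying, the image can be sanity-checked by calling the wrapper's REST endpoint directly; a minimal sketch, assuming the container is running locally with the REST port from the EXPOSE above published to the host (note the Seldon Python wrapper commonly defaults to port 9000, so the mapping may need adjusting):

# local_test.py -- probe the microservice REST endpoint (hypothetical helper)
import requests

# Assumes the container was started with its REST port published locally,
# e.g. docker run -p 8000:8000 <image>; adjust if the wrapper listens on 9000.
payload = {"data": {"ndarray": ["testing service"]}}
resp = requests.post("http://localhost:8000/api/v1.0/predictions", json=payload)
print(resp.status_code, resp.json())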


embedding_yaml:

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: embedding
  namespace: seldon
spec:
  name: embedding
  predictors:

Environment

Seldon 1.14.1

Cloud Provider: AWS
Kubernetes Cluster Version: v1.21.1

ashwini-tgam commented 2 years ago

Issue with EKS