SeldonIO / seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
https://www.seldon.io/tech/products/core/

Metadata.yaml does not work with tensorflow prepackaged server / seldon protocol #3797

Closed jacobmalmberg closed 1 year ago

jacobmalmberg commented 2 years ago

Describe the bug

Placing a metadata.yaml file with metadata about the model in the model's s3 bucket does not work when using the prepackaged tensorflow server and the seldon protocol. When executing curl service:/api/v1.0/metadata | jq . this metadata (see "To reproduce" below for the exact yaml) should be presented, but instead I get

{
  "name": "default",
  "models": {
    "mnist-model": {
      "name": "seldonio/tfserving-proxy",
      "versions": [
        "1.12.0-dev"
      ],
      "inputs": [],
      "outputs": []
    }
  },
  "graphinputs": [],
  "graphoutputs": []
}

Metadata is not imported from metadata.yaml but is seemingly taken from the image name of the model container (seldonio/tfserving-proxy:1.12.0-dev). According to https://docs.seldon.io/projects/seldon-core/en/latest/referenceapis/metadata.html#prepackaged-model-servers, the metadata presented should come from metadata.yaml.

To reproduce

I run the mnist example from https://docs.seldon.io/projects/seldon-core/en/latest/servers/tensorflow.html with an extra metadata.yaml file in the bucket.

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: tfserving
spec:
  name: mnist
  predictors:
  - graph:
      children: []
      implementation: TENSORFLOW_SERVER
      modelUri: s3://seldon-models/tfserving/mnist-model-with-metadata
      storageInitializerImage: r-clone-with-org-certs:latest
      name: mnist-model
      parameters:
        - name: signature_name
          type: STRING
          value: predict_images
        - name: model_name
          type: STRING
          value: mnist-model
    name: default
    replicas: 1

Metadata.yaml

name: mnist-model
versions: [1]
platform: tensorflow

Expected behaviour

curl mnist-model-default:8000/api/v1.0/metadata | jq . should yield

{
  "name": "default",
  "models": {
    "mnist-model": {
      "name": "mnist-model",
      "platform": "tensorflow",
      "versions": [
        "1"
      ],
      "inputs": [],
      "outputs": []
    }
  },
  "graphinputs": [],
  "graphoutputs": []
}

Environment

Seldon 1.12.0-dev

value: docker.io/seldonio/engine:1.12.0-dev
value: seldonio/seldon-core-executor:1.12.0-dev
image: seldonio/seldon-core-operator:1.12.0-dev

Model Details

kubectl logs tfserving-mnist-default-0-mnist-model-54cb7954f9-cst2r -c mnist-model                                                                                                      
starting microservice
2021-12-09 08:27:06.634624: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-12-09 08:27:06.634664: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-12-09 08:27:09,288 - seldon_core.microservice:main:211 - INFO:  Starting microservice.py:main
2021-12-09 08:27:09,288 - seldon_core.microservice:main:212 - INFO:  Seldon Core version: 1.12.0-dev
2021-12-09 08:27:09,291 - seldon_core.microservice:main:367 - INFO:  Parse JAEGER_EXTRA_TAGS []
2021-12-09 08:27:09,291 - seldon_core.microservice:load_annotations:163 - INFO:  Found annotation cni.projectcalico.org/containerID:92ef0cf2ff16b665f0f2057a1f901396bdc27c6072898eb422e0067d0ae93d48
2021-12-09 08:27:09,291 - seldon_core.microservice:load_annotations:163 - INFO:  Found annotation cni.projectcalico.org/podIP:10.42.8.71/32
2021-12-09 08:27:09,291 - seldon_core.microservice:load_annotations:163 - INFO:  Found annotation cni.projectcalico.org/podIPs:10.42.8.71/32
2021-12-09 08:27:09,291 - seldon_core.microservice:load_annotations:163 - INFO:  Found annotation kubernetes.io/config.seen:2021-12-09T08:27:02.472518348Z
2021-12-09 08:27:09,291 - seldon_core.microservice:load_annotations:163 - INFO:  Found annotation kubernetes.io/config.source:api
2021-12-09 08:27:09,291 - seldon_core.microservice:load_annotations:163 - INFO:  Found annotation prometheus.io/path:/prometheus
2021-12-09 08:27:09,291 - seldon_core.microservice:load_annotations:163 - INFO:  Found annotation prometheus.io/scrape:true
2021-12-09 08:27:09,291 - seldon_core.microservice:main:370 - INFO:  Annotations: {'cni.projectcalico.org/containerID': '92ef0cf2ff16b665f0f2057a1f901396bdc27c6072898eb422e0067d0ae93d48', 'cni.projectcalico.org/podIP': '10.42.8.71/32', 'cni.projectcalico.org/podIPs': '10.42.8.71/32', 'kubernetes.io/config.seen': '2021-12-09T08:27:02.472518348Z', 'kubernetes.io/config.source': 'api', 'prometheus.io/path': '/prometheus', 'prometheus.io/scrape': 'true'}
2021-12-09 08:27:09,291 - seldon_core.microservice:main:374 - INFO:  Importing TfServingProxy
2021-12-09 08:27:09,322 - seldon_core.microservice:main:463 - INFO:  REST gunicorn microservice running on port 9000
2021-12-09 08:27:09,323 - seldon_core.microservice:main:557 - INFO:  REST metrics microservice running on port 6000
2021-12-09 08:27:09,323 - seldon_core.microservice:main:567 - INFO:  Starting servers
2021-12-09 08:27:09,332 - seldon_core.microservice:grpc_prediction_server:520 - INFO:  GRPC Server Binding to '%s' 0.0.0.0:9500 with 1 processes
2021-12-09 08:27:09,335 - seldon_core.microservice:rest_prediction_server:448 - INFO:  Gunicorn Config:  {'bind': '0.0.0.0:9000', 'accesslog': None, 'loglevel': 'info', 'timeout': 5000, 'threads': 1, 'workers': 1, 'max_requests': 0, 'max_requests_jitter': 0, 'post_worker_init': <function post_worker_init at 0x7f4f755a3320>, 'worker_exit': functools.partial(<function worker_exit at 0x7f4f75530f80>, seldon_metrics=<seldon_core.metrics.SeldonMetrics object at 0x7f4f752be590>), 'keepalive': 2}
2021-12-09 08:27:09,339 - seldon_core.wrapper:_set_flask_app_configs:224 - INFO:  App Config:  <Config {'ENV': 'production', 'DEBUG': False, 'TESTING': False, 'PROPAGATE_EXCEPTIONS': None, 'PRESERVE_CONTEXT_ON_EXCEPTION': None, 'SECRET_KEY': None, 'PERMANENT_SESSION_LIFETIME': datetime.timedelta(days=31), 'USE_X_SENDFILE': False, 'SERVER_NAME': None, 'APPLICATION_ROOT': '/', 'SESSION_COOKIE_NAME': 'session', 'SESSION_COOKIE_DOMAIN': None, 'SESSION_COOKIE_PATH': None, 'SESSION_COOKIE_HTTPONLY': True, 'SESSION_COOKIE_SECURE': False, 'SESSION_COOKIE_SAMESITE': None, 'SESSION_REFRESH_EACH_REQUEST': True, 'MAX_CONTENT_LENGTH': None, 'SEND_FILE_MAX_AGE_DEFAULT': datetime.timedelta(seconds=43200), 'TRAP_BAD_REQUEST_ERRORS': None, 'TRAP_HTTP_EXCEPTIONS': False, 'EXPLAIN_TEMPLATE_LOADING': False, 'PREFERRED_URL_SCHEME': 'http', 'JSON_AS_ASCII': True, 'JSON_SORT_KEYS': True, 'JSONIFY_PRETTYPRINT_REGULAR': False, 'JSONIFY_MIMETYPE': 'application/json', 'TEMPLATES_AUTO_RELOAD': None, 'MAX_COOKIE_SIZE': 4093}>
2021-12-09 08:27:09,339 - seldon_core.microservice:_run_grpc_server:475 - INFO:  Starting new GRPC server with 1.
2021-12-09 08:27:09,342 - seldon_core.wrapper:_set_flask_app_configs:224 - INFO:  App Config:  <Config {'ENV': 'production', 'DEBUG': False, 'TESTING': False, 'PROPAGATE_EXCEPTIONS': None, 'PRESERVE_CONTEXT_ON_EXCEPTION': None, 'SECRET_KEY': None, 'PERMANENT_SESSION_LIFETIME': datetime.timedelta(days=31), 'USE_X_SENDFILE': False, 'SERVER_NAME': None, 'APPLICATION_ROOT': '/', 'SESSION_COOKIE_NAME': 'session', 'SESSION_COOKIE_DOMAIN': None, 'SESSION_COOKIE_PATH': None, 'SESSION_COOKIE_HTTPONLY': True, 'SESSION_COOKIE_SECURE': False, 'SESSION_COOKIE_SAMESITE': None, 'SESSION_REFRESH_EACH_REQUEST': True, 'MAX_CONTENT_LENGTH': None, 'SEND_FILE_MAX_AGE_DEFAULT': datetime.timedelta(seconds=43200), 'TRAP_BAD_REQUEST_ERRORS': None, 'TRAP_HTTP_EXCEPTIONS': False, 'EXPLAIN_TEMPLATE_LOADING': False, 'PREFERRED_URL_SCHEME': 'http', 'JSON_AS_ASCII': True, 'JSON_SORT_KEYS': True, 'JSONIFY_PRETTYPRINT_REGULAR': False, 'JSONIFY_MIMETYPE': 'application/json', 'TEMPLATES_AUTO_RELOAD': None, 'MAX_COOKIE_SIZE': 4093}>
[2021-12-09 08:27:09 +0000] [108] [INFO] Starting gunicorn 20.1.0
[2021-12-09 08:27:09 +0000] [108] [INFO] Listening at: http://0.0.0.0:6000 (108)
[2021-12-09 08:27:09 +0000] [108] [INFO] Using worker: sync
[2021-12-09 08:27:09 +0000] [127] [INFO] Booting worker with pid: 127
[2021-12-09 08:27:09 +0000] [7] [INFO] Starting gunicorn 20.1.0
[2021-12-09 08:27:09 +0000] [7] [INFO] Listening at: http://0.0.0.0:9000 (7)
[2021-12-09 08:27:09 +0000] [7] [INFO] Using worker: sync
[2021-12-09 08:27:09 +0000] [134] [INFO] Booting worker with pid: 134
2021-12-09 08:27:09,371 - seldon_core.gunicorn_utils:load:103 - INFO:  Tracing not active
ukclivecox commented 2 years ago

The prepacked Tensorflow server with the Seldon protocol uses a proxy.

I think we would need to extend this to ensure the metadata is handled by the proxy. It may not presently have access to the downloaded artifacts.

RafalSkolasinski commented 2 years ago

Access to artifacts may be one thing, but we should also extend TfServingProxy class to implement init_metadata like we have here for example https://github.com/SeldonIO/seldon-core/blob/master/servers/sklearnserver/sklearnserver/SKLearnServer.py#L53-L66
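For reference, a minimal sketch of what an init_metadata for TfServingProxy could look like, modelled on the linked SKLearnServer implementation. This assumes the proxy would be given a model_uri pointing at the downloaded artifacts (which, as noted below, it currently is not) and that PyYAML is available; the method name and error handling mirror the SKLearnServer pattern rather than any existing TfServingProxy code:

```python
import logging
import os

import yaml

logger = logging.getLogger(__name__)


def init_metadata(self):
    # Look for metadata.yaml next to the downloaded model artifacts.
    file_path = os.path.join(self.model_uri, "metadata.yaml")

    try:
        with open(file_path, "r") as f:
            return yaml.safe_load(f.read())
    except FileNotFoundError:
        logger.debug(f"metadata file {file_path} does not exist")
        return {}
    except yaml.YAMLError:
        logger.error(
            f"metadata file {file_path} present but does not contain valid yaml"
        )
        return {}
```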

RafalSkolasinski commented 2 years ago

The SKLearnServer does get model_uri as one of its parameters; the TfServingProxy does not.
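To illustrate the gap: a hypothetical sketch of the kind of constructor change being discussed, i.e. threading a model_uri parameter through to the proxy so an init_metadata could later read metadata.yaml. The parameter names here are illustrative only and do not reflect TfServingProxy's actual signature:

```python
class TfServingProxy:
    # Hypothetical constructor sketch: the last parameter is the addition
    # under discussion; the others stand in for the proxy's existing config.
    def __init__(self, rest_endpoint=None, grpc_endpoint=None,
                 model_name=None, signature_name=None, model_uri=None):
        self.rest_endpoint = rest_endpoint
        self.grpc_endpoint = grpc_endpoint
        self.model_name = model_name
        self.signature_name = signature_name
        # New: keep the artifact location so metadata.yaml can be found later.
        self.model_uri = model_uri
```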

ukclivecox commented 1 year ago

Closing