SeldonIO / seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
https://www.seldon.io/tech/products/core/

Allow ambassador from other namespace to access SeldonDeployment #279

Closed ChenyuanZ closed 5 years ago

ChenyuanZ commented 5 years ago

Hi there,

Ambassador is a popular Kubernetes-native API gateway. I would like to route traffic to SeldonDeployments through an Ambassador that the k8s admin has installed in a namespace other than the seldon-core namespace. This would reduce the number of Ambassadors we install in the cluster.

To support this, we can simply add the namespace to the service name in the service annotation generated by the Seldon cluster manager. I have manually tested this change and it works. For example:

metadata:
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind:  Mapping
      name:  seldon_seldon-deployment-output-transformer_rest_mapping
      prefix: /seldon/seldon-deployment-output-transformer/
      service: test-output-transformer.seldon:8000
      timeout_ms: 3000
      ---
      apiVersion: ambassador/v0
      kind:  Mapping
      name:  seldon-deployment-output-transformer_grpc_mapping
      grpc: true
      prefix: /seldon.protos.Seldon/
      rewrite: /seldon.protos.Seldon/
      headers:
        seldon: seldon-deployment-output-transformer
      service: test-output-transformer.seldon:5001
      timeout_ms: 3000

This requires a source code change at https://github.com/SeldonIO/seldon-core/blob/master/cluster-manager/src/main/java/io/seldon/clustermanager/k8s/SeldonDeploymentOperatorImpl.java#L502-L519
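
To make the proposed change concrete, here is a minimal Python sketch of how a namespace-qualified `service` target could be rendered into a REST Mapping like the one above. The real change would live in the Java cluster manager; the function names `ambassador_service_target` and `rest_mapping` are hypothetical and exist only for illustration.

```python
def ambassador_service_target(service_name: str, namespace: str, port: int) -> str:
    """Return a namespace-qualified target for a Mapping's `service` field."""
    # "<service>.<namespace>:<port>" lets an Ambassador in another namespace
    # resolve the service through cluster DNS.
    return f"{service_name}.{namespace}:{port}"


def rest_mapping(deployment: str, service_name: str, namespace: str, port: int = 8000) -> str:
    """Render one REST Mapping document in the style of the annotation above."""
    lines = [
        "---",
        "apiVersion: ambassador/v0",
        "kind:  Mapping",
        f"name:  seldon_{deployment}_rest_mapping",
        f"prefix: /seldon/{deployment}/",
        f"service: {ambassador_service_target(service_name, namespace, port)}",
        "timeout_ms: 3000",
    ]
    return "\n".join(lines) + "\n"
```

Calling `rest_mapping("seldon-deployment-output-transformer", "test-output-transformer", "seldon")` yields a document whose `service` line is `test-output-transformer.seldon:8000`, matching the annotation shown above.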

Reference: https://www.getambassador.io/reference/mappings/#namespaces-and-mappings

Best, Chenyuan

ukclivecox commented 5 years ago

This makes sense. Will it work if Ambassador is running inside the namespace referred to? If so, it sounds like a great change.

ChenyuanZ commented 5 years ago

Hi @cliveseldon

Will it work if Ambassador is running inside the namespace referred to?

Yes. It still works if Ambassador is running inside the same namespace.

The cluster-manager log confirms that the service is created with the seldon namespace in its annotation:

2018-11-04 21:18:26.741 DEBUG 7 --- [pool-1-thread-1] i.s.c.k.SeldonDeploymentControllerImpl   : Created service:{
  "metadata": {
    "name": "test-output-transformer",
    "generateName": "",
    "namespace": "seldon",
    "selfLink": "/api/v1/namespaces/seldon/services/test-output-transformer",
    "uid": "2b518f30-e077-11e8-b574-000c293741a7",
    "resourceVersion": "833205",
    "generation": 0,
    "creationTimestamp": "2018-11-04T21:18:26Z",
    "labels": {
      "seldon-app": "test-output-transformer",
      "seldon-deployment-id": "test-output-transformer"
    },
    "annotations": {
      "getambassador.io/config": "---\napiVersion: ambassador/v0\nkind:  Mapping\nname:  seldon_seldon-deployment-output-transformer_rest_mapping\nprefix: /seldon/seldon-deployment-output-transformer/\nservice: test-output-transformer.seldon:8000\ntimeout_ms: 3000\n---\napiVersion: ambassador/v0\nkind:  Mapping\nname:  seldon-deployment-output-transformer_grpc_mapping\ngrpc: true\nprefix: /seldon.protos.Seldon/\nrewrite: /seldon.protos.Seldon/\nheaders:\n  seldon: seldon-deployment-output-transformer\nservice: test-output-transformer.seldon:5001\ntimeout_ms: 3000\n"
    },
    "ownerReferences": [{
      "kind": "SeldonDeployment",
      "name": "seldon-deployment-output-transformer",
      "uid": "28054519-e077-11e8-b574-000c293741a7",
      "apiVersion": "machinelearning.seldon.io/v1alpha2",
      "controller": true
    }],
    "clusterName": ""
  },
  "spec": {
    "ports": [{
      "name": "http",
      "protocol": "TCP",
      "port": 8000,
      "targetPort": "",
      "nodePort": 0
    }, {
      "name": "grpc",
      "protocol": "TCP",
      "port": 5001,
      "targetPort": "",
      "nodePort": 0
    }],
    "selector": {
      "seldon-app": "test-output-transformer"
    },
    "clusterIP": "10.111.1.102",
    "type": "ClusterIP",
    "sessionAffinity": "None",
    "loadBalancerIP": "",
    "externalName": "",
    "externalTrafficPolicy": "",
    "healthCheckNodePort": 0
  },
  "status": {
    "loadBalancer": {
    }
  }
}
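
For anyone who wants to check this mechanically rather than by eye, the following is a rough Python sketch that pulls the `service:` targets out of the `getambassador.io/config` annotation in a service object like the one logged above. It assumes the JSON object has already been separated from the log prefix (`... Created service:`), and it uses a simple regular expression instead of a YAML parser; `qualified_targets` is a hypothetical helper name.

```python
import json
import re

# Matches annotation lines of the form "service: <name>.<namespace>:<port>".
SERVICE_RE = re.compile(
    r"^service:\s*(?P<name>[^.\s:]+)\.(?P<namespace>[^.\s:]+):(?P<port>\d+)\s*$",
    re.MULTILINE,
)


def qualified_targets(service_json: str):
    """Return (name, namespace, port) for each namespace-qualified service target."""
    service = json.loads(service_json)
    annotation = service["metadata"]["annotations"]["getambassador.io/config"]
    return [
        (m["name"], m["namespace"], int(m["port"]))
        for m in SERVICE_RE.finditer(annotation)
    ]


def targets_match_namespace(service_json: str) -> bool:
    """True if at least one target exists and all use the service's own namespace."""
    namespace = json.loads(service_json)["metadata"]["namespace"]
    targets = qualified_targets(service_json)
    return bool(targets) and all(ns == namespace for _, ns, _ in targets)
```

Targets without a namespace (for example `service: test-output-transformer:8000`) do not match the pattern, so `targets_match_namespace` returns `False` for an annotation produced before the change.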

Now port-forward the Ambassador in the seldon namespace and send a request to it:

$ kubectl port-forward $(kubectl get pods -n seldon -l service=ambassador -o jsonpath='{.items[0].metadata.name}') -n seldon 8002:8080
Forwarding from 127.0.0.1:8002 -> 8080
Handling connection for 8002
>>> response = requests.post("http://localhost:8002/seldon/seldon-deployment-output-transformer/api/v0.1/predictions", json=payload)
>>> response.text
'{\n  "meta": {\n    "puid": "bjuq36q9ecunr53m8rmhumcqhn",\n    "tags": {\n    },\n    "routing": {\n      "output-transformer": -1\n    },\n    "requestPath": {\n      "classifier": "seldonio/mock_classifier:1.0",\n      "output-transformer": "seldonio/output_transformer:0.1"\n    }\n  },\n  "data": {\n    "names": ["proba"],\n    "tensor": {\n      "shape": [2, 1],\n      "values": [0.07577603016695865, 1.0]\n    }\n  }\n}'
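
The `payload` used in the session above is not shown in this thread. As a self-contained sketch of the same check, the snippet below builds the predictions URL from the Mapping prefix, posts a JSON body with the standard library instead of `requests`, and summarizes a response shaped like the one above. The helper names are hypothetical, and any payload you pass (for example `{"data": {"ndarray": [[1.0, 2.0]]}}`) is an assumption rather than the one actually used.

```python
import json
import urllib.request


def predictions_url(base_url: str, mapping_prefix: str) -> str:
    """Join the port-forwarded base URL, the Mapping prefix and the predictions path."""
    return base_url.rstrip("/") + "/" + mapping_prefix.strip("/") + "/api/v0.1/predictions"


def post_prediction(url: str, payload: dict, timeout: float = 3.0) -> dict:
    """POST a JSON payload and return the decoded JSON response."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(response: dict) -> dict:
    """Pick out the fields that show which graph components handled the request."""
    return {
        "routing": response["meta"]["routing"],
        "request_path": sorted(response["meta"]["requestPath"]),
        "shape": response["data"]["tensor"]["shape"],
    }
```

With the port-forward from above running, `summarize(post_prediction(predictions_url("http://localhost:8002", "/seldon/seldon-deployment-output-transformer/"), payload))` would report the components listed under `requestPath` in the response.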

Log of the Ambassador installed by the seldon-core Helm chart:

$ kubectl logs -f seldon-core-ambassador-7fb4575f6b-r8bkr -n seldon -c ambassador
2018-11-04 21:17:11 kubewatch 0.35.1 INFO: generating config with gencount 1 (1 change)
/usr/lib/python3.6/site-packages/pkg_resources/__init__.py:1298: UserWarning: /ambassador is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
  warnings.warn(msg, UserWarning)
2018-11-04 21:17:11 kubewatch 0.35.1 INFO: Scout reports {"latest_version": "0.40.1", "application": "ambassador", "notices": [], "cached": false, "timestamp": 1541366231.528789}
[2018-11-04 21:17:12.169][17][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-11-04 21:17:12.170][17][info][config] source/server/configuration_impl.cc:53] loading 1 listener(s)
[2018-11-04 21:17:12.176][17][info][config] source/server/configuration_impl.cc:87] loading tracing configuration
[2018-11-04 21:17:12.176][17][info][config] source/server/configuration_impl.cc:109] loading stats sink configuration
AMBASSADOR: starting diagd
AMBASSADOR: starting Envoy
AMBASSADOR: waiting
PIDS: 21:diagd 22:envoy 23:kubewatch
starting hot-restarter with target: /ambassador/start-envoy.sh
forking and execing new child process at epoch 0
forked new child process with PID=24
[2018-11-04 21:17:12.398][24][info][main] source/server/server.cc:181] initializing epoch 0 (hot restart version=10.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363 size=2654312)
[2018-11-04 21:17:12.398][24][info][main] source/server/server.cc:183] statically linked extensions:
[2018-11-04 21:17:12.398][24][info][main] source/server/server.cc:185]   access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
[2018-11-04 21:17:12.398][24][info][main] source/server/server.cc:188]   filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.filters.http.rbac,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash,extauth
[2018-11-04 21:17:12.398][24][info][main] source/server/server.cc:191]   filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
[2018-11-04 21:17:12.398][24][info][main] source/server/server.cc:194]   filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
[2018-11-04 21:17:12.398][24][info][main] source/server/server.cc:196]   stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.statsd
[2018-11-04 21:17:12.398][24][info][main] source/server/server.cc:198]   tracers: envoy.dynamic.ot,envoy.lightstep,envoy.zipkin
[2018-11-04 21:17:12.398][24][info][main] source/server/server.cc:201]   transport_sockets.downstream: envoy.transport_sockets.capture,raw_buffer,tls
[2018-11-04 21:17:12.398][24][info][main] source/server/server.cc:204]   transport_sockets.upstream: envoy.transport_sockets.capture,raw_buffer,tls
[2018-11-04 21:17:12.467][24][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-11-04 21:17:12.467][24][info][config] source/server/configuration_impl.cc:53] loading 1 listener(s)
[2018-11-04 21:17:12.474][24][info][config] source/server/configuration_impl.cc:87] loading tracing configuration
[2018-11-04 21:17:12.475][24][info][config] source/server/configuration_impl.cc:109] loading stats sink configuration
[2018-11-04 21:17:12.475][24][info][main] source/server/server.cc:376] all clusters initialized. initializing init manager
[2018-11-04 21:17:12.475][24][info][config] source/server/listener_manager_impl.cc:781] all dependencies initialized. starting workers
[2018-11-04 21:17:12.475][24][info][main] source/server/server.cc:396] starting main dispatch loop
/usr/lib/python3.6/site-packages/pkg_resources/__init__.py:1298: UserWarning: /ambassador is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
  warnings.warn(msg, UserWarning)
2018-11-04 21:17:14 diagd 0.35.1 [P21TMainThread] INFO: thread count 9, listening on 0.0.0.0:8877
[2018-11-04 21:17:14 +0000] [21] [INFO] Starting gunicorn 19.8.1
[2018-11-04 21:17:14 +0000] [21] [INFO] Listening at: http://0.0.0.0:8877 (21)
[2018-11-04 21:17:14 +0000] [21] [INFO] Using worker: threads
[2018-11-04 21:17:14 +0000] [59] [INFO] Booting worker with pid: 59
2018-11-04 21:17:14 diagd 0.35.1 [P59TMainThread] INFO: Starting periodic updates
[2018-11-04 21:17:22.477][24][info][main] source/server/drain_manager_impl.cc:63] shutting down parent after drain
2018-11-04 21:17:28 kubewatch 0.35.1 INFO: generating config with gencount 2 (1 change)
/usr/lib/python3.6/site-packages/pkg_resources/__init__.py:1298: UserWarning: /ambassador is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
  warnings.warn(msg, UserWarning)
2018-11-04 21:17:29 kubewatch 0.35.1 INFO: Scout reports {"latest_version": "0.40.1", "application": "ambassador", "notices": [], "cached": false, "timestamp": 1541366248.824498}
[2018-11-04 21:17:29.379][62][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-11-04 21:17:29.379][62][info][config] source/server/configuration_impl.cc:53] loading 1 listener(s)
[2018-11-04 21:17:29.386][62][info][config] source/server/configuration_impl.cc:87] loading tracing configuration
[2018-11-04 21:17:29.386][62][info][config] source/server/configuration_impl.cc:109] loading stats sink configuration
got SIGHUP
forking and execing new child process at epoch 1
forked new child process with PID=66
[2018-11-04 21:17:29.407][66][info][main] source/server/server.cc:181] initializing epoch 1 (hot restart version=10.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363 size=2654312)
[2018-11-04 21:17:29.407][66][info][main] source/server/server.cc:183] statically linked extensions:
[2018-11-04 21:17:29.407][66][info][main] source/server/server.cc:185]   access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
[2018-11-04 21:17:29.407][66][info][main] source/server/server.cc:188]   filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.filters.http.rbac,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash,extauth
[2018-11-04 21:17:29.407][66][info][main] source/server/server.cc:191]   filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
[2018-11-04 21:17:29.407][66][info][main] source/server/server.cc:194]   filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
[2018-11-04 21:17:29.407][66][info][main] source/server/server.cc:196]   stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.statsd
[2018-11-04 21:17:29.407][66][info][main] source/server/server.cc:198]   tracers: envoy.dynamic.ot,envoy.lightstep,envoy.zipkin
[2018-11-04 21:17:29.407][66][info][main] source/server/server.cc:201]   transport_sockets.downstream: envoy.transport_sockets.capture,raw_buffer,tls
[2018-11-04 21:17:29.407][66][info][main] source/server/server.cc:204]   transport_sockets.upstream: envoy.transport_sockets.capture,raw_buffer,tls
[2018-11-04 21:17:29.414][24][warning][main] source/server/server.cc:447] shutting down admin due to child startup
[2018-11-04 21:17:29.414][24][warning][main] source/server/server.cc:455] terminating parent process
[2018-11-04 21:17:29.418][66][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-11-04 21:17:29.418][66][info][config] source/server/configuration_impl.cc:53] loading 1 listener(s)
[2018-11-04 21:17:29.426][66][info][config] source/server/configuration_impl.cc:87] loading tracing configuration
[2018-11-04 21:17:29.426][66][info][config] source/server/configuration_impl.cc:109] loading stats sink configuration
[2018-11-04 21:17:29.427][66][info][main] source/server/server.cc:376] all clusters initialized. initializing init manager
[2018-11-04 21:17:29.427][66][info][config] source/server/listener_manager_impl.cc:781] all dependencies initialized. starting workers
[2018-11-04 21:17:29.427][66][info][main] source/server/server.cc:396] starting main dispatch loop
[2018-11-04 21:17:29.427][24][info][main] source/server/server.cc:97] closing and draining listeners
[2018-11-04 21:17:39.429][66][info][main] source/server/drain_manager_impl.cc:63] shutting down parent after drain
[2018-11-04 21:17:39.429][24][info][main] source/server/hot_restart_impl.cc:435] shutting down due to child request
[2018-11-04 21:17:39.429][24][warning][main] source/server/server.cc:346] caught SIGTERM
[2018-11-04 21:17:39.429][24][info][main] source/server/server.cc:400] main dispatch loop exited
[2018-11-04 21:17:39.457][24][info][main] source/server/server.cc:435] exiting
got SIGCHLD
PID=24 exited with code=0
2018-11-04 21:18:29 kubewatch 0.35.1 INFO: generating config with gencount 3 (1 change)
2018-11-04 21:18:29 kubewatch 0.35.1 INFO: Scout reports {"latest_version": "0.40.1", "application": "ambassador", "notices": [], "cached": true, "timestamp": 1541366248.824498}
[2018-11-04 21:18:29.597][92][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-11-04 21:18:29.597][92][info][config] source/server/configuration_impl.cc:53] loading 1 listener(s)
[2018-11-04 21:18:29.608][92][info][config] source/server/configuration_impl.cc:87] loading tracing configuration
[2018-11-04 21:18:29.608][92][info][config] source/server/configuration_impl.cc:109] loading stats sink configuration
got SIGHUP
forking and execing new child process at epoch 2
forked new child process with PID=96
[2018-11-04 21:18:29.638][96][info][main] source/server/server.cc:181] initializing epoch 2 (hot restart version=10.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363 size=2654312)
[2018-11-04 21:18:29.638][96][info][main] source/server/server.cc:183] statically linked extensions:
[2018-11-04 21:18:29.638][96][info][main] source/server/server.cc:185]   access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
[2018-11-04 21:18:29.638][96][info][main] source/server/server.cc:188]   filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.filters.http.rbac,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash,extauth
[2018-11-04 21:18:29.640][96][info][main] source/server/server.cc:191]   filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
[2018-11-04 21:18:29.643][96][info][main] source/server/server.cc:194]   filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
[2018-11-04 21:18:29.643][96][info][main] source/server/server.cc:196]   stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.statsd
[2018-11-04 21:18:29.643][96][info][main] source/server/server.cc:198]   tracers: envoy.dynamic.ot,envoy.lightstep,envoy.zipkin
[2018-11-04 21:18:29.643][96][info][main] source/server/server.cc:201]   transport_sockets.downstream: envoy.transport_sockets.capture,raw_buffer,tls
[2018-11-04 21:18:29.643][96][info][main] source/server/server.cc:204]   transport_sockets.upstream: envoy.transport_sockets.capture,raw_buffer,tls
[2018-11-04 21:18:29.652][66][warning][main] source/server/server.cc:447] shutting down admin due to child startup
[2018-11-04 21:18:29.655][66][warning][main] source/server/server.cc:455] terminating parent process
[2018-11-04 21:18:29.666][96][info][config] source/server/configuration_impl.cc:53] loading 1 listener(s)
[2018-11-04 21:18:29.677][96][info][config] source/server/configuration_impl.cc:87] loading tracing configuration
[2018-11-04 21:18:29.677][96][info][config] source/server/configuration_impl.cc:109] loading stats sink configuration
[2018-11-04 21:18:29.678][96][info][main] source/server/server.cc:396] starting main dispatch loop
[2018-11-04 21:18:29.678][96][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-11-04 21:18:29.678][96][info][main] source/server/server.cc:376] all clusters initialized. initializing init manager
[2018-11-04 21:18:29.678][96][info][config] source/server/listener_manager_impl.cc:781] all dependencies initialized. starting workers
[2018-11-04 21:18:29.682][66][info][main] source/server/server.cc:97] closing and draining listeners
[2018-11-04 21:18:39.680][96][info][main] source/server/drain_manager_impl.cc:63] shutting down parent after drain
[2018-11-04 21:18:39.680][66][info][main] source/server/hot_restart_impl.cc:435] shutting down due to child request
[2018-11-04 21:18:39.680][66][warning][main] source/server/server.cc:346] caught SIGTERM
[2018-11-04 21:18:39.680][66][info][main] source/server/server.cc:400] main dispatch loop exited
[2018-11-04 21:18:39.681][66][info][main] source/server/server.cc:435] exiting
got SIGCHLD
PID=66 exited with code=0
ACCESS [2018-11-04T21:19:53.838Z] "POST /seldon/seldon-deployment-output-transformer/api/v0.1/predictions HTTP/1.1" 200 - 153 407 147 145 "-" "python-requests/2.19.1" "fffb7f1a-e415-4300-bff6-2580ddb31f55" "localhost:8002" "10.111.1.102:8000"
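
The final ACCESS line is the one that shows the request being routed. As a small sketch, assuming the line follows Envoy's default access log layout (request line in the first quoted field, response code next, upstream host in the last quoted field), it can be split with `shlex` to compare the upstream address against the service's `clusterIP` and port. `parse_access_line` is a hypothetical helper name.

```python
import shlex


def parse_access_line(line: str) -> dict:
    """Split an Ambassador ACCESS line into a few fields of interest."""
    fields = shlex.split(line)  # shlex keeps each quoted field as one token
    if not fields or fields[0] != "ACCESS":
        raise ValueError("not an ACCESS log line")
    method, path, protocol = fields[2].split(" ")
    return {
        "method": method,
        "path": path,
        "protocol": protocol,
        "status": int(fields[3]),
        "authority": fields[-2],
        "upstream_host": fields[-1],  # assumed to be the last field
    }
```

For the line above this gives status `200` and upstream host `10.111.1.102:8000`, which is the `clusterIP` and `http` port of the `test-output-transformer` service in the earlier log.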

ukclivecox commented 5 years ago

Great. Looks ok then.