SeldonIO / seldon-operator

Seldon Core Operator for Kubernetes
Apache License 2.0

Timeout annotations are not respected when calling REST endpoint within Istio env #70

Open AndriiNeverov opened 4 years ago

AndriiNeverov commented 4 years ago

I have a Kubeflow 0.7.1 cluster set up following https://www.kubeflow.org/docs/started/k8s/kfctl-existing-arrikto/ and have applied the seldon.io/rest-read-timeout, seldon.io/rest-connection-timeout, and seldon.io/grpc-read-timeout annotations to set the timeouts to 30 seconds.

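This works perfectly well when I call 'predict' from outside the cluster. However, when I call it from within the cluster (e.g. from a Jupyter notebook), the request fails. Compare the HTTP status and duration in the two access log entries: the first returns 200 after 30028 ms, the second returns 504 with the UT flag after 15001 ms: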

[2020-02-03T23:54:14.447Z] "POST /seldon/aneverov/server-78190d6619e14653926768f60a016848/api/v0.1/predictions HTTP/1.1" 200 - "-" 168 381 30028 30026 "10.233.74.1" "python-requests/2.22.0" "84b462d5-f2d0-9481-9eb5-26e822375958" "10.50.8.102" "127.0.0.1:8000" inbound|8000|http|seldon-b3bd70ca9777516558eba158a9f106f0.aneverov.svc.cluster.local - 10.233.69.224:8000 10.233.74.1:0 -

vs

[2020-02-03T23:49:52.035Z] "POST /seldon/aneverov/server-78190d6619e14653926768f60a016848/api/v0.1/predictions HTTP/1.1" 504 UT "-" 168 24 15001 - "-" "python-requests/2.22.0" "18db807f-cf01-9d3a-9c55-912c58382796" "10.50.8.102" "10.50.8.102:80" PassthroughCluster - 10.50.8.102:80 10.233.73.217:38208 -

The difference appears to be that the two requests take different routes: the failing one goes through PassthroughCluster and is cut off at about 15 seconds.
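
To compare the two entries field by field, a small parser can help. The following is a minimal Python sketch. It assumes the field order seen in the two log lines above (status, response flags, a quoted status field, bytes received, bytes sent, duration, upstream service time, and so on). The function name and dictionary keys are my own and not part of any Istio tooling.

```python
import shlex

# Positions of the fields after tokenisation. This ordering is an assumption
# based on the two log lines quoted in this issue.
STATUS, FLAGS, DURATION_MS, UPSTREAM_CLUSTER = 2, 3, 7, 14


def parse_access_log(line):
    """Return status, response flags, duration and upstream cluster of one entry."""
    # shlex keeps quoted fields (such as the request line) together as one token.
    tokens = shlex.split(line)
    if len(tokens) <= UPSTREAM_CLUSTER:
        raise ValueError("unexpected access log layout")
    return {
        "status": int(tokens[STATUS]),
        "flags": tokens[FLAGS],
        "duration_ms": int(tokens[DURATION_MS]),
        "upstream_cluster": tokens[UPSTREAM_CLUSTER],
    }
```

Applied to the two entries above, this yields status 200 with a duration of 30028 ms on the inbound cluster, and status 504 with flag UT after 15001 ms on PassthroughCluster.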

There are some mentions of the "magic" 15 sec timeout (https://github.com/istio/istio/issues/16915#issuecomment-529210672, https://github.com/istio/istio/issues/1888), but I haven't found a working solution yet.

jahantech commented 3 years ago

@AndriiNeverov I think the solution is to add an EnvoyFilter in the namespace that originates the traffic, so that outbound requests carry the Envoy timeout header set to the desired value.

A resource similar to this should work:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: nginx-lua-filter
  namespace: nginx-ingress
spec:
  filters:
  - filterConfig:
      inlineCode: |
        function envoy_on_request(request_handle)
            -- Override the route timeout for this request (value in milliseconds).
            request_handle:headers():add("x-envoy-upstream-rq-timeout-ms", "120000")
        end
    filterName: envoy.lua
    filterType: HTTP
    listenerMatch:
      listenerType: SIDECAR_OUTBOUND
  workloadLabels:
    app: nginx-ingress
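
If changing the mesh configuration is not an option, the same header could in principle be set by the caller itself, for example from the notebook. The following Python sketch only builds such a request. It assumes the outbound sidecar honours a client-supplied x-envoy-upstream-rq-timeout-ms header, which has not been verified in this thread. The helper names, the example URL, and the payload are hypothetical.

```python
import json
import urllib.request


def envoy_timeout_headers(timeout_seconds):
    """Headers asking Envoy to allow `timeout_seconds` for the upstream request."""
    return {
        "Content-Type": "application/json",
        # Same header the Lua filter adds; the value is in milliseconds.
        "x-envoy-upstream-rq-timeout-ms": str(int(timeout_seconds * 1000)),
    }


def build_predict_request(url, payload, timeout_seconds=30):
    """Build (but do not send) a POST request carrying the timeout header."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=envoy_timeout_headers(timeout_seconds),
        method="POST",
    )


# Hypothetical in-cluster URL and payload; replace with your deployment's values.
request = build_predict_request(
    "http://example.invalid/api/v0.1/predictions",
    {"data": {"ndarray": [[1.0, 2.0]]}},
)
# urllib.request.urlopen(request, timeout=35) would send it; the client-side
# timeout is kept slightly above the 30 s asked of Envoy.
```

Setting the header in the client keeps the change local to one caller, whereas the EnvoyFilter applies it to every outbound request of the selected workloads.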