Open AndriiNeverov opened 4 years ago
@AndriiNeverov I think the solution here is to add an EnvoyFilter in the namespace originating the traffic, so that outbound requests carry the Envoy timeout header set to the desired value.
A resource similar to this should work:
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: nginx-lua-filter
  namespace: nginx-ingress
spec:
  workloadLabels:
    app: nginx-ingress
  filters:
  - listenerMatch:
      listenerType: SIDECAR_OUTBOUND
    filterName: envoy.lua
    filterType: HTTP
    filterConfig:
      inlineCode: |
        function envoy_on_request(request_handle)
          request_handle:headers():add("x-envoy-upstream-rq-timeout-ms", "120000")
        end
```
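As an alternative to injecting the header with a Lua filter, the same `x-envoy-upstream-rq-timeout-ms` override can be attached per request by the calling client. A minimal sketch; the prediction URL, payload shape, and helper names here are illustrative assumptions, not something from this thread:

```python
import json
import urllib.request

def envoy_timeout_headers(timeout_ms):
    # Same header the Lua filter above injects: Envoy's sidecar reads it on
    # the outbound request and overrides its route timeout for that one call.
    return {"x-envoy-upstream-rq-timeout-ms": str(timeout_ms)}

def predict(url, payload, timeout_ms=120000):
    # POST a JSON payload with the per-request Envoy timeout override set.
    headers = {"Content-Type": "application/json"}
    headers.update(envoy_timeout_headers(timeout_ms))
    req = urllib.request.Request(url, data=json.dumps(payload).encode(), headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

This avoids cluster-wide filter changes, at the cost of every client having to remember to set the header.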
I have a Kubeflow 0.7.1 cluster set up using https://www.kubeflow.org/docs/started/k8s/kfctl-existing-arrikto/ and apply the seldon.io/rest-read-timeout, seldon.io/rest-connection-timeout, and seldon.io/grpc-read-timeout annotations to set the timeout to 30 seconds.
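For context, those annotations sit on the SeldonDeployment metadata. A sketch of the shape; the name and apiVersion are placeholders for the actual deployment, and the values are assumed to be in milliseconds:

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: server
  annotations:
    seldon.io/rest-read-timeout: "30000"        # 30 s, assumed milliseconds
    seldon.io/rest-connection-timeout: "30000"
    seldon.io/grpc-read-timeout: "30000"
spec:
  # predictor spec unchanged
```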
It works perfectly fine when I call 'predict' from outside of the cluster. However, when I call it from within the cluster (e.g. from a Jupyter notebook), it fails (HTTP status and timing highlighted):
[2020-02-03T23:54:14.447Z] "POST /seldon/aneverov/server-78190d6619e14653926768f60a016848/api/v0.1/predictions HTTP/1.1" 200 - "-" 168 381 30028 30026 "10.233.74.1" "python-requests/2.22.0" "84b462d5-f2d0-9481-9eb5-26e822375958" "10.50.8.102" "127.0.0.1:8000" inbound|8000|http|seldon-b3bd70ca9777516558eba158a9f106f0.aneverov.svc.cluster.local - 10.233.69.224:8000 10.233.74.1:0 -
vs
[2020-02-03T23:49:52.035Z] "POST /seldon/aneverov/server-78190d6619e14653926768f60a016848/api/v0.1/predictions HTTP/1.1" 504 UT "-" 168 24 15001 - "-" "python-requests/2.22.0" "18db807f-cf01-9d3a-9c55-912c58382796" "10.50.8.102" "10.50.8.102:80" PassthroughCluster - 10.50.8.102:80 10.233.73.217:38208 -
The difference is that the failing request takes a different route (note the PassthroughCluster in the second log line).
There are some mentions of a "magic" 15-second timeout (https://github.com/istio/istio/issues/16915#issuecomment-529210672, https://github.com/istio/istio/issues/1888), but I haven't found a working solution yet.
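One commonly suggested Istio-side fix for route timeouts is to raise them explicitly with a VirtualService. A sketch, reusing the service host from the access log above; the resource name is a placeholder, and note this only helps once the call actually targets the in-cluster service host — a request logged against PassthroughCluster is not matched by mesh routing rules:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: seldon-predict-timeout
  namespace: aneverov
spec:
  hosts:
  - seldon-b3bd70ca9777516558eba158a9f106f0.aneverov.svc.cluster.local
  http:
  - route:
    - destination:
        host: seldon-b3bd70ca9777516558eba158a9f106f0.aneverov.svc.cluster.local
    timeout: 30s   # overrides Envoy's default route timeout
```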