dapr / dapr

Dapr is a portable, event-driven, runtime for building distributed applications across cloud and edge.
https://dapr.io
Apache License 2.0
23.84k stars 1.88k forks

Dapr Resiliency Timeout not working when using /v1.0/invoke/ invocation #5602

Closed alecor191 closed 1 year ago

alecor191 commented 1 year ago

In what area(s)?

/area runtime

What version of Dapr?

CLI version: 1.9.1 Runtime version: 1.9.4

Expected Behavior

When performing service invocation with a timeout configured (using a Resiliency object), it shouldn't matter whether the request is performed using /v1.0/invoke/<app>/method/... or by setting the dapr-app-id HTTP header.

In both cases the timeout should be applied.

Actual Behavior

The timeout only seems to be applied when the dapr-app-id HTTP header approach is used. When sending the same request using the /v1.0/invoke/... approach, the configured timeout doesn't seem to have any effect.

Steps to Reproduce the Problem

Configure an app with a timeout (in my case I configured 1 sec):

apiVersion: dapr.io/v1alpha1
kind: Resiliency
...
spec:
  policies:
    timeouts:
      demoTimeout: 1s
  targets:
    apps:
      my-app:
        timeout: demoTimeout
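The elided metadata above is just a standard name/namespace; a complete manifest of this shape looks roughly like this (the name and namespace below are placeholders, not my actual values):

```yaml
# Sketch only: "demo-resiliency" and "default" are placeholder metadata values
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: demo-resiliency
  namespace: default
spec:
  policies:
    timeouts:
      demoTimeout: 1s
  targets:
    apps:
      my-app:
        timeout: demoTimeout
```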

Then invoke the service with both approaches (I opened a pod shell and ran curl). Here are the results:

> time curl -I -X GET -H "dapr-app-id: echo-app" http://localhost:3500/delay?delay_milliseconds=5000

HTTP/1.1 500 Internal Server Error                                      <===== FAILS with 500
Server: fasthttp
Date: Sat, 03 Dec 2022 13:45:52 GMT
Content-Type: application/json
Content-Length: 71
Traceparent: 00-e7f8af10a5d87ccfac6d2b6eb6e01bbb-859fe57196a52e96-01

real    0m1.007s                                                        <===== Completes in 1 sec (= configured timeout)
user    0m0.004s
sys     0m0.000s

> time curl -I -X GET http://localhost:3500/v1.0/invoke/echo-app.apps/method/delay?delay_milliseconds=5000

HTTP/1.1 200 OK                                                         <===== SUCCEEDS with 200
Server: Kestrel
Date: Sat, 03 Dec 2022 13:45:59 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 25
Traceparent: 00-14cfea7bd238c46005fcad1cccc14f84-b1c99e866dd038ff-01

real    0m5.017s                                                        <===== Completes in 5 sec, even though timeout is configured with 1 sec
user    0m0.000s
sys     0m0.010s

Note how in both cases we send a request to the same destination service (/delay?delay_milliseconds=5000 means that the destination service will only respond after 5 sec). Still, we get different behavior, although both approaches should be equivalent (as far as I know?).

Release Note

RELEASE NOTE: FIX Updated path invocation to honor resiliency settings.

yaron2 commented 1 year ago

cc @halspang @ItalyPaleAle

ItalyPaleAle commented 1 year ago

@alecor191 would it be possible for you to try to reproduce this with an edge build? These are our "nightlies" that are built from the "master" branch of the repository. I suspect that this kind of issue may be related to some race conditions we fixed last week.

If you're able, you can use the :edge tag for Dapr. For example, you can use the daprio/daprd:edge images.

alecor191 commented 1 year ago

Thanks @ItalyPaleAle for the hint. Happy to try it out, but I need some more details on the instructions: currently I'm installing Dapr using the Dapr Helm Installation Instructions.

By just changing the version to edge I get the following:

failed to pull chart: chart "dapr" version "edge" not found in https://dapr.github.io/helm-charts/ repository

Is there another repository I should use to get the edge builds, or do I need to follow instructions other than the Helm chart ones mentioned above?

ItalyPaleAle commented 1 year ago

@alecor191 thanks for the quick response!

If you're already using the Dapr Helm charts, the easiest thing is to set a different image for the Dapr sidecar injector to use. Just set this option when deploying with Helm:

dapr_sidecar_injector.image.name=ghcr.io/dapr/daprd:nightly-2022-12-04-linux-amd64

Here I'm using the latest nightly. (Note that in this case you must specify the full image name, not just the tag.)
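Expressed as a Helm values fragment (equivalent to passing the option with --set), this would roughly be:

```yaml
# values.yaml sketch: override only the sidecar image injected into app pods
dapr_sidecar_injector:
  image:
    name: ghcr.io/dapr/daprd:nightly-2022-12-04-linux-amd64
```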

alecor191 commented 1 year ago

Nice, thanks. I set the proposed option and can see that the daprd sidecar now uses the mentioned image. Unfortunately, a couple of seconds after starting up, daprd terminates with the following error:

time="2022-12-05T07:29:59.7242416Z" level=fatal msg="failed to start internal gRPC server: error from authenticator CreateSignedWorkloadCert: error from sentry SignCertificate: rpc error: code = Unknown desc = error validating requester identity: csr validation failed: token field in request must not be empty" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04

Is there anything else I need to configure to make the edge build work?

For context, here is the full daprd container log when using the edge build:

time="2022-12-05T07:29:59.3647334Z" level=info msg="starting Dapr Runtime -- version nightly-2022-12-04 -- commit 31b83bf81278cb3e511a92e8bd8c9b507ef85991" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.3680605Z" level=info msg="log level set to: debug" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.3706435Z" level=info msg="metrics server started on :9090/" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.metrics type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.3904333Z" level=info msg="Initializing the operator client (config: echo-app)" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.5539559Z" level=debug msg="Found 2 resiliency configurations." app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.5545241Z" level=info msg="Loading Resiliency configuration: echo-app" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.5569653Z" level=debug msg="Resiliency configuration (echo-app): &{TypeMeta:{Kind:Resiliency APIVersion:dapr.io/v1alpha1} ObjectMeta:{Name:echo-app GenerateName: Namespace:apps SelfLink: UID:6fb265c4-c61e-494e-9b6d-983cc399528d ResourceVersion:407644 Generation:1 CreationTimestamp:2022-12-05 07:28:23 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[kubectl.kubernetes.io/last-applied-configuration:{\"apiVersion\":\"dapr.io/v1alpha1\",\"kind\":\"Resiliency\",\"metadata\":{\"annotations\":{},\"name\":\"echo-app\",\"namespace\":\"apps\"},\"spec\":{\"policies\":{\"circuitBreakers\":{\"demoCB\":{\"interval\":\"0s\",\"maxRequests\":1,\"timeout\":\"10s\",\"trip\":\"consecutiveFailures \\u003e= 2\"}},\"retries\":{\"demoRetry\":{\"maxInterval\":\"10s\",\"maxRetries\":5,\"policy\":\"exponential\"}},\"timeouts\":{\"demoTimeout\":\"2s\"}},\"targets\":{\"apps\":{\"echo-app\":{\"timeout\":\"demoTimeout\"}}}}}\n] OwnerReferences:[] Finalizers:[] ManagedFields:[{Manager:Go-http-client Operation:Update APIVersion:dapr.io/v1alpha1 Time:2022-12-05 07:28:23 +0000 UTC FieldsType:FieldsV1 FieldsV1:{\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:kubectl.kubernetes.io/last-applied-configuration\":{}}},\"f:spec\":{\".\":{},\"f:policies\":{\".\":{},\"f:circuitBreakers\":{\".\":{},\"f:demoCB\":{\".\":{},\"f:interval\":{},\"f:maxRequests\":{},\"f:timeout\":{},\"f:trip\":{}}},\"f:retries\":{\".\":{},\"f:demoRetry\":{\".\":{},\"f:maxInterval\":{},\"f:maxRetries\":{},\"f:policy\":{}}},\"f:timeouts\":{\".\":{},\"f:demoTimeout\":{}}},\"f:targets\":{\".\":{},\"f:apps\":{\".\":{},\"f:echo-app\":{\".\":{},\"f:timeout\":{}}}}}} Subresource:}]} Spec:{Policies:{Timeouts:map[demoTimeout:2s] Retries:map[demoRetry:{Policy:exponential Duration: MaxInterval:10s MaxRetries:0x4000ccdd20}] CircuitBreakers:map[demoCB:{MaxRequests:1 Interval:0s Timeout:10s Trip:consecutiveFailures >= 2}]} Targets:{Apps:map[echo-app:{Timeout:demoTimeout Retry: 
CircuitBreaker: CircuitBreakerCacheSize:0}] Actors:map[] Components:map[]}} Scopes:[]}" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6111231Z" level=info msg="Loading Resiliency configuration: node-app" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6114893Z" level=debug msg="Resiliency configuration (node-app): &{TypeMeta:{Kind:Resiliency APIVersion:dapr.io/v1alpha1} ObjectMeta:{Name:node-app GenerateName: Namespace:apps SelfLink: UID:5cbfbf88-1a70-4ab2-8622-f6c8df8f45e0 ResourceVersion:407649 Generation:1 CreationTimestamp:2022-12-05 07:28:23 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[kubectl.kubernetes.io/last-applied-configuration:{\"apiVersion\":\"dapr.io/v1alpha1\",\"kind\":\"Resiliency\",\"metadata\":{\"annotations\":{},\"name\":\"node-app\",\"namespace\":\"apps\"},\"spec\":{\"policies\":{\"circuitBreakers\":{\"demoCB\":{\"interval\":\"0s\",\"maxRequests\":1,\"timeout\":\"10s\",\"trip\":\"consecutiveFailures \\u003e= 2\"}},\"retries\":{\"demoRetry\":{\"maxInterval\":\"10s\",\"maxRetries\":5,\"policy\":\"exponential\"}},\"timeouts\":{\"demoTimeout\":\"1s\"}},\"targets\":{\"apps\":{\"node-app\":{\"timeout\":\"demoTimeout\"}}}}}\n] OwnerReferences:[] Finalizers:[] ManagedFields:[{Manager:Go-http-client Operation:Update APIVersion:dapr.io/v1alpha1 Time:2022-12-05 07:28:23 +0000 UTC FieldsType:FieldsV1 FieldsV1:{\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:kubectl.kubernetes.io/last-applied-configuration\":{}}},\"f:spec\":{\".\":{},\"f:policies\":{\".\":{},\"f:circuitBreakers\":{\".\":{},\"f:demoCB\":{\".\":{},\"f:interval\":{},\"f:maxRequests\":{},\"f:timeout\":{},\"f:trip\":{}}},\"f:retries\":{\".\":{},\"f:demoRetry\":{\".\":{},\"f:maxInterval\":{},\"f:maxRetries\":{},\"f:policy\":{}}},\"f:timeouts\":{\".\":{},\"f:demoTimeout\":{}}},\"f:targets\":{\".\":{},\"f:apps\":{\".\":{},\"f:node-app\":{\".\":{},\"f:timeout\":{}}}}}} Subresource:}]} Spec:{Policies:{Timeouts:map[demoTimeout:1s] Retries:map[demoRetry:{Policy:exponential Duration: MaxInterval:10s MaxRetries:0x400117c2a0}] CircuitBreakers:map[demoCB:{MaxRequests:1 Interval:0s Timeout:10s Trip:consecutiveFailures >= 2}]} Targets:{Apps:map[node-app:{Timeout:demoTimeout Retry: 
CircuitBreaker: CircuitBreakerCacheSize:0}] Actors:map[] Components:map[]}} Scopes:[]}" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6140027Z" level=info msg="Resiliency configuration loaded." app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6179908Z" level=info msg="kubernetes mode configured" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6181915Z" level=info msg="app id: echo-app" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6338144Z" level=info msg="mTLS enabled. creating sidecar authenticator" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.634256Z" level=info msg="trust anchors and cert chain extracted successfully" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime.security type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.634463Z" level=info msg="authenticator created" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6530516Z" level=info msg="Dapr trace sampler initialized: DaprTraceSampler(P=0.000100)" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6538204Z" level=info msg="Initialized name resolution to kubernetes" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6554625Z" level=debug msg="loading component. name: kubernetes, type: secretstores.kubernetes/v1" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6554995Z" level=info msg="loading components" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6688413Z" level=debug msg="found component. name: default-rate-limit, type: middleware.http.ratelimit/v1" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.681195Z" level=info msg="component loaded. name: kubernetes, type: secretstores.kubernetes/v1" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6817041Z" level=debug msg="loading component. name: default-rate-limit, type: middleware.http.ratelimit/v1" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6821008Z" level=info msg="waiting for all outstanding components to be processed" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.682142Z" level=info msg="component loaded. name: default-rate-limit, type: middleware.http.ratelimit/v1" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6825111Z" level=info msg="all outstanding components processed" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6833719Z" level=info msg="gRPC proxy enabled" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6849874Z" level=warning msg="Failed to listen on [::1]:50001 with error: listen tcp [::1]:50001: bind: cannot assign requested address" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime.grpc.api type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6855222Z" level=info msg="enabled gRPC tracing middleware" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime.grpc.api type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6857712Z" level=info msg="enabled gRPC metrics middleware" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime.grpc.api type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6878516Z" level=info msg="API gRPC server is running on port 50001" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6989555Z" level=info msg="enabled metrics http middleware" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime.http type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.699206Z" level=info msg="enabled tracing http middleware" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime.http type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.6995618Z" level=warning msg="Failed to listen on [::1]:3500 with error: listen tcp [::1]:3500: bind: cannot assign requested address" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime.http type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.7005669Z" level=info msg="enabled metrics http middleware" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime.http type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.7007592Z" level=info msg="enabled tracing http middleware" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime.http type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.7031881Z" level=info msg="http server is running on port 3500" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.7033588Z" level=info msg="The request body size parameter is: 4" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.7037916Z" level=info msg="enabled gRPC tracing middleware" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime.grpc.internal type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.7038796Z" level=info msg="enabled gRPC metrics middleware" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime.grpc.internal type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.7042507Z" level=info msg="sending workload csr request to sentry" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime.grpc.internal type=log ver=nightly-2022-12-04
time="2022-12-05T07:29:59.7242416Z" level=fatal msg="failed to start internal gRPC server: error from authenticator CreateSignedWorkloadCert: error from sentry SignCertificate: rpc error: code = Unknown desc = error validating requester identity: csr validation failed: token field in request must not be empty" app_id=echo-app instance=echo-app-6d649849b4-8kv8m scope=dapr.runtime type=log ver=nightly-2022-12-04

For comparison, here are the daprd container logs when using the released 1.9 version (the working setup):

time="2022-12-05T07:37:46.3852558Z" level=info msg="starting Dapr Runtime -- version 1.9.5 -- commit f5f847eef8721d85f115729ee9efa820fe7c4cd3" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.3853091Z" level=info msg="log level set to: debug" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.3854227Z" level=info msg="metrics server started on :9090/" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.metrics type=log ver=1.9.5
time="2022-12-05T07:37:46.3855599Z" level=info msg="Initializing the operator client (config: echo-app)" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.394551Z" level=debug msg="Found 2 resiliency configurations." app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.3946008Z" level=info msg="Loading Resiliency configuration: echo-app" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.3947123Z" level=debug msg="Resiliency configuration (echo-app): &{TypeMeta:{Kind:Resiliency APIVersion:dapr.io/v1alpha1} ObjectMeta:{Name:echo-app GenerateName: Namespace:apps SelfLink: UID:2d178a61-37d6-40c8-8f37-973df426848d ResourceVersion:409738 Generation:1 CreationTimestamp:2022-12-05 07:37:44 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[kubectl.kubernetes.io/last-applied-configuration:{\"apiVersion\":\"dapr.io/v1alpha1\",\"kind\":\"Resiliency\",\"metadata\":{\"annotations\":{},\"name\":\"echo-app\",\"namespace\":\"apps\"},\"spec\":{\"policies\":{\"circuitBreakers\":{\"demoCB\":{\"interval\":\"0s\",\"maxRequests\":1,\"timeout\":\"10s\",\"trip\":\"consecutiveFailures \\u003e= 2\"}},\"retries\":{\"demoRetry\":{\"maxInterval\":\"10s\",\"maxRetries\":5,\"policy\":\"exponential\"}},\"timeouts\":{\"demoTimeout\":\"2s\"}},\"targets\":{\"apps\":{\"echo-app\":{\"timeout\":\"demoTimeout\"}}}}}\n] OwnerReferences:[] Finalizers:[] ClusterName: ManagedFields:[{Manager:Go-http-client Operation:Update APIVersion:dapr.io/v1alpha1 Time:2022-12-05 07:37:44 +0000 UTC FieldsType:FieldsV1 FieldsV1:{\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:kubectl.kubernetes.io/last-applied-configuration\":{}}},\"f:spec\":{\".\":{},\"f:policies\":{\".\":{},\"f:circuitBreakers\":{\".\":{},\"f:demoCB\":{\".\":{},\"f:interval\":{},\"f:maxRequests\":{},\"f:timeout\":{},\"f:trip\":{}}},\"f:retries\":{\".\":{},\"f:demoRetry\":{\".\":{},\"f:maxInterval\":{},\"f:maxRetries\":{},\"f:policy\":{}}},\"f:timeouts\":{\".\":{},\"f:demoTimeout\":{}}},\"f:targets\":{\".\":{},\"f:apps\":{\".\":{},\"f:echo-app\":{\".\":{},\"f:timeout\":{}}}}}} Subresource:}]} Spec:{Policies:{Timeouts:map[demoTimeout:2s] Retries:map[demoRetry:{Policy:exponential Duration: MaxInterval:10s MaxRetries:5}] CircuitBreakers:map[demoCB:{MaxRequests:1 Interval:0s Timeout:10s Trip:consecutiveFailures >= 2}]} Targets:{Apps:map[echo-app:{Timeout:demoTimeout 
Retry: CircuitBreaker: CircuitBreakerCacheSize:0}] Actors:map[] Components:map[]}} Scopes:[]}" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4075321Z" level=info msg="Loading Resiliency configuration: node-app" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4076775Z" level=debug msg="Resiliency configuration (node-app): &{TypeMeta:{Kind:Resiliency APIVersion:dapr.io/v1alpha1} ObjectMeta:{Name:node-app GenerateName: Namespace:apps SelfLink: UID:bcb24479-aec2-4ecc-90a0-ddf753377bcb ResourceVersion:409743 Generation:1 CreationTimestamp:2022-12-05 07:37:44 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[kubectl.kubernetes.io/last-applied-configuration:{\"apiVersion\":\"dapr.io/v1alpha1\",\"kind\":\"Resiliency\",\"metadata\":{\"annotations\":{},\"name\":\"node-app\",\"namespace\":\"apps\"},\"spec\":{\"policies\":{\"circuitBreakers\":{\"demoCB\":{\"interval\":\"0s\",\"maxRequests\":1,\"timeout\":\"10s\",\"trip\":\"consecutiveFailures \\u003e= 2\"}},\"retries\":{\"demoRetry\":{\"maxInterval\":\"10s\",\"maxRetries\":5,\"policy\":\"exponential\"}},\"timeouts\":{\"demoTimeout\":\"1s\"}},\"targets\":{\"apps\":{\"node-app\":{\"timeout\":\"demoTimeout\"}}}}}\n] OwnerReferences:[] Finalizers:[] ClusterName: ManagedFields:[{Manager:Go-http-client Operation:Update APIVersion:dapr.io/v1alpha1 Time:2022-12-05 07:37:44 +0000 UTC FieldsType:FieldsV1 FieldsV1:{\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:kubectl.kubernetes.io/last-applied-configuration\":{}}},\"f:spec\":{\".\":{},\"f:policies\":{\".\":{},\"f:circuitBreakers\":{\".\":{},\"f:demoCB\":{\".\":{},\"f:interval\":{},\"f:maxRequests\":{},\"f:timeout\":{},\"f:trip\":{}}},\"f:retries\":{\".\":{},\"f:demoRetry\":{\".\":{},\"f:maxInterval\":{},\"f:maxRetries\":{},\"f:policy\":{}}},\"f:timeouts\":{\".\":{},\"f:demoTimeout\":{}}},\"f:targets\":{\".\":{},\"f:apps\":{\".\":{},\"f:node-app\":{\".\":{},\"f:timeout\":{}}}}}} Subresource:}]} Spec:{Policies:{Timeouts:map[demoTimeout:1s] Retries:map[demoRetry:{Policy:exponential Duration: MaxInterval:10s MaxRetries:5}] CircuitBreakers:map[demoCB:{MaxRequests:1 Interval:0s Timeout:10s Trip:consecutiveFailures >= 2}]} Targets:{Apps:map[node-app:{Timeout:demoTimeout 
Retry: CircuitBreaker: CircuitBreakerCacheSize:0}] Actors:map[] Components:map[]}} Scopes:[]}" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4135402Z" level=info msg="Resiliency configuration loaded." app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4136488Z" level=info msg="kubernetes mode configured" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4136627Z" level=info msg="app id: echo-app" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4142229Z" level=info msg="mTLS enabled. creating sidecar authenticator" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4143079Z" level=info msg="trust anchors and cert chain extracted successfully" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.security type=log ver=1.9.5
time="2022-12-05T07:37:46.4143224Z" level=info msg="authenticator created" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4182303Z" level=info msg="Dapr trace sampler initialized: DaprTraceSampler(P=0.000100)" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4182725Z" level=info msg="Initialized name resolution to kubernetes" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.418361Z" level=debug msg="loading component. name: kubernetes, type: secretstores.kubernetes/v1" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4184565Z" level=info msg="loading components" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4187314Z" level=info msg="component loaded. name: kubernetes, type: secretstores.kubernetes/v1" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4190535Z" level=info msg="waiting for all outstanding components to be processed" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4190877Z" level=info msg="all outstanding components processed" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4191001Z" level=info msg="gRPC proxy enabled" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4191418Z" level=warning msg="Failed to listen on [::1]:50001 with error: listen tcp [::1]:50001: bind: cannot assign requested address" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.grpc.api type=log ver=1.9.5
time="2022-12-05T07:37:46.4192009Z" level=info msg="enabled gRPC tracing middleware" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.grpc.api type=log ver=1.9.5
time="2022-12-05T07:37:46.4192278Z" level=info msg="enabled gRPC metrics middleware" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.grpc.api type=log ver=1.9.5
time="2022-12-05T07:37:46.4192919Z" level=info msg="API gRPC server is running on port 50001" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4195512Z" level=info msg="enabled metrics http middleware" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.http type=log ver=1.9.5
time="2022-12-05T07:37:46.4195864Z" level=info msg="enabled tracing http middleware" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.http type=log ver=1.9.5
time="2022-12-05T07:37:46.4196157Z" level=warning msg="Failed to listen on [::1]:3500 with error: listen tcp [::1]:3500: bind: cannot assign requested address" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.http type=log ver=1.9.5
time="2022-12-05T07:37:46.4196787Z" level=info msg="enabled metrics http middleware" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.http type=log ver=1.9.5
time="2022-12-05T07:37:46.4197074Z" level=info msg="enabled tracing http middleware" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.http type=log ver=1.9.5
time="2022-12-05T07:37:46.4197319Z" level=info msg="http server is running on port 3500" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4197567Z" level=info msg="The request body size parameter is: 4" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4197822Z" level=info msg="enabled gRPC tracing middleware" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.grpc.internal type=log ver=1.9.5
time="2022-12-05T07:37:46.4197935Z" level=info msg="enabled gRPC metrics middleware" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.grpc.internal type=log ver=1.9.5
time="2022-12-05T07:37:46.4198023Z" level=info msg="sending workload csr request to sentry" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.grpc.internal type=log ver=1.9.5
time="2022-12-05T07:37:46.4320402Z" level=info msg="certificate signed successfully" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.grpc.internal type=log ver=1.9.5
time="2022-12-05T07:37:46.4321691Z" level=info msg="internal gRPC server is running on port 50002" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4322032Z" level=info msg="application protocol: http. waiting on port 80.  This will block until the app is listening on that port." app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:46.4322499Z" level=info msg="starting workload cert expiry watcher. current cert expires on: 2022-12-06 07:52:46 +0000 UTC" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.grpc.internal type=log ver=1.9.5
time="2022-12-05T07:37:47.0903129Z" level=info msg="application discovered on port 80" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:47.1919148Z" level=info msg="application configuration loaded" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:47.1920244Z" level=info msg="actors: state store is not configured - this is okay for clients but services with hosted actors will fail to initialize!" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:47.1921072Z" level=info msg="actor runtime started. actor idle timeout: 1h0m0s. actor scan interval: 30s" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.actor type=log ver=1.9.5
time="2022-12-05T07:37:47.1921486Z" level=info msg="dapr initialized. Status: Running. Init Elapsed 778ms" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime type=log ver=1.9.5
time="2022-12-05T07:37:47.1921525Z" level=debug msg="try to connect to placement service: dns:///dapr-placement-server.tools.svc.cluster.local:50005" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.actor.internal.placement type=log ver=1.9.5
time="2022-12-05T07:37:47.1989088Z" level=debug msg="established connection to placement service at dns:///dapr-placement-server.tools.svc.cluster.local:50005" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.actor.internal.placement type=log ver=1.9.5
time="2022-12-05T07:37:47.1994599Z" level=debug msg="placement order received: lock" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.actor.internal.placement type=log ver=1.9.5
time="2022-12-05T07:37:47.1995485Z" level=debug msg="placement order received: update" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.actor.internal.placement type=log ver=1.9.5
time="2022-12-05T07:37:47.1995869Z" level=info msg="placement tables updated, version: 0" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.actor.internal.placement type=log ver=1.9.5
time="2022-12-05T07:37:47.1996078Z" level=debug msg="placement order received: unlock" app_id=echo-app instance=echo-app-6d649849b4-xlb6t scope=dapr.runtime.actor.internal.placement type=log ver=1.9.5
ItalyPaleAle commented 1 year ago

Oh, right. I forgot that we added one change: you actually need to upgrade the control plane (the Dapr operator) too. I was hoping you'd only have to upgrade the sidecars!

In this case, to use the nightly tag for everything and not just daprd, use this Helm chart option:

global.tag=nightly-2022-12-04-linux-amd64
global.registry=ghcr.io/dapr
alecor191 commented 1 year ago

Thanks. I tried to set the two settings, but unfortunately I still get an error:

Failed to pull image "ghcr.io/dapr/dapr:nightly-2022-12-04-linux-arm64": rpc error: code = Unknown desc = Error response from daemon: manifest unknown
ItalyPaleAle commented 1 year ago

Do you need arm64 or amd64? I just realized we're not publishing nightly builds for arm64; that probably needs to be fixed :)

In the meantime, if you can try with amd64, that should work under emulation. I'll get a PR up shortly to add edge/nightly arm64 builds.

ItalyPaleAle commented 1 year ago

Looks like an alternative is to use the edge tag, which is multi-arch and includes linux/arm64.

global.tag=edge

Note this is a "rolling" tag, though, so it can change every so often (sometimes as often as every hour).

alecor191 commented 1 year ago

Thanks again. AMD64 worked.

So I installed the suggested nightly build and ran the scenario again. Unfortunately, I still have the same behavior as originally reported.

The timeout kicks in as configured after 1 sec when using the dapr-app-id header:

> time curl -I -X GET -H "dapr-app-id: echo-app" http://localhost:3500/delay?delay_milliseconds=5000

HTTP/1.1 500 Internal Server Error
Server: fasthttp
Date: Mon, 05 Dec 2022 19:53:01 GMT
Content-Type: application/json
Content-Length: 71
Traceparent: 00-a94c983df4747ee804600c91d99509f8-5b60fb4f0022a23d-01

real    0m1.014s
user    0m0.001s
sys     0m0.007s

The timeout configuration doesn't have any effect when using the /v1.0/invoke/... approach:

> time curl -I -X GET http://localhost:3500/v1.0/invoke/echo-app.apps/method/delay?delay_milliseconds=5000

HTTP/1.1 200 OK
Server: Kestrel
Date: Mon, 05 Dec 2022 19:53:12 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 25
Traceparent: 00-9e4486863a297c82e11010390f0626df-2a3541ed48bdf983-01

real    0m5.010s
user    0m0.006s
sys     0m0.000s

And just to confirm that I'm indeed running the nightly, here's the log from the daprd sidecar:

time="2022-12-05T19:51:57.3158141Z" level=info msg="starting Dapr Runtime -- version nightly-2022-12-04 -- commit 31b83bf81278cb3e511a92e8bd8c9b507ef85991" app_id=node-app instance=node-app-64b56cb85-27clc scope=dapr.runtime type=log ver=nightly-2022-12-04
ItalyPaleAle commented 1 year ago

Uh, I just noticed something now (apologies for not noticing it sooner):

http://localhost:3500/v1.0/invoke/echo-app.apps/method/delay?delay_milliseconds=5000

Please don't add .apps to the app's name; just echo-app. We don't support dots in app IDs anyway. That will also be consistent with what you're doing with the header (dapr-app-id: echo-app).

alecor191 commented 1 year ago

Argh, you're absolutely right. That's the reason why one worked and the other didn't. Adding .apps to the HTTP header results in the same issue (timeout not being honored), and the timeout works via /v1.0/... if I remove .apps.

But I can't remove .apps: that's the K8S namespace in which the app lives. We have apps in different namespaces and we need to provide it to Dapr to be able to route the request (I'm following How to: Service invocation across namespaces). In the example above I used 2 apps in the same namespace, therefore it worked either way.

Or am I missing something?

ItalyPaleAle commented 1 year ago

We have apps in different namespaces and we need to provide it to Dapr to be able to route the request

Oh yes, in this case, then you do need to add the dot. At least now we know the outcome is consistent.

In your policy, what if you change the target from "my-app" to "echo-app"?

  targets:
    apps:
      echo-app:
        timeout: demoTimeout
alecor191 commented 1 year ago

Sorry, that's just me being inconsistent in my GitHub comments.

To summarize the issue: if I configure an app with a timeout and then call that app using the namespace-qualified name (echo-app.apps), the timeout is not applied.

It also seems reproducible if both apps live in the same namespace: as soon as I add the namespace name to the request, the configured timeout has no effect.

ItalyPaleAle commented 1 year ago

I think that makes sense; I'll need to investigate it a bit further.

@halspang do you have any idea?

ItalyPaleAle commented 1 year ago

Sorry @alecor191 one more question. What if you change your policy object to something like this:

  targets:
    apps:
      echo-app.apps:
        timeout: demoTimeout

Note the .apps added to the target
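For reference, spelled out as a complete Resiliency object it might look like this (a sketch assembled from the snippets in this thread; the metadata name and namespace are placeholders):

```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: demo-resiliency   # placeholder name
  namespace: apps         # placeholder: namespace where the sidecar loads the policy
spec:
  policies:
    timeouts:
      demoTimeout: 1s
  targets:
    apps:
      echo-app.apps:      # namespace-qualified app ID, exactly as used in the request
        timeout: demoTimeout
```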

alecor191 commented 1 year ago

OK, so I did a few tests based on your suggestion. Here are the results:

Using echo-app.apps in targets: the timeout is now applied when the app is called with the namespace-qualified name.

Using both echo-app.apps and echo-app in targets: this also works, and the timeout is applied for both invocation styles.

If I use a config like this one:

  targets:
    apps:
      echo-app.apps:
        timeout: demoTimeout
      echo-app:
        timeout: demoTimeout

Calling apps from a different namespace

In all tests above, the caller app was in the same K8S namespace as the target app. So I ran an additional test: call the target app from an app running in a different namespace. Here I was surprised that the timeout did not kick in, even though I used the same targets configuration as above (I didn't change it between tests).

In other words, using echo-app.apps in targets had no effect when the app was called from a different K8S namespace, as if the timeout configuration didn't propagate to that namespace (what sidecar is canceling the request? The one on the caller or the one on the callee side?).

As I wasn't sure if changes to the Resiliency object should automatically propagate to all pods, I restarted all pods (callers and callees) to be sure.

halspang commented 1 year ago

what sidecar is canceling the request?

The calling sidecar controls the timeout so we can timeout even in cases where the other sidecar does not respond.

As I wasn't sure if changes to the Resiliency object should automatically propagate to all pods, I restarted all pods (callers and callees) to be sure.

Similar to components, the resiliency policy is loaded only at init time, so the sidecar has to be restarted when changes are made.

When calling from a different namespace, did you provide a resiliency config in that namespace as well? I'm not sure if we load config across namespaces.

Also, if you're using debug logging, you can see lines like r.log.Debugf("Found Endpoint Policy for %s: %+v", app, policyNames) when calls go through resiliency. I'm curious whether those show up in all cases; if not, that points to the policy simply not being addressed correctly.

Note that policies are strict in matching, which is why you saw the difference between echo-app.apps and echo-app. We don't do any transforms on policy targets; they are applied to app IDs exactly as they are given to us.
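To illustrate the strict matching, here is a hypothetical sketch (not Dapr's actual code): if policy targets live in a map keyed by the target string exactly as written, a namespace-qualified app ID simply misses a bare target entry.

```python
from typing import Optional

# Hypothetical sketch: resiliency targets kept in a map keyed by the target
# string exactly as written in the policy. Lookups use a plain string match,
# with no normalization and no stripping of a ".namespace" suffix.
policies = {"echo-app": "demoTimeout"}  # target as written under spec.targets.apps

def find_policy(app_id: str, policies: dict) -> Optional[str]:
    # Exact match only: "echo-app.apps" is a different key than "echo-app".
    return policies.get(app_id)

print(find_policy("echo-app", policies))       # -> demoTimeout (policy found)
print(find_policy("echo-app.apps", policies))  # -> None (no timeout applied)
```

This is why listing both echo-app and echo-app.apps as targets covers both invocation styles.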

alecor191 commented 1 year ago

Thanks a lot @halspang, this explains a lot!

I did not apply the policy to all namespaces, only to the one where the app lives. If I apply the Resiliency policy also to all namespaces that call the app, then it works as expected.

After applying policies to all namespaces, I also see the Found Endpoint Policy... debug log in all calling sidecars. So this looks good too.
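So the working setup, as I understand it, is a copy of the policy in each calling namespace, targeting the app ID exactly as the callers address it (a sketch; the metadata name and caller namespace are placeholders):

```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: demo-resiliency   # placeholder name
  namespace: caller-ns    # placeholder: each namespace that calls echo-app needs its own copy
spec:
  policies:
    timeouts:
      demoTimeout: 1s
  targets:
    apps:
      echo-app.apps:      # app ID exactly as used in the invocation from this namespace
        timeout: demoTimeout
```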

All my test scenarios work fine now. However, I would like to point something out:

In Istio, I'm used to defining a timeout for a service in a single K8S object, and then it "just works" regardless of where the service is called from (same or different namespace, with or without the namespace suffix). This makes it very convenient, especially in a multi-team setup, as the owning team configures it once and callers don't need to worry about it. In Dapr, however, we need to:

- apply the Resiliency policy in every namespace that calls the app
- list the app ID as a target exactly as callers address it (e.g. both echo-app and echo-app.apps)
- restart sidecars whenever the policy changes

I believe it would be a significant help if we could define the policy once and not have to worry about how and from where the service gets called.

Does that make sense?

yaron2 commented 1 year ago

cc @halspang

yaron2 commented 1 year ago

@alecor191 Closing this now as this isn't an issue. Thanks for starting the discussion; feel free to open another issue for cross-namespace resiliency policies.