[Workflow] gRPC connection to workflow runtime doesn't self-heal when app restarts

olitomlinson commented 6 months ago

cc @philliphoff

Runtime version: 1.13.2 (I have not tried any other versions)

Expected Behavior

The gRPC connection to the workflow runtime is re-established after the app process (not the Dapr process) crashes and is restarted.

Actual Behavior

The gRPC connection to the workflow runtime is not re-established after the app process (not the Dapr process) crashes and is restarted.

Steps to Reproduce the Problem

1. Pull down my repro here: https://github.com/olitomlinson/dapr-workflow-examples
2. Run docker compose -f compose-1-instance-3-schedulers.yml build
3. Run docker compose -f compose-1-instance-3-schedulers.yml up
4. Stop the app container in compose - it will be named something like workflow-app-a-1
5. Start the app container in compose
6. Observe the logs in workflow-app-a-1 and you will see the following error repeating forever:

The gRPC server for Durable Task gRPC worker is unavailable. Will continue retrying.

cgillum commented 2 months ago

This may have been fixed already in 1.14 as part of pulling in some fixes in durabletask-go. @olitomlinson are you able to verify?

olitomlinson commented 2 months ago

> This may have been fixed already in 1.14 as part of pulling in some fixes in durabletask-go. @olitomlinson are you able to verify?

Still an issue in 1.14.4

famarting commented 1 month ago

I find this confusing. For the go-sdk I made the client retry the worker connection to Dapr indefinitely, and I think every SDK should behave that way. I believe Python already does.

dapr / dotnet-sdk