Zeebe binding - 1.10.0-rc.5 process variables not stored anymore #2527

Closed fabistb closed 1 year ago

fabistb commented 1 year ago


Process variables are variables which are returned from the controller that gets called from the binding, parsed as JSON and forwarded to Zeebe.
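
As a rough illustration of what "process variables" means here, the sketch below models a worker handler that receives a job's variables and returns new ones as a JSON body. The function name `handle_calculator_job` is hypothetical, and the variable names are borrowed from the sample request and the `result` variable discussed further down this thread. It is not the binding's or the sample app's actual code.

```python
import json


def handle_calculator_job(variables: dict) -> str:
    """Hypothetical worker handler: returns the variables to store, as JSON."""
    first = int(variables["firstOperand"])
    second = int(variables["secondOperand"])
    if variables["operator"] != "+":
        raise ValueError("only '+' is handled in this sketch")
    # The returned JSON body is what the binding is expected to forward to Zeebe.
    return json.dumps({"result": first + second})


body = handle_calculator_job(
    {"operator": "+", "firstOperand": "412", "secondOperand": "20"}
)
print(body)
```

If the binding forwards this body as described, `result` should appear as a process variable in Zeebe.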

Expected Behavior

The Zeebe job worker input binding stores process variables in Zeebe as defined.

Actual Behavior

After deploying Dapr 1.10.0-rc.5 to our cluster, process variables are no longer stored in Zeebe.

Dapr Version


Cc: @akkie

yaron2 commented 1 year ago

@fabistb have you tried prior 1.10.x RCs, or is this a regression with 1.9?

Also, can you please include details for how to validate / reproduce this?

fabistb commented 1 year ago

@yaron2 , we haven't tried a previous 1.10.x RC. However reverting back to 1.9.6 solves it for us.

We are also just wondering what could cause this since the code hasn't changed a lot recently. https://github.com/dapr/components-contrib/blob/master/bindings/zeebe/jobworker/jobworker.go#L174

Regarding the validation / reproduction the best scenario is probably to check with @akkie.

berndverst commented 1 year ago

It seems the issue was caused by upgrading the Zeebe client from 8.0.3 to 8.1.6. Zeebe didn't document any breaking changes, but clearly something changed across those minor versions. I have now downgraded it to 8.0.11 (released 4 days ago). Hopefully that will work.

As you know it's a best practice to keep dependencies up to date. It is very surprising that upgrading to a new minor version should have caused this issue, but it is the only explanation we can find.

Dapr/Dapr 1.10.0-rc.7 will contain this downgrade to the latest Zeebe 8.0.X (instead of 8.X.X).

akkie commented 1 year ago

I will have a look regarding the client issue. Do the E2E tests actually run in the pipeline?


berndverst commented 1 year ago

@akkie No, these tests are never run. Not by any of us manually, and definitely not by CI.

We will need your help to create a certification test for Zeebe instead. That's the place to run these kinds of integration tests.

Take a look at any of the folders within "tests/certification/bindings" for inspiration. You could probably copy one of those and delete a bunch of things, then migrate over the integration tests you wrote.

Once we have such tests (and every single component metadata property has been covered by tests) we can designate the component as stable. Only stable components receive hotfix support.

akkie commented 1 year ago

@berndverst OK, I will have a look

akkie commented 1 year ago

We have tested with the new RC8 version and it doesn't work either. I will run the Zeebe E2E tests and see if the issue occurs there too. Give me some time and I can give more feedback.

akkie commented 1 year ago

@berndverst @yaron2 I have run the E2E tests against the current master branch. All tests run as expected. We have debugged our application and we can see that our workers return all the variables back to the binding. Could it be that the body gets lost on the way from the controller to the binding? I know that with 2628 the functionality to propagate a response from the caller back into an input binding was implemented directly for the Zeebe binding. Maybe there is a regression in this code?

akkie commented 1 year ago

The problem also occurs with 1.10.0-rc.1. The last working version is the stable 1.9.6.

akkie commented 1 year ago

We have provided an example with which you can reproduce the issue: https://github.com/PlanBGmbH/zeebe-dapr-example

Steps to reproduce:

There are files with the extension .http in the example project. These files can be executed with the REST Client VSCode extension: https://marketplace.visualstudio.com/items?itemName=humao.rest-client

The deploy-process.http file and the first request from the create-instance.http file need to be executed.

fabistb commented 1 year ago

Also tested it on my side. The deploy-process.http returned 400 with VSCode but worked with Rider.

Maybe you can rewrite it as a curl command.

These are the screenshots from the error.

[Two screenshots of the error; images not preserved]

berndverst commented 1 year ago

The master branch is running Zeebe 8.1.6. So the Zeebe version does not appear to be the cause, since 8.1.6 is what we used in rc1 through rc6. The master branch also has the context propagation changes.

Since master branch works for you it cannot be these particular changes.

berndverst commented 1 year ago

We have provided an example with which you can reproduce the issue: https://github.com/PlanBGmbH/zeebe-dapr-example

Steps to reproduce:

  • Check out the Zeebe Docker Compose setup: git clone https://github.com/camunda/camunda-platform.git (if you run it on an ARM Mac, the versions for Zeebe, Operate and Tasklist need to be changed to 8.2.0-alpha4, and connectors needs to be removed)
  • Run Zeebe: docker compose -f camunda-platform/docker-compose-core.yaml up
  • Clone the example repo: git clone https://github.com/PlanBGmbH/zeebe-dapr-example.git
  • Run the service: dotnet run --project "./zeebe-dapr-example/Zeebe.Worker/Zeebe.Worker.csproj"
  • Deploy the process and instantiate the process (see below)
  • Open Operate and look at the process: http://localhost:8081 (demo, demo)

There are files with the extension .http in the example project. These files can be executed with the REST Client VSCode extension: https://marketplace.visualstudio.com/items?itemName=humao.rest-client

The deploy-process.http file and the first request from the create-instance.http file need to be executed.

On an M1 Mac, Elasticsearch 7.17.9 must be used.

berndverst commented 1 year ago

@akkie on AMD64 Linux and ARM64 Mac I cannot run your sample. I get lots of errors of the form

info: Man.Dapr.Sidekick.DaprSidecarHost[0]
      2023/02/14 10:47:01 worker 'calculator' failed to open job stream: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: read tcp [::1]:54020->[::1]:26500: read: connection reset by peer"
info: Man.Dapr.Sidekick.DaprSidecarHost[0]
      2023/02/14 10:47:01 worker 'calculator' failed to open job stream: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: read tcp [::1]:54020->[::1]:26500: read: connection reset by peer"
info: Man.Dapr.Sidekick.DaprSidecarHost[0]
      2023/02/14 10:47:01 worker 'calculator' failed to open job stream: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: EOF"
info: Man.Dapr.Sidekick.DaprSidecarHost[0]
      2023/02/14 10:47:01 worker 'calculator' failed to open job stream: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: EOF"
info: Man.Dapr.Sidekick.DaprSidecarHost[0]
      2023/02/14 10:47:01 worker 'calculator' failed to open job stream: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: EOF"

The moment I run dotnet run

akkie commented 1 year ago

@berndverst these errors indicate that Zeebe is not running
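
A quick way to check whether anything is listening on the gateway port is a plain TCP connect, sketched below. Port 26500 is taken from the log output above; the helper name `is_port_open` is made up for this example. Note that a successful connect is only a first hint: the "connection reset by peer" lines above suggest a TCP connection can be accepted (for example by a container port proxy) while the gateway itself is not yet serving.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open("localhost", 26500)` before `dotnet run` shows whether the port is reachable at all.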

akkie commented 1 year ago

I also run Zeebe on an M1 Mac, and for me the configured Elasticsearch version worked. What was the reason for you to change the Elasticsearch version? Was there an error?

berndverst commented 1 year ago

The original version does seem to work. The issue is that Elasticsearch requires running the container engine as root. I used podman. I fixed it with podman machine stop && podman machine set --rootful=true && podman machine start

Now using 7.17.5 for Elasticsearch and 8.2.0-alpha4 for Zeebe seems to work.

Going to try this again...

EDIT: I'm still seeing

berndverst commented 1 year ago

It seems this just doesn't work with podman. I reinstalled docker and it's fine. Still doesn't explain my issues on Linux though (running Docker there).

berndverst commented 1 year ago

To provide more clarity:

For M1 Mac the docker-compose-core.yaml should be

# While the Docker images themselves are supported for production usage,
# this docker-compose.yaml is designed to be used by developers to run
# an environment locally. It is not designed to be used in production.
# We recommend to use Kubernetes in production with our Helm Charts:
# https://docs.camunda.io/docs/self-managed/platform-deployment/kubernetes-helm/
# For local development, we recommend using KIND instead of `docker-compose`:
# https://docs.camunda.io/docs/self-managed/platform-deployment/helm-kubernetes/guides/local-kubernetes-cluster/

# This is a lightweight configuration with Zeebe, Operate, Tasklist, and Elasticsearch
# See docker-compose.yml for a configuration that also includes Optimize, Identity, and Keycloak.


services:

  zeebe: # https://docs.camunda.io/docs/self-managed/platform-deployment/docker/#zeebe
    image: camunda/zeebe:${CAMUNDA_PLATFORM_VERSION:-8.2.0-alpha4}
    container_name: zeebe
    ports:
      - "26500:26500"
      - "9600:9600"
    environment: # https://docs.camunda.io/docs/self-managed/zeebe-deployment/configuration/environment-variables/
      - ZEEBE_BROKER_EXPORTERS_ELASTICSEARCH_CLASSNAME=io.camunda.zeebe.exporter.ElasticsearchExporter
      # allow running with low disk space
      - "JAVA_TOOL_OPTIONS=-Xms512m -Xmx512m"
    restart: always
    volumes:
      - zeebe:/usr/local/zeebe/data
    networks:
      - camunda-platform
    depends_on:
      - elasticsearch

  operate: # https://docs.camunda.io/docs/self-managed/platform-deployment/docker/#operate
    image: camunda/operate:${CAMUNDA_PLATFORM_VERSION:-8.2.0-alpha4}
    container_name: operate
    ports:
      - "8081:8080"
    environment: # https://docs.camunda.io/docs/self-managed/operate-deployment/configuration/
      - CAMUNDA_OPERATE_ELASTICSEARCH_URL=http://elasticsearch:9200
      - CAMUNDA_OPERATE_ZEEBEELASTICSEARCH_URL=http://elasticsearch:9200
    networks:
      - camunda-platform
    depends_on:
      - zeebe
      - elasticsearch

  tasklist: # https://docs.camunda.io/docs/self-managed/platform-deployment/docker/#tasklist
    image: camunda/tasklist:${CAMUNDA_PLATFORM_VERSION:-8.2.0-alpha4}
    container_name: tasklist
    ports:
      - "8082:8080"
    environment: # https://docs.camunda.io/docs/self-managed/tasklist-deployment/configuration/
      - CAMUNDA_TASKLIST_ELASTICSEARCH_URL=http://elasticsearch:9200
      - CAMUNDA_TASKLIST_ZEEBEELASTICSEARCH_URL=http://elasticsearch:9200
    networks:
      - camunda-platform
    depends_on:
      - zeebe
      - elasticsearch

  elasticsearch: # https://hub.docker.com/_/elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELASTIC_VERSION:-7.17.5}
    container_name: elasticsearch
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      - bootstrap.memory_lock=true
      - discovery.type=single-node
      - xpack.security.enabled=false
      # allow running with low disk space
      - cluster.routing.allocation.disk.threshold_enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    restart: always
    healthcheck:
      test: [ "CMD-SHELL", "curl -f http://localhost:9200/_cat/health | grep -q green" ]
      interval: 30s
      timeout: 5s
      retries: 3
    volumes:
      - elastic:/usr/share/elasticsearch/data
    networks:
      - camunda-platform

  kibana:
    image: docker.elastic.co/kibana/kibana:${ELASTIC_VERSION:-7.17.0}
    container_name: kibana
    ports:
      - 5601:5601
    volumes:
      - kibana:/usr/share/kibana/data
    networks:
      - camunda-platform
    depends_on:
      - elasticsearch
    profiles:
      - kibana

volumes:
  zeebe:
  elastic:
  kibana:

networks:
  camunda-platform:



To run the .NET app you need the .NET 6.0 runtime installed.

Once the app is running, deploy the Zeebe workflow (a one-time operation) like so:

curl -F fileName="process.bpmn" -F fileContent=@process.bpmn http://localhost:5001/v1.0/invoke/zeebe-worker/method/command/deploy-process

Now use your tool of choice to make the following JSON POST request:

POST http://localhost:5001/v1.0/invoke/zeebe-worker/method/command/create-instance
Content-Type: application/json

{
  "bpmnProcessId": "zeebe-test",
  "variables": {
    "operator": "+",
    "firstOperand": "412",
    "secondOperand": "20"
  }
}

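For those without a REST client at hand, the same request can be sketched with the Python standard library. The URL and payload are copied from the request above; the helper names `build_request` and `send` are made up for this example, and the sketch assumes the sample app is listening on localhost:5001.

```python
import json
import urllib.request

# URL and payload are copied from the request shown above.
URL = "http://localhost:5001/v1.0/invoke/zeebe-worker/method/command/create-instance"

payload = {
    "bpmnProcessId": "zeebe-test",
    "variables": {"operator": "+", "firstOperand": "412", "secondOperand": "20"},
}


def build_request() -> urllib.request.Request:
    """Build the JSON POST request without sending it."""
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send() -> str:
    """Send the request; requires the sample app to be running."""
    with urllib.request.urlopen(build_request()) as resp:
        return resp.read().decode("utf-8")
```

Calling `send()` once the app is up should create a new process instance.
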
In the Operate dashboard at localhost:8081 (user demo, pass demo) you will see that the zeebe-test workflow has created a new run. Clicking on it shows whether the test succeeded or failed.

Here are my findings:

  • With Dapr 1.9.6 this always succeeds.
  • With Dapr 1.9.6 + Zeebe Client upgraded to 8.1.6 (latest) this also works.
  • With Dapr 1.10-rc6 (Zeebe 8.1.6) this fails.
  • With Dapr 1.10-rc8 (Zeebe 8.0.11) this fails.
  • With Dapr 1.10-rc8 and Zeebe downgraded to 8.0.4 (same as original 1.9.6) this fails.

However, the failure is not that process variables are not stored.

Instead the failure is that the workflow suddenly cannot inject / process / find the result variable which isn't part of the request payload.

If you manually add result to the list of variables the workflow succeeds! (But the result is not updated)

It is not clear to me how this result variable is actually added by Zeebe (the server). It's also not clear what is different between 1.9 and 1.10 as there are no code changes to the Zeebe component at all between these versions.

The process variables do appear to be stored, but the failure is a different one.
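
One way to picture these observations is a toy model in which the variables returned by the worker are merged into the process scope when the job completes. This is only a sketch consistent with what was observed, not Zeebe's or Dapr's actual logic; `complete_job` is a made-up helper, and the "response lost" case is an assumption that had not been confirmed at this point in the thread.

```python
from typing import Optional


def complete_job(process_vars: dict, worker_response: Optional[dict]) -> dict:
    """Toy model: merge the variables returned by a worker into the process scope."""
    merged = dict(process_vars)
    merged.update(worker_response or {})
    return merged


request_vars = {"operator": "+", "firstOperand": "412", "secondOperand": "20"}

# Worker response arrives intact: `result` is created.
ok = complete_job(request_vars, {"result": 432})

# Worker response is lost (assumption): `result` never appears in the scope.
lost = complete_job(request_vars, None)

# `result` added by hand and response lost: the variable exists but is not updated.
seeded = complete_job({**request_vars, "result": 0}, None)

print("result" in ok, "result" in lost, seeded["result"])
```

Under this model, the second case matches the failure (no `result` to find) and the third matches the workaround (the workflow succeeds, but `result` keeps its old value).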

berndverst commented 1 year ago

Using good old git bisect and building tons of Dapr versions manually, then going through the steps here we have now found the issue is caused by the following PR in the Dapr Runtime.
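
For readers unfamiliar with the approach, the search that git bisect performs can be sketched as a binary search over an ordered commit list. The commit names and the `is_bad` check below are hypothetical; in practice each check meant building that Dapr version and running the reproduction steps.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit.

    Assumes commits are ordered oldest to newest, the first is good,
    the last is bad, and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Hypothetical history in which commit "c5" introduced the regression.
history = [f"c{i}" for i in range(10)]
print(first_bad(history, lambda c: int(c[1:]) >= 5))
```

Each step halves the remaining range, which is why even a long commit history needs only a handful of manual builds.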


akkie commented 1 year ago

@berndverst @yaron2 I can confirm that the issue is fixed with Dapr 1.10.0-rc.9. Many thanks for the quick fix

I will have a look regarding the certification tests

fabistb commented 1 year ago

@berndverst, @yaron2, I can also confirm that everything works on the cluster as well. Thanks for the fix and the support!