camunda-community-hub / zeebe-client-node-js

Node.js client library for Zeebe Microservices Orchestration Engine
https://camunda-community-hub.github.io/zeebe-client-node-js/
Apache License 2.0

Upgrades in `@grpc/*` break library completely #290

Closed: nikku closed this issue 1 year ago

nikku commented 1 year ago

As documented here, updating zeebe-node, and with it applying semver-minor bumps to its @grpc/* dependencies, breaks the library completely. The process terminates unconditionally: no uncaughtException, no error.

Expected Behavior

Library works as expected.

Current Behavior

Unconditional termination: no uncaughtException, no error. I have never seen this behavior before.
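When a Node.js process ends with no error and no uncaughtException, one common cause is that the event loop has run out of work. Nothing (socket, timer, handle) keeps the process alive, so it exits cleanly even though a promise is still pending. The thread does not confirm that this is what happened here; treat it as an assumption. The sketch below uses Node's `process` `beforeExit` event, which fires when the event loop has drained, to make that kind of exit visible. `guardPending` is a hypothetical helper and is not part of zeebe-node or @grpc/grpc-js.

```typescript
// Hypothetical diagnostic helper (not part of zeebe-node or @grpc/grpc-js).
// It reports when Node's event loop drains while `pending` is still unsettled,
// which would otherwise look like a silent, error-free exit.
function guardPending<T>(label: string, pending: Promise<T>): Promise<T> {
  const onBeforeExit = (): void => {
    // 'beforeExit' is emitted once the event loop has no more scheduled work.
    // Reaching it here means nothing kept the process alive for `pending`.
    console.error(`event loop drained while "${label}" was still pending`);
    process.exitCode = 1; // turn the silent exit into a visible failure
  };
  process.once('beforeExit', onBeforeExit);

  // Detach the listener as soon as the promise settles, either way.
  const detach = (): void => {
    process.removeListener('beforeExit', onBeforeExit);
  };
  return pending.then(
    (value) => {
      detach();
      return value;
    },
    (err: unknown) => {
      detach();
      throw err;
    },
  );
}
```

For example, the reproducer's topology request could be wrapped as `await guardPending('topology', zbc.topology())`, where `zbc` is assumed to be a `ZBClient` instance. The wrapper does not change the result. If the process exits while the call is still in flight, it prints a message and sets a non-zero exit code.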

Possible Solution

Pin the @grpc/* dependencies to the last known good versions. Add integration tests, or periodically check whether zeebe-node still works with the latest dependencies.
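Pinning here means declaring exact versions in `package.json` rather than ranges such as `^1.0.0`. With a range, a fresh install of zeebe-node can pick up a semver-minor @grpc/* release without any zeebe-node release. This sketch assumes the dependencies are declared with ranges today. It shows a check a CI job could run against the `dependencies` object of `package.json`. `findUnpinnedGrpcDeps` is a hypothetical name, and the regular expression only approximates semver.

```typescript
// Hypothetical CI check (not part of zeebe-node): report @grpc/* dependencies
// declared with a version range (e.g. "^1.0.0", "~1.0.0", ">=1.0.0") rather
// than an exact, pinned version (e.g. "1.2.3" or "1.2.3-rc.1").
const EXACT_VERSION = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

function findUnpinnedGrpcDeps(deps: Record<string, string>): string[] {
  return Object.keys(deps)
    .filter((name) => name.startsWith('@grpc/'))
    .filter((name) => !EXACT_VERSION.test(deps[name]))
    .sort();
}
```

Its input could be `JSON.parse(readFileSync('package.json', 'utf8')).dependencies`. A non-empty result would fail the build.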

Steps to Reproduce

git clone git@github.com:barmac/zeebe-tls-connection-test.git
cd zeebe-tls-connection-test
npm ci
npm run generate-certs
docker-compose --env-file .env.insecure up

Verify that the zebee-node-test container, which installs the latest zeebe-node, does not print topology information.

npm run test:insecure

Verify that this zeebe-node test, which uses older, pinned versions of @grpc/*, prints topology information.

Context (Environment)

None.

Detailed Description

None.

Possible Implementation

None.

jwulf commented 1 year ago

When I run the reproducer according to the instructions, I see the following:

zeebe-tls-connection-test-zebee-node-test-1  |      executing zeebe-node with config:
zeebe-tls-connection-test-zebee-node-test-1  |
zeebe-tls-connection-test-zebee-node-test-1  |          address   = test.test.localhost:26500
zeebe-tls-connection-test-zebee-node-test-1  |          useTLS    = false
zeebe-tls-connection-test-zebee-node-test-1  |          certsPath = /usr/local/zeebe/rootCA.crt
zeebe-tls-connection-test-zebee-node-test-1  |          loglevel  = debug
zeebe-tls-connection-test-zebee-node-test-1  |
zeebe-tls-connection-test-zebee-node-test-1  | D 2023-01-25T21:52:02.936Z | channel | (1) dns:test.test.localhost:26500 Channel constructed with options {
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.enable_retries": 1,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.initial_reconnect_backoff_ms": 1000,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.max_reconnect_backoff_ms": 10000,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.min_reconnect_backoff_ms": 5000,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.keepalive_time_ms": 180000,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.keepalive_timeout_ms": 120000,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.http2.min_time_between_pings_ms": 90000,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.http2.min_ping_interval_without_data_ms": 90000,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.keepalive_permit_without_calls": 1,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.http2.max_pings_without_data": 0
zeebe-tls-connection-test-zebee-node-test-1  | }
zeebe-tls-connection-test-zebee-node-test-1  | pre-topology
zeebe-tls-connection-test-zebee-node-test-1  | D 2023-01-25T21:52:02.941Z | channel | (1) dns:test.test.localhost:26500 createResolvingCall [0] method="/gateway_protocol.Gateway/Topology", deadline=Infinity
zeebe-tls-connection-test-zebee-node-test-1  | D 2023-01-25T21:52:02.948Z | channel | (1) dns:test.test.localhost:26500 callRefTimer.ref | configSelectionQueue.length=1 pickQueue.length=0
zeebe-tls-connection-test-zebee-node-test-1  | D 2023-01-25T21:52:02.955Z | subchannel | (2) 172.19.0.2:26500 Subchannel constructed with options {
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.enable_retries": 1,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.initial_reconnect_backoff_ms": 1000,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.max_reconnect_backoff_ms": 10000,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.min_reconnect_backoff_ms": 5000,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.keepalive_time_ms": 180000,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.keepalive_timeout_ms": 120000,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.http2.min_time_between_pings_ms": 90000,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.http2.min_ping_interval_without_data_ms": 90000,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.keepalive_permit_without_calls": 1,
zeebe-tls-connection-test-zebee-node-test-1  |   "grpc.http2.max_pings_without_data": 0
zeebe-tls-connection-test-zebee-node-test-1  | }
zeebe-tls-connection-test-zebee-node-test-1  | D 2023-01-25T21:52:02.957Z | channel | (1) dns:test.test.localhost:26500 callRefTimer.unref | configSelectionQueue.length=1 pickQueue.length=0
zeebe-tls-connection-test-zebee-node-test-1  | D 2023-01-25T21:52:02.958Z | subchannel | (2) 172.19.0.2:26500 IDLE -> CONNECTING
zeebe-tls-connection-test-zebee-node-test-1  | D 2023-01-25T21:52:02.965Z | channel | (1) dns:test.test.localhost:26500 createRetryingCall [1] method="/gateway_protocol.Gateway/Topology"
zeebe-tls-connection-test-zebee-node-test-1  | D 2023-01-25T21:52:02.966Z | channel | (1) dns:test.test.localhost:26500 createLoadBalancingCall [2] method="/gateway_protocol.Gateway/Topology"
zeebe-tls-connection-test-zebee-node-test-1  | D 2023-01-25T21:52:02.967Z | channel | (1) dns:test.test.localhost:26500 callRefTimer.ref | configSelectionQueue.length=0 pickQueue.length=1
zeebe-tls-connection-test-zebee-node-test-1  | D 2023-01-25T21:52:02.990Z | subchannel | (2) 172.19.0.2:26500 CONNECTING -> READY
zeebe-tls-connection-test-zebee-node-test-1  | D 2023-01-25T21:52:02.991Z | channel | (1) dns:test.test.localhost:26500 callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
zeebe-tls-connection-test-zebee-node-test-1  | post-topology
zeebe-tls-connection-test-zebee-node-test-1  | {
zeebe-tls-connection-test-zebee-node-test-1  |   "brokers": [
zeebe-tls-connection-test-zebee-node-test-1  |     {
zeebe-tls-connection-test-zebee-node-test-1  |       "partitions": [
zeebe-tls-connection-test-zebee-node-test-1  |         {
zeebe-tls-connection-test-zebee-node-test-1  |           "partitionId": 1,
zeebe-tls-connection-test-zebee-node-test-1  |           "role": "LEADER",
zeebe-tls-connection-test-zebee-node-test-1  |           "health": "HEALTHY"
zeebe-tls-connection-test-zebee-node-test-1  |         }
zeebe-tls-connection-test-zebee-node-test-1  |       ],
zeebe-tls-connection-test-zebee-node-test-1  |       "nodeId": 0,
zeebe-tls-connection-test-zebee-node-test-1  |       "host": "0.0.0.0",
zeebe-tls-connection-test-zebee-node-test-1  |       "port": 26501,
zeebe-tls-connection-test-zebee-node-test-1  |       "version": "8.1.0"
zeebe-tls-connection-test-zebee-node-test-1  |     }
zeebe-tls-connection-test-zebee-node-test-1  |   ],
zeebe-tls-connection-test-zebee-node-test-1  |   "clusterSize": 1,
zeebe-tls-connection-test-zebee-node-test-1  |   "partitionsCount": 1,
zeebe-tls-connection-test-zebee-node-test-1  |   "replicationFactor": 1,
zeebe-tls-connection-test-zebee-node-test-1  |   "gatewayVersion": "8.1.0"
zeebe-tls-connection-test-zebee-node-test-1  | }
zeebe-tls-connection-test-zebee-node-test-1 exited with code 0

nikku commented 1 year ago

I can confirm that it is working now :tada:.

It seems the issue may have been fixed upstream, because the test bed always installs the latest version of zeebe-node (and, transitively, the latest @grpc/* updates).

nikku commented 1 year ago

Searching the grpc issue tracker for a reference, I found the originating issue: https://github.com/grpc/grpc-node/issues/2318#issuecomment-1403124466.

It looks like the authors have been trying to fix that issue on and off with the various patches they released over the last few weeks. The latest one was released last night: https://github.com/grpc/grpc-node/pull/2337.

Given the severity of this issue, should we still bump the dependencies explicitly? Or should we pin them to a known working version (i.e. 1.6.x)? For the time being, that would help downstream integrators, who would otherwise have to do this across several layers of their dependencies; cf. https://github.com/camunda/camunda-modeler/pull/3409.
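For a downstream integrator, the first step is finding where a given @grpc/* package actually sits in the dependency tree. npm lockfiles with `lockfileVersion` 2 or 3 list every installed package in a `packages` map keyed by its `node_modules` install path, so nested copies appear as separate entries. The sketch below assumes such a lockfile has already been parsed with `JSON.parse`. `findInstalledCopies` is a hypothetical helper, not an npm API.

```typescript
// Hypothetical helper (not part of npm or zeebe-node): list every installed copy
// of `pkg` recorded in a parsed package-lock.json with lockfileVersion 2 or 3.
interface LockfileEntry {
  version?: string;
}

interface Lockfile {
  packages?: Record<string, LockfileEntry>;
}

interface InstalledCopy {
  path: string;
  version: string;
}

function findInstalledCopies(lock: Lockfile, pkg: string): InstalledCopy[] {
  const target = `node_modules/${pkg}`;
  const packages = lock.packages ?? {};
  const copies: InstalledCopy[] = [];
  for (const path of Object.keys(packages)) {
    // Keys are install paths such as "node_modules/@grpc/grpc-js" or
    // "node_modules/zeebe-node/node_modules/@grpc/grpc-js"; the root is "".
    const version = packages[path].version;
    if ((path === target || path.endsWith(`/${target}`)) && version !== undefined) {
      copies.push({ path, version });
    }
  }
  // Sort by path (plain code-unit order) for stable output.
  return copies.sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
}
```

Once the copies are known, the `overrides` field in the root `package.json` (npm 8.3 and later) is one way to force a single version across all layers. That field is an npm feature, not something zeebe-node provides.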

nikku commented 1 year ago

Closing this issue.