camunda-community-hub / zeebe-client-node-js

Node.js client library for Zeebe Microservices Orchestration Engine
https://camunda-community-hub.github.io/zeebe-client-node-js/
Apache License 2.0

onConnectionError is not called for workers #184

Closed: ellik95 closed this issue 4 years ago

ellik95 commented 4 years ago

Hello!

We are seeing some errors on the worker side, but onConnectionError() never gets called. For example:

15:56:47.739 | zeebe |  [http] INFO: Grpc Stream Error: 1 CANCELLED: Received http2 header with status: 502
15:56:52.541 | zeebe |  [eval] INFO: The gateway returned HTTP Error 503 (Bad Gateway). This can be a transient failure while a Kubernetes node in Camunda Cloud is being pre-empted.
07:51:16.276 | zeebe |  [http] INFO: Stalled on Grpc Error
07:51:16.281 | zeebe |  [http] INFO: Grpc Stream Error: 14 UNAVAILABLE: TCP Read failed
09:49:54.840 | zeebe |  [eval] INFO: Stalled on Grpc Error
09:49:54.840 | zeebe |  [eval] INFO: Grpc Stream Error: 14 UNAVAILABLE: failed to connect to all addresses
10:05:03.134 | zeebe |  [http] INFO: Stalled on Grpc Error
10:05:03.134 | zeebe |  [http] INFO: Grpc Stream Error: 14 UNAVAILABLE: GOAWAY received

Also, when we create a workflow instance, the workers keep working despite these errors, so I suppose the errors shouldn't affect us, right? We are using the latest version of the library (0.24).
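To judge whether log lines like these are transient, it can help to pull out the gRPC status code. The sketch below is a hypothetical helper, not part of zeebe-node. It assumes the `Grpc Stream Error: <code> <NAME>: <detail>` shape seen in the log above. The numeric codes follow the gRPC status-code spec. Which codes count as "transient" is a judgment call made for this example.

```javascript
// Hypothetical helper (not part of zeebe-node): parse a "Grpc Stream Error"
// log line like the ones quoted above and flag whether it looks transient.
// Codes follow the gRPC status-code spec; the TRANSIENT set is an assumption.
const TRANSIENT_CODES = new Set([
  1,  // CANCELLED: in the log above this came from an HTTP 502 response
  4,  // DEADLINE_EXCEEDED
  14, // UNAVAILABLE: gRPC's usual "safe to retry" status
]);

function parseGrpcStreamError(line) {
  const match = /Grpc Stream Error: (\d+) ([A-Z_]+): (.*)$/.exec(line);
  if (match === null) return null; // not a stream-error line
  const code = Number(match[1]);
  return {
    code,
    name: match[2],
    detail: match[3],
    transient: TRANSIENT_CODES.has(code),
  };
}
```

Run over the quoted log, every `Grpc Stream Error` line above parses as code 1 or 14, so all are flagged transient. The `Stalled on Grpc Error` lines return `null`.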

ellik95 commented 4 years ago

onReady() doesn't get called either, if that helps.

Bec-k commented 4 years ago
  worker.on('ready', () => {
    console.log('worker is ready!');
  });

works for me.

ellik95 commented 4 years ago

Thank you! I was using

const zbWorker = zbc.createWorker({
    taskType: 'demo-service',
    taskHandler: handler,
    onReady: () => console.log(`Worker connected!`),
    onConnectionError: () => console.log(`Worker disconnected!`)
})

instead. Now onConnectionError() gets called as well, but are these errors normal? We saw that the workers keep working, so we don't need to reconnect them when there is an error, right?
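Since the workers keep working through these errors, one option is to alert only when a worker stays disconnected for longer than a grace period, rather than on every connection error. The sketch below is hypothetical and not part of zeebe-node. The clock is injectable so it can be tested. The wiring comment assumes the worker emits a `connectionError` event alongside `ready`. That event name is an assumption here and should be checked against the library docs.

```javascript
// Hypothetical sketch (not part of zeebe-node): treat connection errors as a
// problem only if no 'ready' event follows within graceMs.
function createConnectionMonitor({ graceMs = 30000, now = Date.now } = {}) {
  let connected = false;
  let disconnectedSince = now(); // not connected yet at startup
  let errorCount = 0;

  return {
    onReady() {
      connected = true;
      disconnectedSince = null;
    },
    onConnectionError() {
      errorCount++;
      // Keep the first timestamp if errors repeat while already disconnected.
      if (disconnectedSince === null) disconnectedSince = now();
      connected = false;
    },
    isUnhealthy() {
      return !connected && now() - disconnectedSince > graceMs;
    },
    get errorCount() {
      return errorCount;
    },
  };
}

// Assumed wiring (event names as discussed in this thread):
//   const monitor = createConnectionMonitor({ graceMs: 60000 });
//   worker.on('ready', monitor.onReady);
//   worker.on('connectionError', monitor.onConnectionError);
```

The methods close over local state instead of using `this`, so they can be passed directly as event listeners.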

Bec-k commented 4 years ago

I think there should be some reconnect logic built in. The system itself is designed to be fault tolerant, so I hope this is already handled... :D
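The thread suggests the client already recovers on its own, since the workers keep running. If you ever need to retry one of your own calls, though, a common pattern is capped exponential backoff with full jitter. The sketch below is a generic, hypothetical helper and is not zeebe-node's internal reconnect logic. `retries`, `baseMs`, `maxMs`, and `isRetryable` are illustrative names.

```javascript
// Hypothetical helper (not zeebe-node's built-in logic): capped exponential
// backoff with "full jitter".
function backoffDelay(attempt, { baseMs = 100, maxMs = 10000, random = Math.random } = {}) {
  // Upper bound doubles per attempt (baseMs, 2*baseMs, 4*baseMs, ...), capped at maxMs.
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * cap);
}

async function retryWithBackoff(operation, { retries = 5, isRetryable = () => true, ...delayOpts } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      // Give up on the last attempt or on errors the caller marks as permanent.
      if (attempt >= retries || !isRetryable(err)) throw err;
      const delay = backoffDelay(attempt, delayOpts);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Jitter spreads out retries from many workers that lost the same gateway at the same moment, for example during the pre-emption mentioned in the log above.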

ellik95 commented 4 years ago

We hope so too, thank you :)