combomash / orchestrator

Render workflows / activities for @combomash/engine via temporal.io
ISC License

bug stuck script #25

Open owmo-dev opened 3 weeks ago

owmo-dev commented 3 weeks ago

Every now and then, jobs get stuck while executing scripts. Restarting the workers doesn't help. Issuing a "Reset" command in Temporal re-submits the jobs, and the post tasks then execute without issue. I suspect there's something wrong with how I'm scheduling the work, but I'll need to investigate and find a reproducible scenario (difficult, since it typically only occurs when there are a lot of frames to render).

owmo-dev commented 3 weeks ago

This thread may be helpful in diagnosing the problem (it sounds similar: activity scheduled but not started):

https://community.temporal.io/t/activity-scheduled-but-not-started-need-help/4313/5

owmo-dev commented 3 weeks ago

I noticed the following error in the script worker:

2024-08-24T02:13:20.657120Z WARN temporal_sdk_core::worker::activities: Network error while completing activity error=Status { code: Cancelled, message: "operation was canceled", source: Some(tonic::transport::Error(Transport, hyper::Error(Canceled, "connection closed"))) }

This thread may offer useful advice to investigate:

https://community.temporal.io/t/activity-timeout-and-temporal-server-connectivity-issue/8869/2

owmo-dev commented 2 weeks ago

Seems like the heartbeat fixed that issue, but after updating I now have a new issue to contend with...

2024-08-27T17:42:04.546Z [INFO] Worker state changed { sdkComponent: 'worker', taskQueue: 'render', state: 'FAILED' }
RangeError: "length" is outside of buffer bounds
    at Buffer.proto.utf8Write (node:internal/buffer:1066:13)
    at Op.writeStringBuffer [as fn] (/Users/owmo/dev/combomash-orchestrator/node_modules/protobufjs/src/writer_buffer.js:61:13)
    at BufferWriter.finish (/Users/owmo/dev/combomash-orchestrator/node_modules/protobufjs/src/writer.js:453:14)
    at Worker.handleActivation (/Users/owmo/dev/combomash-orchestrator/node_modules/@temporalio/worker/src/worker.ts:1164:10) {
  code: 'ERR_BUFFER_OUT_OF_BOUNDS'
}

This happens when running a sequence. My best guess is that it's too much information for Temporal's memory limit...
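
For reference, the heartbeat fix mentioned above could look roughly like the sketch below. This is not the actual change made in this repo: `Frame`, `renderFrames`, and both callback parameters are hypothetical names. In a real Temporal activity, `beat` would be the `heartbeat` function imported from `@temporalio/activity`; it is passed in here so the sketch runs without any dependencies.

```typescript
// Hypothetical names throughout; nothing here is taken from the repository.
type Frame = number;

// Renders frames one at a time and reports liveness after each one.
// `renderFrame` stands in for whatever executes the script for a frame.
// `beat` stands in for `heartbeat` from '@temporalio/activity' (assumption:
// the real activity would pass that function directly).
async function renderFrames(
  frames: Frame[],
  renderFrame: (frame: Frame) => Promise<void>,
  beat: (details: { lastFrame: Frame }) => void,
): Promise<number> {
  let rendered = 0;
  for (const frame of frames) {
    await renderFrame(frame);
    rendered += 1;
    // Heartbeat once per frame so the server can tell the activity is alive,
    // and record the last finished frame as heartbeat details.
    beat({ lastFrame: frame });
  }
  return rendered;
}
```

Heartbeats only help if the activity is also scheduled with a `heartbeatTimeout` (set alongside `startToCloseTimeout` in the options passed to `proxyActivities`), so that a worker that stops reporting is detected and the activity is retried instead of hanging.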

owmo-dev commented 2 weeks ago

It looks like that is a Node bug, which is reportedly fixed in today's release, 22.8.0:

https://github.com/nodejs/node/issues/54518

https://github.com/nodejs/node/pull/54524

owmo-dev commented 2 weeks ago

Downgraded to node@20 and everything is working. I'll install the update tomorrow and verify that it all works.

owmo-dev commented 2 weeks ago

Downgrading to node@20 until the update goes live:

https://apple.stackexchange.com/questions/171530/how-do-i-downgrade-node-or-install-a-specific-previous-version-using-homebrew

owmo-dev commented 2 weeks ago

Installed node@20.17.0 in the package for now to ensure consistent operation.
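
A startup guard is one way to catch a worker accidentally running on an unverified Node version. The sketch below is only an illustration: `isKnownGoodNode` is a hypothetical helper, and its allow-list encodes just what this thread established (node@20 works, and 22.8.0 is said to contain the fix). Treating every release after 22.8.0 as good is an assumption.

```typescript
// Hypothetical helper; the allow-list reflects only what this thread reports.
function isKnownGoodNode(version: string): boolean {
  // Accept both "v22.8.0" and "22.8.0".
  const parts = version.replace(/^v/, "").split(".").map(Number);
  const major = parts[0] ?? 0;
  const minor = parts[1] ?? 0;

  // node@20 was verified to work after the downgrade.
  if (major === 20) return true;
  // 22.8.0 is reported to contain the fix.
  if (major === 22) return minor >= 8;
  // Assumption: later majors also carry the fix.
  if (major > 22) return true;
  // Anything else was not verified in this thread.
  return false;
}
```

At worker startup this could be called with `process.versions.node`, logging a warning or exiting early when it returns `false`.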