Open owmo-dev opened 3 weeks ago
This thread may be helpful in diagnosing the problem (it sounds similar, scheduled but not started)
https://community.temporal.io/t/activity-scheduled-but-not-started-need-help/4313/5
I noticed the following error in the script worker
2024-08-24T02:13:20.657120Z WARN temporal_sdk_core::worker::activities: Network error while completing activity error=Status { code: Cancelled, message: "operation was canceled", source: Some(tonic::transport::Error(Transport, hyper::Error(Canceled, "connection closed"))) }
This thread may offer useful advice to investigate:
https://community.temporal.io/t/activity-timeout-and-temporal-server-connectivity-issue/8869/2
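For long-running activities, a common remedy for this kind of timeout or connectivity symptom is to heartbeat while the work is in progress. Below is a minimal sketch of that idea, not the code used in this repo. The helper name `withHeartbeat` and its parameters are made up for illustration. The `beat` callback is injected so the snippet runs on its own; in a real activity it would be the `heartbeat` function exported by `@temporalio/activity`.

```typescript
// Sketch: call `beat` on a fixed interval while `work` is still pending.
// `withHeartbeat`, `work`, `beat`, and `intervalMs` are illustrative names.
// In a Temporal activity, `beat` would be `heartbeat` from '@temporalio/activity'.
async function withHeartbeat<T>(
  work: () => Promise<T>,
  beat: (details?: unknown) => void,
  intervalMs: number,
): Promise<T> {
  beat('started'); // report liveness right away
  const timer = setInterval(() => beat('running'), intervalMs);
  try {
    return await work();
  } finally {
    clearInterval(timer); // stop heartbeating once the work settles or throws
  }
}
```

Inside an activity this might be called as `withHeartbeat(() => renderFrame(frame), heartbeat, 5_000)`, where `renderFrame` is a hypothetical function. The interval should be comfortably shorter than the `heartbeatTimeout` set in the activity options.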
Seems like the heartbeat fixed that issue, but after updating I now have a new issue to contend with...
2024-08-27T17:42:04.546Z [INFO] Worker state changed { sdkComponent: 'worker', taskQueue: 'render', state: 'FAILED' }
RangeError: "length" is outside of buffer bounds
    at Buffer.proto.utf8Write (node:internal/buffer:1066:13)
    at Op.writeStringBuffer [as fn] (/Users/owmo/dev/combomash-orchestrator/node_modules/protobufjs/src/writer_buffer.js:61:13)
    at BufferWriter.finish (/Users/owmo/dev/combomash-orchestrator/node_modules/protobufjs/src/writer.js:453:14)
    at Worker.handleActivation (/Users/owmo/dev/combomash-orchestrator/node_modules/@temporalio/worker/src/worker.ts:1164:10) {
  code: 'ERR_BUFFER_OUT_OF_BOUNDS'
}
This happens when running a sequence. My best guess is that it's too much information for Temporal's memory limit...
It looks like this is a Node bug, which is reportedly fixed in today's release, 22.8.0.
Downgraded to node@20 and everything is working. I'll install the update tomorrow and verify it's all working.
Downgrading to node@20 until the update goes live.
Installed node@20.17.0 in the package for now to ensure consistency of operation
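To keep workers from silently starting on a different Node major, a small startup check can help. This is only a sketch. `EXPECTED_NODE_MAJOR`, `nodeMajor`, and `assertNodeMajor` are illustrative names, and the value 20 mirrors the node@20 pin above.

```typescript
// Sketch: fail fast if the worker is not running on the pinned Node major.
// EXPECTED_NODE_MAJOR is an assumption that mirrors the node@20 pin.
const EXPECTED_NODE_MAJOR = 20;

// Extract the major version from strings like "20.17.0" or "v20.17.0".
function nodeMajor(version: string): number {
  const match = /^v?(\d+)\./.exec(version);
  if (!match) {
    throw new Error(`Unrecognized Node version string: ${version}`);
  }
  return Number(match[1]);
}

// Throw if `version` does not belong to the expected major release line.
function assertNodeMajor(version: string, expected: number = EXPECTED_NODE_MAJOR): void {
  const major = nodeMajor(version);
  if (major !== expected) {
    throw new Error(`Expected Node ${expected}.x but running ${version}`);
  }
}
```

Calling `assertNodeMajor(process.versions.node)` before creating the worker would surface a mismatch immediately instead of as a buffer error mid-run.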
Every now and then, jobs seem to get stuck on executing scripts. Restarting the workers doesn't help. Running a "Reset" command in Temporal re-submits the jobs, and suddenly the post tasks execute without issue. I suspect there's something wrong with how I'm scheduling the work, but I will need to investigate and try to find a reproducible scenario (difficult, as it typically takes a lot of frames to render before it occurs).