I have a workflow that tries to insert 1 million rows into a Postgres database in batches. I tried batch sizes of 10, 100, 1,000 and 10,000. Every run fails with the error "Workflow history size / count exceeds limit." I can see why that happens for batches of 10, and maybe even 100, but it still fails for batches of 1,000 and 10,000.
My hunch is that the 10,000 rows I build as a JS object and pass to the SQL query as params are being stored in the workflow History, and that this hits some other limit (to be triaged), not necessarily the 50,000-event limit.
Below are the errors. You can see that "wf-history-event-id" is far below 50,000 in each of them. It was above 50k for batches of 10.
{"level":"error","ts":"2024-08-30T10:01:32.637Z","msg":"Fail to process task","shard-id":3,"address":"172.18.0.2:7234","component":"transfer-queue-processor","wf-namespace-id":"a68a48fe-7837-4ac4-b017-555456d1a6d3","wf-id":"XXCRLR1J","wf-run-id":"48299733-1a29-492a-a00d-41713c4fc096","queue-task-id":47193722,"queue-task-visibility-timestamp":"2024-08-30T10:01:32.586Z","queue-task-type":"TransferActivityTask","queue-task":{"NamespaceID":"a68a48fe-7837-4ac4-b017-555456d1a6d3","WorkflowID":"XXCRLR1J","RunID":"48299733-1a29-492a-a00d-41713c4fc096","VisibilityTimestamp":"2024-08-30T10:01:32.586925008Z","TaskID":47193722,"TaskQueue":"appsmith-queue","ScheduledEventID":2633,"Version":0},"wf-history-event-id":2633,"error":"Workflow history size / count exceeds limit.","lifecycle":"ProcessingFailed","logging-call-at":"lazy_logger.go:68","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/runner/work/temporal/temporal/common/log/zap_logger.go:156\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/home/runner/work/temporal/temporal/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/home/runner/work/temporal/temporal/service/history/queues/executable.go:347\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:224\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:119\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:145\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:120\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:233\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:211"}
-------
{"level":"error","ts":"2024-08-30T10:22:22.644Z","msg":"Fail to process task","shard-id":3,"address":"172.18.0.2:7234","component":"transfer-queue-processor","wf-namespace-id":"a68a48fe-7837-4ac4-b017-555456d1a6d3","wf-id":"06OILRGQ","wf-run-id":"96924a0b-2af4-4aa6-ab30-bf02a68af379","queue-task-id":47213691,"queue-task-visibility-timestamp":"2024-08-30T10:22:22.627Z","queue-task-type":"TransferActivityTask","queue-task":{"NamespaceID":"a68a48fe-7837-4ac4-b017-555456d1a6d3","WorkflowID":"06OILRGQ","RunID":"96924a0b-2af4-4aa6-ab30-bf02a68af379","VisibilityTimestamp":"2024-08-30T10:22:22.627576211Z","TaskID":47213691,"TaskQueue":"appsmith-queue","ScheduledEventID":7369,"Version":0},"wf-history-event-id":7369,"error":"Workflow history size / count exceeds limit.","lifecycle":"ProcessingFailed","logging-call-at":"lazy_logger.go:68","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/runner/work/temporal/temporal/common/log/zap_logger.go:156\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/home/runner/work/temporal/temporal/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/home/runner/work/temporal/temporal/service/history/queues/executable.go:347\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:224\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:119\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:145\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:120\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:233\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:211"}
-------
{"level":"error","ts":"2024-08-30T10:25:43.632Z","msg":"Fail to process task","shard-id":4,"address":"172.18.0.2:7234","component":"transfer-queue-processor","wf-namespace-id":"a68a48fe-7837-4ac4-b017-555456d1a6d3","wf-id":"Q11QEDDV","wf-run-id":"695c5cb8-518f-42bd-a9b5-63a9e3004765","queue-task-id":49286138,"queue-task-visibility-timestamp":"2024-08-30T10:25:43.596Z","queue-task-type":"TransferActivityTask","queue-task":{"NamespaceID":"a68a48fe-7837-4ac4-b017-555456d1a6d3","WorkflowID":"Q11QEDDV","RunID":"695c5cb8-518f-42bd-a9b5-63a9e3004765","VisibilityTimestamp":"2024-08-30T10:25:43.596992221Z","TaskID":49286138,"TaskQueue":"appsmith-queue","ScheduledEventID":759,"Version":0},"wf-history-event-id":759,"error":"Workflow history size / count exceeds limit.","lifecycle":"ProcessingFailed","logging-call-at":"lazy_logger.go:68","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/runner/work/temporal/temporal/common/log/zap_logger.go:156\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/home/runner/work/temporal/temporal/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/home/runner/work/temporal/temporal/service/history/queues/executable.go:347\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:224\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:119\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:145\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:120\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:233\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:211"}
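To sanity-check the hunch about row data ending up in History, here is a rough back-of-the-envelope sketch. It is not taken from my actual workflow: `estimateHistory`, the sample row, and the constants are all hypothetical. The two limits are the commonly documented Temporal defaults as I understand them (about 50 MB and 51,200 events per workflow execution) and should be checked against the server's own configuration. The six-events-per-batch figure is also an assumption (one activity run sequentially, with a workflow task after it). The point is that the event count shrinks as the batch size grows, but the total bytes passed as activity arguments stay the same.

```typescript
// ASSUMPTION: default limits as commonly documented for Temporal; verify on your server.
const HISTORY_SIZE_LIMIT_BYTES = 50 * 1024 * 1024;
const HISTORY_COUNT_LIMIT = 51200;
// ASSUMPTION: ~3 activity events + ~3 workflow task events per sequential batch.
const EVENTS_PER_BATCH = 6;

interface HistoryEstimate {
  batches: number;
  events: number;
  payloadBytes: number;
  exceedsCount: boolean;
  exceedsSize: boolean;
}

// Estimates history usage when every batch of rows is passed to an activity as an argument.
function estimateHistory(totalRows: number, batchSize: number, sampleRow: unknown): HistoryEstimate {
  const batches = Math.ceil(totalRows / batchSize);
  // String length approximates the byte size for ASCII-only rows.
  const rowBytes = JSON.stringify(sampleRow).length;
  // Total payload does not depend on the batch size: every row is recorded once.
  const payloadBytes = totalRows * rowBytes;
  const events = batches * EVENTS_PER_BATCH;
  return {
    batches,
    events,
    payloadBytes,
    exceedsCount: events > HISTORY_COUNT_LIMIT,
    exceedsSize: payloadBytes > HISTORY_SIZE_LIMIT_BYTES,
  };
}

// Hypothetical row shape, only for illustration.
const sampleRow = { id: 1, name: "user-1", email: "user-1@example.com" };
for (const batchSize of [10, 100, 1000, 10000]) {
  console.log(batchSize, estimateHistory(1000000, batchSize, sampleRow));
}
```

If this estimate is roughly right, raising the batch size only helps with the event-count limit. Staying under the size limit would mean keeping the rows out of the activity arguments (for example passing an offset and a count and building the rows inside the activity) or splitting the work across runs.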