appsmithorg / appsmith

Platform to build admin panels, internal tools, and dashboards. Integrates with 25+ databases and any API.
https://www.appsmith.com
Apache License 2.0
34.67k stars 3.75k forks

[Bug] Large Row Inserts failing with Workflow history size / count exceeds limit error #36011

Open infinitetrooper opened 2 months ago

infinitetrooper commented 2 months ago

I have a workflow that tries to insert 1 million rows into a Postgres database in batches. I tried batch sizes of 10, 100, 1,000, and 10,000. All of them fail with the error "Workflow history size / count exceeds limit." I can see why this would happen for batches of 10, and maybe even 100, but it still fails for batches of 1,000 and 10,000.

My hunch is that the 10,000 rows I create in a JS Object and pass to the SQL query as params are being stored in the workflow history, and that this hits some size limit (to be triaged) rather than the 50,000 event-count limit.
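To make the hunch concrete, here is a rough back-of-the-envelope sketch in Python. It is not Appsmith or Temporal code. The figures of 3 history events per batch, 100 bytes per row, and a 50 MiB history size limit are all assumptions for illustration; only the 50,000 count figure comes from this issue. The helper `estimate_history` is hypothetical.

```python
from dataclasses import dataclass

# Assumptions, not taken from the issue: each batch insert runs as one
# activity that adds about 3 history events, and its input rows are
# recorded once in the history.
EVENTS_PER_BATCH = 3
EVENT_COUNT_LIMIT = 50_000            # figure quoted in this issue
HISTORY_SIZE_LIMIT = 50 * 1024 ** 2   # assumed 50 MiB; verify against your server config


@dataclass
class Estimate:
    batches: int
    events: int
    history_bytes: int
    exceeds_count: bool
    exceeds_size: bool


def estimate_history(total_rows: int, batch_size: int, bytes_per_row: int) -> Estimate:
    """Estimate event count and payload bytes for a batched insert workflow."""
    batches = -(-total_rows // batch_size)  # ceiling division
    events = batches * EVENTS_PER_BATCH
    # Total payload does not depend on batch size: every row is recorded once.
    history_bytes = total_rows * bytes_per_row
    return Estimate(
        batches=batches,
        events=events,
        history_bytes=history_bytes,
        exceeds_count=events > EVENT_COUNT_LIMIT,
        exceeds_size=history_bytes > HISTORY_SIZE_LIMIT,
    )


if __name__ == "__main__":
    for size in (10, 100, 1_000, 10_000):
        print(size, estimate_history(1_000_000, size, bytes_per_row=100))
```

Under these assumptions, larger batches reduce the event count but leave the total recorded payload unchanged. That would be consistent with the failures at low event IDs, if row data is indeed what is being stored in history.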

Below are the errors. Note that "wf-history-event-id" is well below 50,000 in each of them. It was above 50k for batches of 10.

{"level":"error","ts":"2024-08-30T10:01:32.637Z","msg":"Fail to process task","shard-id":3,"address":"172.18.0.2:7234","component":"transfer-queue-processor","wf-namespace-id":"a68a48fe-7837-4ac4-b017-555456d1a6d3","wf-id":"XXCRLR1J","wf-run-id":"48299733-1a29-492a-a00d-41713c4fc096","queue-task-id":47193722,"queue-task-visibility-timestamp":"2024-08-30T10:01:32.586Z","queue-task-type":"TransferActivityTask","queue-task":{"NamespaceID":"a68a48fe-7837-4ac4-b017-555456d1a6d3","WorkflowID":"XXCRLR1J","RunID":"48299733-1a29-492a-a00d-41713c4fc096","VisibilityTimestamp":"2024-08-30T10:01:32.586925008Z","TaskID":47193722,"TaskQueue":"appsmith-queue","ScheduledEventID":2633,"Version":0},"wf-history-event-id":2633,"error":"Workflow history size / count exceeds limit.","lifecycle":"ProcessingFailed","logging-call-at":"lazy_logger.go:68","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/runner/work/temporal/temporal/common/log/zap_logger.go:156\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/home/runner/work/temporal/temporal/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/home/runner/work/temporal/temporal/service/history/queues/executable.go:347\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:224\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:119\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:145\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:120\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:233\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:211"}

-------

{"level":"error","ts":"2024-08-30T10:22:22.644Z","msg":"Fail to process task","shard-id":3,"address":"172.18.0.2:7234","component":"transfer-queue-processor","wf-namespace-id":"a68a48fe-7837-4ac4-b017-555456d1a6d3","wf-id":"06OILRGQ","wf-run-id":"96924a0b-2af4-4aa6-ab30-bf02a68af379","queue-task-id":47213691,"queue-task-visibility-timestamp":"2024-08-30T10:22:22.627Z","queue-task-type":"TransferActivityTask","queue-task":{"NamespaceID":"a68a48fe-7837-4ac4-b017-555456d1a6d3","WorkflowID":"06OILRGQ","RunID":"96924a0b-2af4-4aa6-ab30-bf02a68af379","VisibilityTimestamp":"2024-08-30T10:22:22.627576211Z","TaskID":47213691,"TaskQueue":"appsmith-queue","ScheduledEventID":7369,"Version":0},"wf-history-event-id":7369,"error":"Workflow history size / count exceeds limit.","lifecycle":"ProcessingFailed","logging-call-at":"lazy_logger.go:68","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/runner/work/temporal/temporal/common/log/zap_logger.go:156\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/home/runner/work/temporal/temporal/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/home/runner/work/temporal/temporal/service/history/queues/executable.go:347\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:224\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:119\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:145\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:120\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:233\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:211"}

-------

{"level":"error","ts":"2024-08-30T10:25:43.632Z","msg":"Fail to process task","shard-id":4,"address":"172.18.0.2:7234","component":"transfer-queue-processor","wf-namespace-id":"a68a48fe-7837-4ac4-b017-555456d1a6d3","wf-id":"Q11QEDDV","wf-run-id":"695c5cb8-518f-42bd-a9b5-63a9e3004765","queue-task-id":49286138,"queue-task-visibility-timestamp":"2024-08-30T10:25:43.596Z","queue-task-type":"TransferActivityTask","queue-task":{"NamespaceID":"a68a48fe-7837-4ac4-b017-555456d1a6d3","WorkflowID":"Q11QEDDV","RunID":"695c5cb8-518f-42bd-a9b5-63a9e3004765","VisibilityTimestamp":"2024-08-30T10:25:43.596992221Z","TaskID":49286138,"TaskQueue":"appsmith-queue","ScheduledEventID":759,"Version":0},"wf-history-event-id":759,"error":"Workflow history size / count exceeds limit.","lifecycle":"ProcessingFailed","logging-call-at":"lazy_logger.go:68","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/runner/work/temporal/temporal/common/log/zap_logger.go:156\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/home/runner/work/temporal/temporal/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/home/runner/work/temporal/temporal/service/history/queues/executable.go:347\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:224\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:119\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:145\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/runner/work/temporal/temporal/common/backoff/retry.go:120\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:233\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/runner/work/temporal/temporal/common/tasks/fifo_scheduler.go:211"}

nsarupr commented 2 months ago

Hey team! Please add your planning poker estimate with Zenhub @ayushpahwa @srix

nsarupr commented 2 months ago

This needs to be triaged and re-evaluated.