Open dhiaayachi opened 2 months ago
Is your feature request related to a problem? Please describe.
Currently, resetting a workflow with a large number of reapplied signals and updates after the reset point can fail due to the persistence layer's 4MB event batch size limit. This issue occurs because the reapplied events are grouped into a single batch, potentially exceeding the limit.
Describe the solution you'd like
To address this, we propose two possible solutions:
Approach 1: Automatic Batch Creation
Approach 2: Reset-Specific Batching
Describe alternatives you've considered
Additional context
This issue only affects resets, where events from multiple batches in the original workflow can be reapplied. The event reapply logic during conflict resolution, triggered by replication tasks, operates on a single event batch, so this limit is not encountered in that scenario.
Thank you for reporting this issue. This is a known issue with Temporal's DefaultTransactionSizeLimit, which is 4 MB. Unfortunately, this limit applies to all events persisted to the history, including events that are reapplied during a reset. As a result, workflows whose reapplied events exceed the limit cannot be reset.
To work around this issue, you can consider decreasing the size of your reapplied events, or breaking your workflows into smaller units that would keep the events persisted during a reset under 4 MB.
We appreciate you raising this issue, and we will consider solutions like implementing automatic batch splitting in future versions of Temporal. You can track the progress on this issue in our GitHub repository.
Thank you for reporting this issue. The 4 MB limit on event batch size during reset is a known limitation of the Temporal service. You can find more information about the DefaultTransactionSizeLimit in the Temporal documentation.
There are a few approaches that might work to mitigate the issue while we explore a solution:
Increase the DefaultTransactionSizeLimit: You can increase the DefaultTransactionSizeLimit by adjusting the limit.defaultTransactionSizeLimit dynamic config variable in your Temporal service. It's important to note that increasing this limit might have performance implications.
We appreciate your understanding and will work to find a more comprehensive solution.
Thank you for reporting this issue.
It seems like you are experiencing an issue with the reset functionality when the size of reapplied events exceeds the DefaultTransactionSizeLimit (4 MB) of the persistence layer.
The DefaultTransactionSizeLimit is a hard limit that ensures the persistence layer can handle events in a batch.
You are correct that the issue only applies to reset, not conflict resolution.
Here are a few things you can try:
Reduce the size of the events reapplied during the reset process. This might involve optimizing the data being sent in the events.
Increase the DefaultTransactionSizeLimit: You can increase the DefaultTransactionSizeLimit by adjusting the configuration of the Temporal server.
Let me know if you have any further questions.
Thank you for reporting this issue.
As you mentioned, this issue only applies to reset, and is not related to the replication task that handles conflict resolution.
The DefaultTransactionSizeLimit for persistence is indeed 4 MB, and you are hitting this limit during reset because all the reapplied events are grouped into one batch.
There are a few potential workarounds you could consider:
Increase the DefaultTransactionSizeLimit: This is not ideal as it could impact performance, but it's a quick fix.
Please let us know if you have any other questions or if you need further assistance.
Thank you for reporting this issue.
You are correct that the current implementation of ResetWorkflowExecution does not handle batches larger than 4 MB.
The documentation regarding the "Event batch size" limit can be found here: https://docs.temporal.io/self-hosted-guide/defaults.
We are actively working on a solution to address this issue. We will update the documentation with the details of the solution once it is available.
In the meantime, you may consider using smaller batches for your reset operations. This can be achieved by adjusting your workflow logic to generate smaller event batches.
Is your feature request related to a problem? Please describe.
When doing a reset, signals and updates after the reset point are reapplied (cherry-picked) to the new run. However, all of those reapplied events are grouped into one batch. Our persistence layer has a validation that each event batch cannot exceed a 4 MB size limit (each batch is a separate call to persistence). This means that if the total size of the reapplied events is larger than 4 MB, the reset can't be done.
From my understanding, this issue only applies to reset, where events from more than one event batch in the base workflow can be picked. The event reapply logic during conflict resolution is triggered by the replication task of a single batch of events, so we won't run into this situation there.
Describe the solution you'd like
Reset with more than 4 MB of reapplied events should be supported.
Approach 1:
Approach 2: