Open dhiaayachi opened 2 months ago
Thank you for your feature request.
We understand that the current periodic shard info persistence can lead to task reprocessing when load is high.
While this feature is not yet available, you can explore the following workaround:
Set the temporal.task-processing-shard-info-persistence-interval setting to a lower value, e.g., 1 minute, to reduce the amount of task processing progress lost in case of a shard reload.
We are always striving to improve Temporal, and we will consider your feedback as we develop new features.
Thank you for reporting this issue! It's important for us to optimize task processing performance, and your suggestion to make shard info persistence based on the number of processed tasks instead of time is a valuable one.
While this feature isn't currently available, we can explore alternative solutions. You can lower the temporal.taskQueuePersistenceInterval parameter in the Temporal server configuration to increase the frequency of shard info persistence. This will reduce the amount of task processing progress lost during a shard reload. Additionally, you can try increasing the temporal.taskQueueProcessorParallelism parameter to raise the task processing rate.
We'll keep this suggestion in mind for future development. Please let us know if you have any other questions.
Thank you for reporting this issue.
You are right, the current periodic shard info persistence can lead to significant reprocessing when the load on the cluster is high.
While a task-based persistence mechanism is not currently available, you can explore this workaround:
Set the temporal.server.shardInfoPersistenceInterval configuration option to a lower value (e.g., 1 minute) to reduce the amount of task processing progress lost.
Please let us know if you have any other questions.
Thank you for your feature request! We understand the importance of minimizing reprocessing during shard reloads, especially under high load conditions.
Currently, Temporal does not offer task-based checkpointing for shard information. You can work around this limitation by reducing the ShardInfoPersistenceInterval to a smaller value. However, this makes shard information persistence more frequent, which may impact performance.
We appreciate your suggestion and will consider it for future enhancements.
Is your feature request related to a problem? Please describe.
Right now shard info persistence is periodic (by default every 5 minutes), and history task processing progress is part of shard info (shardInfo.QueueStates).
When load on the cluster is high, losing 5 minutes of task processing progress means a lot of reprocessing after a shard reload, since we have no way of knowing that the tasks are duplicates. This duplication also makes rate limiting of task processing harder.
If the condition were based on the number of tasks processed instead of on time, we would be able to checkpoint more often when load is high and reduce reprocessing.