elsa-workflows / elsa-core

A .NET workflows library
https://v3.elsaworkflows.io/
MIT License
5.88k stars, 1.06k forks

[PERF] Journal Data is logging redundant information in WorkflowExecutionLogRecords #5262

Closed jdevillard closed 1 week ago

jdevillard commented 3 weeks ago

Performance Improvement Request

Performance Issue Overview

Is your performance issue related to a specific functionality? Please describe. This performance issue is related to the storage footprint of the log record written for each activity. The database can grow quickly with data that is either unused or redundant with other stored data.

Proposed Enhancement

Remove redundant data from the SerializedPayload column of the WorkflowExecutionLogRecords table.

The SerializedPayload column stores the payload data, which is composed of all the journal data of the activity. This journal data can be composed of:

Alternative Solutions

Describe alternatives you've considered Have you identified any potential fixes or tweaks yourself? Please share your findings, including why they might not have been a perfect or complete solution. This helps us understand the issue better and consider all possible solutions.

Affected Use Cases

Identify affected use cases: The Elsa.FlowSendHttpRequest activity adds all the parsed content of the HTTP call to the journal, which can be very large.

More generally, any activity that creates output will add data to the journal.

Impact of Enhancement

Explain the potential impact: Avoids excessive storage use and improves performance, since less data is sent over the network to the persistence store.

Benchmarks and Metrics

Provide any relevant benchmarks or metrics


For example, in my HTTP sample, the size of a record can go from 683 bytes to 505,923 bytes. The same growth is visible for all activities that create output.
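To put the two reported sizes in perspective, here is a minimal arithmetic sketch. Only the two byte counts come from the measurement above; the record count of 10,000 is a hypothetical volume chosen purely for illustration, and the helper names are made up for this example.

```python
# Record sizes reported in this issue, in bytes.
SIZE_WITHOUT_OUTPUT = 683
SIZE_WITH_OUTPUT = 505_923

# Hypothetical number of log records, for illustration only.
RECORD_COUNT = 10_000


def growth_factor(small: int, large: int) -> float:
    """How many times larger the big record is than the small one."""
    return large / small


def saved_fraction(small: int, large: int) -> float:
    """Fraction of the large record that would be saved by shrinking it to the small size."""
    return (large - small) / large


def projected_bytes(record_size: int, record_count: int) -> int:
    """Total storage for record_count records of the given size."""
    return record_size * record_count


factor = growth_factor(SIZE_WITHOUT_OUTPUT, SIZE_WITH_OUTPUT)
saved = saved_fraction(SIZE_WITHOUT_OUTPUT, SIZE_WITH_OUTPUT)
total_with = projected_bytes(SIZE_WITH_OUTPUT, RECORD_COUNT)
total_without = projected_bytes(SIZE_WITHOUT_OUTPUT, RECORD_COUNT)

print(f"growth factor: {factor:.1f}x")
print(f"fraction saved: {saved:.4f}")
print(f"{RECORD_COUNT} records: {total_with} bytes vs {total_without} bytes")
```

With these figures the larger record is roughly 740 times the size of the smaller one, so almost all of the stored bytes in this sample come from the output carried in the journal.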

Additional Context

Further thoughts

It could also be useful to allow selecting what is stored through the journal. (Outputs are already selected with the Persistence Log Record Module and stored in SerializedActivityState.)
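The idea of dropping journal entries that duplicate already-persisted state can be sketched as follows. This is a language-neutral illustration in Python, not Elsa's actual API or schema: the function names, the dictionary shape, and the keys `ParsedContent`, `StatusCode` and `Outcome` are all assumptions made for the example.

```python
import json


def strip_redundant(journal: dict, persisted_state: dict) -> dict:
    """Return a copy of the journal without entries that already appear,
    with the same key and value, in the persisted activity state."""
    return {
        key: value
        for key, value in journal.items()
        if not (key in persisted_state and persisted_state[key] == value)
    }


def serialized_size(payload: dict) -> int:
    """Size in bytes of the payload once serialized as UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


# Hypothetical data: a large HTTP body stored both as activity state
# and again inside the journal payload.
body = "x" * 5000
persisted_state = {"ParsedContent": body, "StatusCode": 200}
journal = {"ParsedContent": body, "StatusCode": 200, "Outcome": "Done"}

slim_journal = strip_redundant(journal, persisted_state)

print(serialized_size(journal), "bytes before")
print(serialized_size(slim_journal), "bytes after")
```

In this sketch only the entry that is not already stored elsewhere (`Outcome`) remains in the journal, which is the kind of reduction the request is aiming for.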

glime-ai[bot] commented 3 weeks ago

Based on the context provided, it seems you are looking to improve the performance of Elsa Workflows by addressing the issue of redundant data storage, particularly in the WorkflowExecutionLogRecords table. The SerializedPayload column in this table stores a large amount of data, which can lead to rapid database growth and potentially impact performance due to the size of the data being stored and transferred.

To address this issue, you are proposing to remove redundant data from the SerializedPayload of the WorkflowExecutionLogRecords table. This would involve not storing the entire journal data for each activity, which can include outcomes, exceptions, and other journalization data.

As an alternative, you've considered allowing users to configure what gets stored in the journal, particularly for activities that create outputs. This could be managed through the Persistence Log Record Module, where users can select which outputs to persist.

The impact of this enhancement would be significant storage savings and improved performance due to reduced data transfer to the persistence store. You've provided a benchmark showing that the size of a record could be reduced significantly, from 505,923 bytes to 683 bytes, in the case of an HTTP sample.

To proceed with this enhancement, you would need to evaluate the current implementation and identify where the redundancies occur. Then, you would need to modify the logging mechanism to either remove the redundant data or provide configuration options for users to select what data should be stored.

This change would affect any use case where activities generate outputs that are currently being stored in the SerializedPayload. Activities like Elsa.FlowSendHttpRequest, which add parsed content of HTTP calls to the journal, would be particularly affected.

In summary, the proposed enhancement aims to optimize data storage and improve performance by reducing the amount of redundant data stored in the WorkflowExecutionLogRecords table. This would involve changes to the logging mechanism and potentially adding user-configurable options for data persistence.

This comment was generated by Glime.

jdevillard commented 1 week ago

closed as done with #5266