elsa-workflows / elsa-studio

A modular, extensible dashboard application framework
MIT License

Enhancement Proposal: Workflow Execution Statistics #187

Open af-git-64 opened 4 months ago

af-git-64 commented 4 months ago

Overview

To further enhance Elsa.Studio, we propose a service that calculates statistics for one or more workflow definitions. At Nuvotex GmbH, we actively use this functionality to assess the health of our workflows.

Problem Statement

Assessing the health of a specific workflow definition is currently difficult, especially when it is executed frequently. Relying solely on the instances page is time-consuming, and as the number of executions grows the page becomes cluttered, making relevant information hard to find. It is also hard to judge whether the number of faults is a cause for concern. For instance, 55 faults might be acceptable if the overall success rate remains above 96%, but there is currently no quick way to see that.

Proposed Solution

We propose a service that analyzes all workflow instances within a specified time frame (e.g., the last X days) for each workflow definition. The service would aggregate the relevant data and provide insight into the health and performance of each definition. Specifically, the following information would be stored for each workflow definition:

  1. WorkflowDefinitionSummary: An overview of key details related to the workflow definition.
  2. Lists of WorkflowInstanceSummary:
    • Finished: Instances that completed successfully.
    • Faulted: Instances that encountered errors.
    • Executing: Instances currently in progress.
    • And other relevant categories.
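
A minimal C# sketch of how such a per-definition stats object could be built. All type and member names here (`InstanceStatus`, `InstanceSummary`, `DefinitionStats`, `StatsBuilder`) are hypothetical stand-ins for illustration, not Elsa's actual types or the code described in this proposal. It assumes each instance carries a definition ID, a status, and a creation timestamp.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; these are NOT Elsa's actual types.
public enum InstanceStatus { Finished, Faulted, Executing }

public record InstanceSummary(
    string Id, string DefinitionId, InstanceStatus Status, DateTimeOffset CreatedAt);

public record DefinitionStats(
    string DefinitionId,
    IReadOnlyList<InstanceSummary> Finished,
    IReadOnlyList<InstanceSummary> Faulted,
    IReadOnlyList<InstanceSummary> Executing);

public static class StatsBuilder
{
    // Keeps instances created within the last `days` days (relative to `now`),
    // groups them by definition, then splits each group by status.
    public static Dictionary<string, DefinitionStats> Build(
        IEnumerable<InstanceSummary> instances, int days, DateTimeOffset now)
    {
        var cutoff = now.AddDays(-days);
        return instances
            .Where(i => i.CreatedAt >= cutoff)
            .GroupBy(i => i.DefinitionId)
            .ToDictionary(
                g => g.Key,
                g => new DefinitionStats(
                    g.Key,
                    g.Where(i => i.Status == InstanceStatus.Finished).ToList(),
                    g.Where(i => i.Status == InstanceStatus.Faulted).ToList(),
                    g.Where(i => i.Status == InstanceStatus.Executing).ToList()));
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside the method, keeps the time window easy to test.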

By implementing this feature, users would gain immediate visibility into their workflow definitions’ performance, allowing for better decision-making and proactive maintenance.

To calculate success rates and fault percentages, consider the following formulas:

Success rate: if (TotalRuns == 0) { return 1; } return (double)(TotalRuns - TotalFaults) / TotalRuns;

Fault rate: if (TotalRuns == 0) { return 0; } return (double)TotalFaults / TotalRuns;

Other rates can be derived in the same way.
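
The two formulas above can be wrapped in a small helper. This is a sketch only: the class name `HealthRates` and the parameter names are assumptions made for illustration, while the zero-run handling follows the formulas as written.

```csharp
using System;

// Hypothetical helper; the logic mirrors the formulas above.
public static class HealthRates
{
    // With no runs there is nothing to fail, so the success rate is treated as 1.
    public static double SuccessRate(int totalRuns, int totalFaults) =>
        totalRuns == 0 ? 1.0 : (double)(totalRuns - totalFaults) / totalRuns;

    // With no runs the fault rate is treated as 0.
    public static double FaultRate(int totalRuns, int totalFaults) =>
        totalRuns == 0 ? 0.0 : (double)totalFaults / totalRuns;
}
```

As an illustration with an assumed run count, 55 faults out of 2,000 runs gives a success rate of 1945 / 2000 = 0.9725 and a fault rate of 0.0275. For 55 faults to keep the success rate above 96%, there must be more than 55 / 0.04 = 1,375 runs.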

Subsequently, these statistics can be incorporated into the workflow definition row by introducing a new column labeled ‘Health’.

[image]

The green chip displays the success rate as a percentage over the past X days. By default, X is 3 days, and the maximum allowed value is 60 days.

[image]

The red chip shows the number of faults observed over the past X days. In this instance there were 55 faults, which is acceptable in our context.

[image]

We adjust the color of the success rate chip dynamically based on its percentage. The thresholds are currently hard-coded to match our internal quality assurance criteria.
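
One way the rate-to-color mapping might look, with the thresholds exposed as parameters instead of hard-coded. The `ChipColor` enum, the `HealthChip` class, and both threshold defaults are assumptions for illustration. The proposal does not state its actual thresholds, and 0.96 is borrowed only from the example figure mentioned earlier.

```csharp
// Hypothetical color categories; a real UI would map these to its own chip colors.
public enum ChipColor { Success, Warning, Error }

public static class HealthChip
{
    // Threshold defaults are placeholders, NOT the values used in the proposal.
    public static ChipColor ColorFor(
        double successRate, double goodThreshold = 0.96, double warnThreshold = 0.90)
    {
        if (successRate >= goodThreshold) return ChipColor.Success;
        if (successRate >= warnThreshold) return ChipColor.Warning;
        return ChipColor.Error;
    }
}
```

Making the thresholds configurable would let other teams apply their own quality criteria without changing the component.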

[image]

Moreover, because the instances are centralized in a stats object, we can quickly identify which instances have faulted. Clicking the chip that displays the fault count opens a dialog listing the specific faults. Clicking a row in that dialog opens the corresponding fault, either in a new tab or in the current one.

These features prove invaluable for our team, especially considering the multitude of workflows that are continually executed. Ensuring everything functions as intended remains a top priority.

Should you wish to incorporate these enhancements into the base version of Elsa.Studio, we’d be delighted to share our code. Currently, these features are implemented in Elsa.Studio rather than Elsa.Core. For our specific use case, the client-side execution of the code has proven to be sufficiently fast.

glime-ai[bot] commented 4 months ago

It seems like you are proposing a new feature for Elsa Workflows that involves calculating and displaying statistics for workflow definitions to assess their health and performance. This feature would provide users with immediate visibility into the performance of their workflows, allowing for better decision-making and proactive maintenance.

Your proposal includes the creation of a service that analyzes workflow instances within a specified time frame for each defined workflow, aggregating data such as the number of finished, faulted, and executing instances. You also suggest formulas for calculating success and fault rates and propose a visual representation of these statistics in the Elsa Studio UI.

To proceed with this proposal, it would be beneficial to open a feature request on the Elsa Workflows GitHub repository, where you can outline your proposal in detail. This will allow the maintainers and the community to discuss the feasibility, potential impact, and implementation details of your proposed feature. If you already have code that implements this feature, you can mention this in the feature request and offer to share it.

Since you are offering to share your code, it would be helpful to create a pull request with your changes against the Elsa Workflows repository. This will allow the maintainers to review your code and decide whether to merge it into the base version of Elsa Studio.

Please note that contributing to an open-source project typically involves adhering to the project's contribution guidelines, which may include signing a Contributor License Agreement (CLA), following coding standards, and providing documentation for the new feature.

Would you like guidance on how to open a feature request or create a pull request on GitHub for Elsa Workflows?

This comment was generated by Glime.