dhiaayachi / temporal

Temporal service
https://docs.temporal.io
MIT License

Metric specific to workflow retries per namespace #156

Open dhiaayachi opened 2 months ago

dhiaayachi commented 2 months ago

Currently, workflow_retry_backoff_timer seems to be the only metric available for checking the count of workflow retries.

The feature request is to add a workflow_retried counter metric that can be filtered by namespace.

Thanks.

dhiaayachi commented 2 months ago

This is a great feature request! I understand the need for a workflow_retried counter metric that can be filtered by namespace. Currently, the only available metric for checking the count of workflow retries is workflow_retry_backoff_timer.

Here's why this feature is valuable and some potential ways to implement it:

Benefits of a workflow_retried Counter:

Potential Implementation:

How to Contribute:

You can contribute to this feature request by:

I hope this helps! Let me know if you have any other questions.

dhiaayachi commented 2 months ago

Thank you for the feature request!

We understand the need to have a workflow_retried counter metric that can be filtered by namespace. Currently, the available metric workflow_retry_backoff_timer doesn't provide this level of granularity.

We are actively exploring ways to improve our metrics and instrumentation capabilities. In the meantime, you can use the workflow_retry_backoff_timer metric and filter by namespace using the namespace tag, but it will only provide information on the backoff duration and not the retry count.
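As a rough illustration of filtering by the namespace tag, the sketch below groups the samples of one metric by their `namespace` label. It assumes the metrics are scraped in Prometheus text exposition format and that the tag appears as a label named `namespace`; the scrape text, the `operation` label, and all values are made up for the example. What the summed value means depends on how the metric is actually emitted.

```python
import re
from collections import defaultdict

# Hypothetical scrape output; metric and label names mirror this thread,
# the "operation" label and every value are invented for illustration.
SAMPLE_SCRAPE = """\
# hypothetical sample, not real server output
workflow_retry_backoff_timer{namespace="orders",operation="A"} 7
workflow_retry_backoff_timer{namespace="orders",operation="B"} 1
workflow_retry_backoff_timer{namespace="billing",operation="A"} 3
workflow_success{namespace="orders"} 120
"""

# One sample line: metric name, a {label="value",...} block, then the value.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def sum_by_namespace(scrape_text, metric_name):
    """Sum the samples of metric_name, grouped by the namespace label."""
    totals = defaultdict(float)
    for line in scrape_text.splitlines():
        match = SAMPLE_LINE.match(line)
        # Comment lines and other metrics are skipped.
        if not match or match.group("name") != metric_name:
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        totals[labels.get("namespace", "<none>")] += float(match.group("value"))
    return dict(totals)


per_namespace = sum_by_namespace(SAMPLE_SCRAPE, "workflow_retry_backoff_timer")
```

In practice this grouping would normally be done in the metrics backend's query language rather than by hand; the sketch only shows what the namespace filter amounts to.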

We appreciate your feedback and will consider your suggestion for future development.

dhiaayachi commented 2 months ago

Thank you for the feature request! We appreciate your feedback and are always looking for ways to improve Temporal.

Currently, there is no built-in counter metric for workflow retries that can be filtered by namespace.

You can use the workflow_retry_backoff_timer metric, which can be filtered by namespace, to understand the retries within a specific namespace. While this metric doesn't directly represent the count of retries, it can provide insight into the retry frequency within a particular namespace.

We'll consider adding the workflow_retried counter metric in the future.

dhiaayachi commented 2 months ago

Thanks for the feature request. This would be a very useful metric to have.

Unfortunately, there is no built-in metric that tracks the number of retries per workflow. However, a workaround would be to use a custom metric within your workflow code and update it whenever your workflow is retried. You can then export the metric to your metrics backend.
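A minimal sketch of that workaround, under the assumption that workflow code can read its current attempt number (with 1 meaning the first run). `RetryMetrics` is a hypothetical in-process stand-in for a real metrics client, and the namespace names are invented; in a real setup the counter would be exported to the metrics backend.

```python
from collections import Counter


class RetryMetrics:
    """Hypothetical stand-in for a metrics client with a per-namespace counter."""

    def __init__(self):
        # Maps namespace -> number of retried workflow runs observed.
        self.workflow_retried = Counter()

    def record_attempt(self, namespace, attempt):
        # Attempt 1 is the original run; anything above it is a retry.
        if attempt > 1:
            self.workflow_retried[namespace] += 1


metrics = RetryMetrics()

# Simulated workflow starts as (namespace, attempt) pairs.
for namespace, attempt in [("orders", 1), ("orders", 2), ("orders", 3), ("billing", 1)]:
    metrics.record_attempt(namespace, attempt)
```

Tagging the counter with the namespace at record time is what makes the per-namespace filter possible later.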

For more information on workflow retries, refer to this documentation: Failure detection - Temporal feature

dhiaayachi commented 2 months ago

Thank you for the feature request! We appreciate your suggestion to add a workflow_retried counter metric that can be filtered by namespace.

Currently, there isn't a dedicated counter metric for workflow retries. You can use the existing workflow_retry_backoff_timer metric as a workaround. This timer tracks the duration of the backoff period between workflow retries. By analyzing this timer, you can infer the number of retries for a particular workflow.
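One way to do that inference, sketched under two assumptions: each workflow retry produces exactly one observation of the metric, and the cumulative observation count for a single namespace is scraped at regular intervals. The function name and the sample values are hypothetical.

```python
def counter_increase(samples):
    """Total increase across cumulative samples, tolerating resets to zero."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # A drop means the emitting process restarted and the
            # cumulative count began again from zero.
            total += current
    return total


# Hypothetical cumulative counts scraped for one namespace.
retries_in_window = counter_increase([4, 6, 9, 2, 5])
```

If either assumption does not hold, the result is only an approximation of the retry count.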

We'll consider adding the workflow_retried counter metric in future releases.

dhiaayachi commented 2 months ago

Thanks for the feature request. Currently, Temporal doesn't offer a metric specific to workflow retries per namespace. You can use the workflow_retry_backoff_timer metric to track retry attempts, but you can't filter it by namespace. As a workaround, you can use the Temporal CLI to query the metrics and then filter the results based on the namespace. For more information, please refer to the Temporal CLI documentation: https://docs.temporal.io/cli/cmd-options.