dhiaayachi / temporal

Temporal service
https://docs.temporal.io
MIT License

Failed workflow list for batch operations #234

Open dhiaayachi opened 3 months ago

dhiaayachi commented 3 months ago

Is your feature request related to a problem? Please describe. When running batch operations, especially batch reset, an operator would like to know the status of the job and whether the operation failed for any workflow. If it did fail, the operator would also like a list of the failed workflows.

Describe the solution you'd like A way for an operator to get the list of workflows on which the batch operation has failed.

Today we log the failed workflow IDs; maybe that's enough. But if not, we need another way.


dhiaayachi commented 2 months ago

Thank you for your feature request!

While the Temporal CLI currently does not provide a way to get a list of failed workflows after a batch operation, you can achieve similar functionality by logging the failed workflow IDs. This approach allows you to track and troubleshoot issues.

Here are some alternatives:

  1. Logging: As you mentioned, logging the failed workflow IDs is a good starting point. You can analyze these logs to understand the reasons for failure.

  2. Workflow Visibility: You can use the Temporal Visibility feature (see Temporal Documentation - Visibility) to query and filter workflows based on specific criteria. For example, you can tag the workflows in a batch operation, for instance through a custom search attribute with a value such as "batch-reset-2024-03-01", and then use that tag to find all workflows related to that batch operation.

  3. Workflow Query: You can use the temporal workflow query command (see Temporal CLI) to query running workflows and get specific information about them. This can help identify workflows that are failing within a batch operation.
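
As a rough sketch of the Visibility approach in item 2, the helper below builds a list filter for one batch tag. It assumes that a custom Keyword search attribute, here called BatchOperationTag (a hypothetical name, not a built-in attribute), has been registered and set on the affected workflows. The tag value is the example from above.

```shell
# Build a Visibility list filter for one batch tag.
# BatchOperationTag is a hypothetical custom search attribute; it must be
# registered on the namespace and set on each workflow for this to match.
batch_tag_query() {
  tag="$1"
  status="${2:-}"   # optional ExecutionStatus value, e.g. Running or Failed
  query="BatchOperationTag=\"${tag}\""
  if [ -n "$status" ]; then
    query="${query} AND ExecutionStatus=\"${status}\""
  fi
  printf '%s\n' "$query"
}

# Usage (needs a reachable Temporal cluster):
# temporal workflow list --query "$(batch_tag_query batch-reset-2024-03-01)"
```

The function only assembles the filter string, so it can be checked without a cluster before passing the result to temporal workflow list.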

We appreciate your feedback and will consider incorporating a dedicated mechanism for tracking and retrieving failed workflows from batch operations in future releases.

dhiaayachi commented 2 months ago

Thank you for your feature request! This is a great suggestion to help operators manage batch operations more effectively.

Currently, the Temporal CLI provides a way to track the status of batch operations, including whether they succeeded or failed. The CLI logs the workflow IDs that failed during the operation, which can be helpful for troubleshooting.

However, we understand that you are looking for a more comprehensive view of failed workflows. We are considering adding support for:

In the meantime, you can achieve a similar result by using the Temporal CLI's filtering capabilities with the --query flag to identify workflows that failed during a batch operation. For example:

temporal workflow list --query 'WorkflowType="YourWorkflow" AND ExecutionStatus="Failed"' 

This will list all workflows of type YourWorkflow that have failed. You can also use other filtering criteria as needed.
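
One possible refinement, sketched below, is to narrow the same filter to executions that closed after the batch operation started. The function name and the timestamp are illustrative assumptions; CloseTime is a standard search attribute.

```shell
# Build a list filter for failed workflows of one type that closed at or
# after a given time (for example, the moment the batch job was started).
failed_since_query() {
  workflow_type="$1"
  since="$2"   # RFC 3339 timestamp such as 2024-03-01T00:00:00Z
  printf 'WorkflowType="%s" AND ExecutionStatus="Failed" AND CloseTime >= "%s"\n' \
    "$workflow_type" "$since"
}

# Usage (needs a reachable Temporal cluster):
# temporal workflow list --query "$(failed_since_query YourWorkflow 2024-03-01T00:00:00Z)"
```

Note that this matches workflows whose executions ended in the Failed status. It does not by itself identify workflows on which the batch operation could not be applied.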

We appreciate your feedback and will consider these suggestions as we continue to improve the Temporal CLI.

dhiaayachi commented 2 months ago

Thank you for your feature request!

This is a great suggestion, and we appreciate you bringing it to our attention.

Currently, the Temporal CLI doesn't provide a direct way to list failed workflows within a batch operation. However, you can achieve a similar outcome by combining the temporal batch describe and temporal workflow list commands.

Here's how you can work around this:

  1. Run a batch operation: Start a batch job, for example with the temporal workflow cancel, temporal workflow signal, or temporal workflow terminate command and a --query that selects the target workflows.
  2. Get the batch job ID: After successfully starting the batch job, you will receive a Job ID.
  3. Describe the batch job: Use the temporal batch describe --job-id=MyJobId command to get information about the batch job, including its status and any errors encountered.
  4. List failed workflows: If the batch job has failed, you can use the temporal workflow list command with the appropriate filters to list the workflows that failed. For example, you could filter by workflow type and execution status.
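
Steps 3 and 4 above can be sketched as a small dry-run helper. It only prints the two follow-up commands for a given job ID and filter, so it works without a cluster. The function name is hypothetical, and MyJobId stands in for the ID returned in step 2.

```shell
# Print the follow-up commands for one batch job instead of running them.
# The query must not contain single quotes, since it is wrapped in them.
batch_followup_commands() {
  job_id="$1"
  query="$2"
  # Step 3: inspect the batch job's status.
  printf 'temporal batch describe --job-id=%s\n' "$job_id"
  # Step 4: list workflows matching the given filter.
  printf "temporal workflow list --query '%s'\n" "$query"
}

# Usage:
# batch_followup_commands MyJobId 'WorkflowType="YourWorkflow" AND ExecutionStatus="Failed"'
```

Printing the commands first makes it easy to review the filter before running anything against a live namespace.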

While this solution involves multiple steps, it allows you to identify and track workflows that failed during a batch operation.

We hope this helps!

dhiaayachi commented 2 months ago

Thank you for reporting this issue.

The temporal batch command currently does not provide a way to get a list of failed workflows. You can use the temporal batch describe command to get the status of a batch job, which will include the number of workflows that were affected by the operation.

If you need more information, please let us know.