dhiaayachi / temporal

Temporal service
https://docs.temporal.io
MIT License

Starting abandoned child after parent closes #295

Open dhiaayachi opened 3 weeks ago

dhiaayachi commented 3 weeks ago

Attempting to start a child workflow right before the parent closes (for example, when the parent hits the history event count limit) creates the child workflow execution, but that execution never receives its first workflow task and stays open.

The child workflow keeps running, holds on to its workflow ID (so no other child workflow can use that ID), and has to be terminated manually.
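One way to spot parents that left a child in this state is to scan the parent's history for child starts that were initiated but never confirmed before the parent closed. The sketch below is a hypothetical helper over a simplified event format (plain dicts with `type`, `id`, `workflow_id`, and `initiated_event_id` keys, using CamelCase event names for readability). Real exported histories nest these fields inside per-event attribute objects and may spell event types differently, so the field access would need adapting.

```python
from typing import Dict, Iterable, List

# Simplified, hypothetical event type names; exported histories may use
# forms such as "EVENT_TYPE_WORKFLOW_EXECUTION_COMPLETED" instead.
CLOSE_EVENTS = {
    "WorkflowExecutionCompleted",
    "WorkflowExecutionFailed",
    "WorkflowExecutionCanceled",
    "WorkflowExecutionTerminated",
    "WorkflowExecutionTimedOut",
    "WorkflowExecutionContinuedAsNew",
}


def find_unstarted_children(events: Iterable[dict]) -> List[str]:
    """Return IDs of children initiated but not started before the parent closed."""
    pending: Dict[int, str] = {}  # initiated event ID -> child workflow ID
    for event in events:
        kind = event["type"]
        if kind == "StartChildWorkflowExecutionInitiated":
            pending[event["id"]] = event["workflow_id"]
        elif kind in ("ChildWorkflowExecutionStarted", "StartChildWorkflowExecutionFailed"):
            # Either outcome resolves the pending start for that initiated event.
            pending.pop(event["initiated_event_id"], None)
        elif kind in CLOSE_EVENTS:
            break  # the parent closed; no later event can confirm a start
    return sorted(pending.values())
```

A non-empty result lists child workflow IDs that may be stuck as described above and are candidates for manual inspection or termination.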

dhiaayachi commented 1 week ago

Thank you for reporting this issue.

This appears to be caused by a race between the parent workflow closing and the child workflow being created. As documented, you should make sure the ChildWorkflowExecutionStarted event is recorded in the parent workflow's history before the parent workflow completes. To achieve this, combine the following:

  1. Wait for the child workflow to start, using the GetChildWorkflowExecution() method: this ensures the child workflow has actually been spawned before the parent completes.
  2. Use the ABANDON parent close policy: this allows the child workflow to keep running after the parent closes. The child workflow should be designed to cope with its parent no longer being open.
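The ordering requirement in step 1 can be illustrated with a toy asyncio model. It is not how the Temporal server works internally; it only assumes, as reported in this issue, that a child whose start is not confirmed before the parent closes ends up stuck. The function `run_parent` and the 10 ms delay are hypothetical.

```python
import asyncio


async def run_parent(wait_for_start: bool) -> str:
    """Toy model: return what happens to the child for a given parent behaviour."""
    parent_closed = False
    child_started = asyncio.Event()
    outcome = ""

    async def server_starts_child() -> None:
        nonlocal outcome
        await asyncio.sleep(0.01)  # simulated gap between "initiated" and "started"
        if parent_closed:
            outcome = "orphaned"  # the symptom reported in this issue
        else:
            child_started.set()  # stands in for ChildWorkflowExecutionStarted
            outcome = "running"

    start_task = asyncio.create_task(server_starts_child())
    if wait_for_start:
        await child_started.wait()  # step 1: do not close until the start is confirmed
    parent_closed = True  # the parent completes
    await start_task  # let the simulated start finish before returning
    return outcome
```

Calling `asyncio.run(run_parent(wait_for_start=False))` yields `"orphaned"`, while `wait_for_start=True` yields `"running"`.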

Here are some additional points to consider:

If you continue to experience this issue, please provide the following information:

This will help us better understand your specific issue and assist you in finding a solution.

dhiaayachi commented 1 week ago

Thank you for reporting this issue. It sounds like you are experiencing a known issue where a child workflow can be created but never receives its first workflow task when the parent workflow closes before the child is properly started. This issue can be caused by a race condition where the parent closes before the child is fully registered.

To work around this issue, you can follow these steps:

  1. Use StartChildWorkflowAsync() instead of ExecuteChildWorkflowAsync():
    • StartChildWorkflowAsync() returns a handle once the child workflow has started, without waiting for it to complete. The parent can therefore finish while the child keeps running.
  2. Wait for the child workflow to start:
    • Await the task returned by StartChildWorkflowAsync() before the parent completes; it only completes once the ChildWorkflowExecutionStarted event has been recorded. This confirms that the child workflow has been registered and will receive workflow tasks.
  3. Ensure the child workflow is properly configured:
    • Ensure the child workflow is properly configured with ParentClosePolicy set to Abandon. This will prevent the child workflow from being terminated when the parent workflow closes.

Here is an example of how to implement these changes:

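// Start the child workflow with the Abandon parent close policy. Awaiting
// StartChildWorkflowAsync completes only once the child has started, i.e. the
// ChildWorkflowExecutionStarted event is in the parent's history.
var childHandle = await Workflow.StartChildWorkflowAsync(
    (MyChildWorkflow wf) => wf.RunAsync(),
    new() { ParentClosePolicy = ParentClosePolicy.Abandon });

// The child is now running; childHandle.Id identifies it, and the parent
// can complete without leaving the child stuck.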

These steps ensure the child workflow starts correctly even when the parent workflow closes. If you are still experiencing the issue, please provide more information about your workflow implementation, including the code snippets for your parent and child workflows, so that we can further assist you.

dhiaayachi commented 1 week ago

Thank you for reporting this issue.

This is a known issue related to the interaction between child workflows and the parent workflow's termination, specifically when it happens due to reaching a history event limit.

The issue is related to the timing of the child workflow's startup and the parent workflow's termination. When the parent workflow is terminated because it reached the history event limit, it can close before the child workflow's start has been fully processed. The child then never receives its first workflow task and stays open in a "starting" state.
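If the parent closes itself (for example by continuing-as-new) when its history grows large, one option is to defer that step while any child start is still in flight. The sketch below tracks pending child starts in a hypothetical `ParentState` class; the threshold is an arbitrary assumption and should sit well below the server's hard limit. In a real Python workflow, the history-size hint would come from the SDK (recent versions expose `workflow.info().is_continue_as_new_suggested()`) and the start confirmation from awaiting `start_child_workflow()`.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical threshold, chosen well below the server's hard history limit.
HISTORY_SOFT_LIMIT = 10_000


@dataclass
class ParentState:
    """Toy bookkeeping a parent workflow could keep before closing itself."""

    history_length: int = 0
    pending_child_starts: Set[str] = field(default_factory=set)

    def child_initiated(self, child_id: str) -> None:
        self.pending_child_starts.add(child_id)

    def child_started(self, child_id: str) -> None:
        # Called once the start is confirmed (ChildWorkflowExecutionStarted).
        self.pending_child_starts.discard(child_id)

    def safe_to_continue_as_new(self) -> bool:
        """True only if the history is large AND no child start is in flight."""
        return (
            self.history_length >= HISTORY_SOFT_LIMIT
            and not self.pending_child_starts
        )
```

The design choice is simply to make "no unconfirmed child starts" a precondition of closing, which is the same rule stated earlier in this thread, applied to a parent that closes on its own schedule.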

We recommend awaiting start_child_workflow() instead of calling execute_child_workflow(), so that the parent explicitly waits for the child workflow to start before it closes. Awaiting start_child_workflow() returns only after the ChildWorkflowExecutionStarted event has been recorded in the parent workflow's event history, which ensures the child workflow has started before the parent workflow terminates.

Here's an example of how to use start_child_workflow() and wait for the child workflow to start:

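from temporalio import workflow
from temporalio.workflow import ParentClosePolicy

@workflow.defn
class ChildWorkflow:
    @workflow.run
    async def run(self) -> None:
        ...  # child workflow logic goes here

@workflow.defn
class ParentWorkflow:
    @workflow.run
    async def run(self) -> None:
        # Awaiting start_child_workflow() returns only after the child has
        # started (ChildWorkflowExecutionStarted is in the parent's history).
        child_handle = await workflow.start_child_workflow(
            ChildWorkflow.run,
            parent_close_policy=ParentClosePolicy.ABANDON,
        )
        workflow.logger.info("Child %s started", child_handle.id)
        # ... the parent can now complete without leaving the child stuck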

This approach ensures the child workflow is properly started before the parent terminates. Let us know if you have any other questions.

dhiaayachi commented 1 week ago

Thank you for reporting this issue. It looks like the child workflow is not receiving its first workflow task when the parent workflow is closing due to reaching the history event count limit. This is a known issue and we are actively working on a fix. In the meantime, you can try the following workaround:

Please let me know if you have any further questions.