Task_any / Task_all without trigger (or: How to livemonitor sub orchestration?)

kevinm90 commented 2 years ago

Hi everyone,

I am a bit confused by the task_any and task_all options.

Let me give you a bit of background: I am building a durable function to trigger and monitor a list of ADO pipelines.

    all_pipeline_tasks = []
    id_ = 0
    for single_pipeline in all_pipelines_json:
        child_id = f"{context.instance_id}:{id_}"
        single_pipeline_task = context.call_sub_orchestrator("SinglePipelineRunAndMonitorOrchestrator", single_pipeline, child_id)
        all_pipeline_tasks.append(single_pipeline_task)
        id_ += 1

This works and is easy when I just want to wait for all tasks (= pipeline runs) to complete. I just do:

    # All tasks should be completed, just double-checking again here
    yield context.task_all(all_pipeline_tasks)
    all_results = [single_run.result for single_run in all_pipeline_tasks]
    return f"Mass Execution completed. {all_results}"

But how do I collect the intermediate status of the tasks? For example, I want to report via the HTTP API something like:

Task A (= Pipeline A): Completed
Task B (= Pipeline B): Running
Task C (= Pipeline C): Triggering failed
...

I tried with a loop over all tasks / remaining tasks via

    remaining_tasks = all_pipeline_tasks
    while len(remaining_tasks) > 0:
        print(remaining_tasks)
        all_open_tasks = context.open_tasks
        finishedTask = yield context.task_any(remaining_tasks)
        print(f"TODO: Collect human readable status and use _set_custom_status to populate to HTTP Endpoint. Finished task: {finishedTask}")
        remaining_tasks.remove(finishedTask)

... but that does not work, as every yield task_any call will trigger all tasks again (and therefore I am ending up in an endless loop).

To sum it up: is there any function available that would just give me the next finished task (of a given list) without triggering these tasks again? If not, could you look into creating one, maybe as an additional option to task_any / task_all? That would help a lot :-)

Thanks and regards Kevin

davidmrdavid commented 2 years ago

Hi @kevinm90,

Thank you for reaching out!

There are certainly already some ways of monitoring the status of sub-orchestrations. At the lowest level of the API (though not the most convenient), you can use the HTTP endpoints described here to get the status of any given orchestration ID. Using that, you should be able to determine the orchestrator state at any given time.
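As a rough illustration of the status-lookup approach, the sketch below polls every child orchestration from client code instead of calling the HTTP endpoints by hand. It assumes `client` behaves like a `DurableOrchestrationClient` whose `get_status` coroutine returns an object with a `runtime_status` attribute, and that child instance IDs follow the `"<parent id>:<index>"` convention from the first snippet in this thread. The helper name `collect_child_statuses` is hypothetical.

```python
import asyncio


async def collect_child_statuses(client, parent_instance_id, child_count):
    """Return {child_instance_id: runtime_status} for every child.

    Assumption: `client.get_status(instance_id)` is awaitable and its result
    exposes `runtime_status`, as DurableOrchestrationClient does.
    """
    # Rebuild the child IDs using the parent-id:index convention.
    child_ids = [f"{parent_instance_id}:{i}" for i in range(child_count)]

    # Issue all status lookups concurrently rather than one after another.
    statuses = await asyncio.gather(
        *(client.get_status(child_id) for child_id in child_ids)
    )

    return {
        child_id: status.runtime_status
        for child_id, status in zip(child_ids, statuses)
    }
```

This still performs one lookup per child, so it only hides the iteration behind a single call; it does not reduce the number of status requests.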

However, you're right that, at least off the top of my head, I don't think this is easy to express in code within an orchestrator. I'll need to think about that some more. I'm certainly open to proposing new APIs to make this easier.

To help me understand your proposal better, can you explain what you mean by "give me the next finished task of a given list without triggering these tasks"? A few examples would help me gain clarity here. Thank you :)

kevinm90 commented 2 years ago

Hi @davidmrdavid ,

thanks for your response.

Yes, you are right, I could monitor it via the HTTP endpoints. My "problem" here is that I would like to avoid iterating over all the orchestrations manually.

To extend my example: imagine I want to trigger and monitor 500 different Azure DevOps pipelines (and get all of their results). I would write and call an orchestrator to trigger and monitor every single pipeline (it would contain a TriggerActivity and a MonitorActivity). The mass orchestrator would receive the list of the 500 pipelines and then kick off an individual orchestrator for every single pipeline run. These 500 pipeline runs will probably take some time (possibly several hours, because of further limits set in the ADO environment). So if a human (or another tool) wants to retrieve the current results, I would need to call the HTTP API 500 times. But what someone is actually interested in at time t is only:

a) Which pipelines have completed
b) What is their status

I could achieve this easily if the MassOrchestrator simply waited for any pipeline run to complete and then updated the custom status. Let's say the completed run was orchestration ID 5. I would then remove this orchestration ID from the list of unfinished pipelines and call yield context.give_me_any_next_finished_task(list_of_unfinished_pipelines) again. That is why I said an option to use task_any without actually triggering the pipeline would be pretty convenient.
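The behaviour described here (wait for whichever remaining sub-orchestration finishes next, update the custom status, repeat) can be sketched roughly as below. This is an illustration of the intended pattern under stated assumptions, not a confirmed fix for the re-triggering reported earlier in this thread: it assumes that yielding `context.task_any(...)` hands back the completed task object, that `context.set_custom_status` accepts a dict, and that each sub-orchestration is scheduled exactly once before the loop. The name `mass_orchestrator` and the status strings are hypothetical, and failure handling is omitted.

```python
def mass_orchestrator(context):
    """Generator-style orchestrator body.

    In a real function app this would be wrapped with
    azure.durable_functions.Orchestrator.create(mass_orchestrator).
    """
    all_pipelines = context.get_input()

    # Schedule every sub-orchestration once and remember it by child ID.
    pending = {}
    for index, pipeline in enumerate(all_pipelines):
        child_id = f"{context.instance_id}:{index}"
        pending[child_id] = context.call_sub_orchestrator(
            "SinglePipelineRunAndMonitorOrchestrator", pipeline, child_id
        )

    # Publish an initial status so callers see every pipeline as running.
    status = {child_id: "Running" for child_id in pending}
    context.set_custom_status(dict(status))

    while pending:
        # Wait for the next task to finish among those still pending.
        finished = yield context.task_any(list(pending.values()))

        for child_id, task in list(pending.items()):
            if task == finished:
                del pending[child_id]
                status[child_id] = "Completed"
                break
        else:
            # Guard against looping forever on a task we did not schedule.
            raise RuntimeError("task_any returned an unknown task")

        # Expose the updated overview through the parent's custom status.
        context.set_custom_status(dict(status))

    return status
```

With this shape, a single status query against the parent orchestration would return the overview for all pipelines, instead of one query per child.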

Is it more clear now? If we should have a short call about this, I also dropped you an e-mail in case you want to reach out.

Thanks a lot for your support Regards Kevin

davidmrdavid commented 2 years ago

Thanks @kevinm90. I think this makes a lot more sense now. Let me add this to our discussion items, and I will look to update this thread with a follow-up. I can't say at the moment whether we have the bandwidth to prioritize this, so let me get back to you after discussing with the team.

cc/ @lilyjma since this is a new feature request.

lilyjma commented 6 months ago

Hi @kevinm90 - thank you for using Durable Functions! I'm a PM working on DF and would love to learn about your experience using the product. You can share your feedback in this quick survey to help influence what the team works on next. If you're building intelligent apps, there's also an opportunity to participate in a compensated UX study. Thanks!

Azure / azure-functions-durable-python

Task_any / Task_all without trigger (or: How to livemonitor sub orchestration?) #385