Garden-AI / garden

https://garden-ai.readthedocs.io
MIT License

Add async task execution support to Garden client #483

Closed by WillEngler 1 month ago

WillEngler commented 3 months ago

Currently, the __call__ method on Entrypoints submits a Globus Compute task and blocks until it completes. This works acceptably with our current K8s-only compute resources, but it won't work for HPC job submission, where tasks wait in a job queue for an unknown amount of time.

Going along with the execution model of Globus Compute, there should be a way for users to submit a function call, get a task ID, and come back later to retrieve the result.

I propose adding a submit method to the Entrypoint class. Instead of blocking until the response is ready, it will immediately return a Globus Compute task ID. There should also be a check_result method on the GardenClient class that takes a Globus Compute task ID and returns either the result or information about the pending/error state of the task.
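To make the proposed shape concrete, here is a minimal sketch of the two methods. It is not the Garden SDK's implementation: a local thread pool stands in for Globus Compute, and the `TaskStatus` type, the `_submit` helper, and the `"pending"` / `"success"` / `"error"` state names are all assumptions made for illustration. Only `submit`, `check_result`, `Entrypoint`, and `GardenClient` come from the proposal itself.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class TaskStatus:
    """Hypothetical return type for check_result (not part of the proposal)."""
    state: str                     # "pending", "success", or "error"
    result: Any = None
    error: Optional[str] = None


class GardenClient:
    """Stand-in client: a thread pool plays the role of the remote compute service."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._tasks: Dict[str, Future] = {}

    def _submit(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> str:
        # Hypothetical helper: queue the call and hand back an opaque task ID.
        task_id = str(uuid.uuid4())
        self._tasks[task_id] = self._pool.submit(func, *args, **kwargs)
        return task_id

    def check_result(self, task_id: str) -> TaskStatus:
        # Never blocks: reports whatever state the task is in right now.
        future = self._tasks[task_id]
        if not future.done():
            return TaskStatus(state="pending")
        exc = future.exception()
        if exc is not None:
            return TaskStatus(state="error", error=repr(exc))
        return TaskStatus(state="success", result=future.result())


class Entrypoint:
    """Stand-in entrypoint wrapping a plain Python function."""

    def __init__(self, client: GardenClient, func: Callable[..., Any]) -> None:
        self._client = client
        self._func = func

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Existing behavior described above: submit, then block until done.
        task_id = self.submit(*args, **kwargs)
        return self._client._tasks[task_id].result()

    def submit(self, *args: Any, **kwargs: Any) -> str:
        # Proposed behavior: return the task ID immediately.
        return self._client._submit(self._func, *args, **kwargs)
```

In a real implementation the task ID would presumably be the one issued by Globus Compute, so a user could call check_result from a later session instead of relying on in-memory state as this sketch does.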

There are likely ways we can be more sophisticated and, say, support submitting and retrieving multiple tasks at once. If someone wants to propose a richer interface, feel free. But this ticket will be satisfied by the simple "MVP" version of this functionality - we just need to submit and retrieve one task at a time right now.

Acceptance Criteria

Given I have the Garden SDK installed and I want to execute a Garden entrypoint, when I call some_entrypoint.submit(), then I receive a Globus Compute task ID, from which I can retrieve the task result with garden_client.check_result().
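For the "come back later" half of this criterion, a caller might wrap check_result in a small polling loop. The helper below is a hedged sketch, not part of the ticket: `wait_for_result` is a hypothetical name, and it assumes check_result returns an object with `state`, `result`, and `error` attributes, a shape the issue does not specify.

```python
import time
from typing import Any, Callable


def wait_for_result(
    check_result: Callable[[str], Any],
    task_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
) -> Any:
    """Poll check_result until the task leaves the pending state.

    Assumes the returned status object exposes .state ("pending",
    "success", or "error"), .result, and .error; this shape is an
    illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = check_result(task_id)
        if status.state == "success":
            return status.result
        if status.state == "error":
            raise RuntimeError(f"task {task_id} failed: {status.error}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still pending after {timeout}s")
        time.sleep(interval)
```

With the proposed API this would be used as `wait_for_result(garden_client.check_result, task_id)`. For HPC queues the interval and timeout would likely need to be much larger than these defaults.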

WillEngler commented 1 month ago

Closing as we hit the end of summer intern season and are re-orienting on HPC dev