ga4gh / workflow-execution-service-schemas

The WES API is a standard way to run and manage portable workflows.
Apache License 2.0
82 stars 38 forks source link

Add unique identifier and additional information to tasks #204

Closed patmagee closed 1 year ago

patmagee commented 1 year ago

Why

When polling workflow state it is often helpful to monitor the progress of a given task over time. Unfortunately there currently is no guarantee that any identifier present in the task_logs or list returned by the /tasks endpoint and you can never be sure that the task you are looking at is the same one from the preview retrieval

Additionally, Since merging: https://github.com/ga4gh/workflow-execution-service-schemas/pull/177 into the spec the goal is to make individual tasks addressable in the future.

Proposal

  1. We can add a new task_id attribute within the Log object. I prefer this approach since it can be an easy transition to presenting TES information. If we do not want to return TES objects directly, we could also include a link tes_url to the task definition if we ever wanted to go that way.. but that is all for future consideration, adding an ID is minor bridge there
  2. We can add verbage stating the name of the task should be Unique across all runs. IMO this is the inferior option, since the name often has some sort of meaning to the end user. Requiring this attribute to be unique now may require the engine to obscure the original name
patmagee commented 1 year ago

@uniqueg what are your thoughts on this?

uniqueg commented 1 year ago

I fully agree that there should be a way to address individual tasks by identifier, and I am definitely in favor of any solution that increases consistency/compatibility with TES. So yes, I would model this according to how this info is represented in TES and would therefore also prefer option 1. However, perhaps better to use id instead of task_id, as this is how a task ID is referred to in TES.

Of course, in TES there is more nesting of logs vs the flat architecture of WES Logs (which I guess was the consensus for the current solution), so adding a tes_url option as an alternative to providing the full Logs would make things complicated, because every WES client would then need a TES client as well to make sense of TES responses and convert them to the flat representation in order to provide the same information that is available for tasks that do not make use of tes_url. But we could of course add it as an optional field that can be provided for additional info for those clients that know how to make use of it.

But on top of adding id (and tes_url), shouldn't we also add a /runs/{run_id}/tasks/{task_id} endpoint to make it easy to access individual task logs?

What do you think, @kellrott and @vsmalladi?

vsmalladi commented 1 year ago

I agree. I like option 1. I would want to think through the /runs/{run_id}/tasks/{task_id} endpoint addition.

So would a user that has WES and TES endpoints not go to the TES endpoint to understand individual task logs and could find everything at WES?

patmagee commented 1 year ago

@vsmalladi i think Wes should not act as a barrier to getting to the raw tes logs if they are available.

I also think in general tes is a lower level abstraction that an end user will likely need to be exposed to. It enables engines to interface with infrastructure. So I think there will always be a need for an independent definition of a WES task which is a lighter weight version of a TES task.

If we end up defining the task body in a way that is not compatible with a TES log or pointing to a tes log in some way, I would be dissappointed

Another question would be if adding a required field is a breaking change or not...

uniqueg commented 1 year ago

To the last point - it would be, but only relative to the addition of #177. So if #177 is not yet part of a release, I think we should be good.

Good example, by the way, why it's useful to have a certain embargo time before new, non-trivial (and therefore potentially unstable) changes are released.

patmagee commented 1 year ago

@uniqueg good point, it has been merged and is slotted for release, but is not yet part of one.

We had originally though it may be better to hold off on adding the identifier now, but I think it's looking like it may be easier to do it now

uniqueg commented 1 year ago

For sure it's easier to make it required now and lift that requirement later, if it turns out later that there are good reasons against it.

vsmalladi commented 1 year ago

This is a great idea @patmagee thanks for the clarification.

patmagee commented 1 year ago

I think what may be a good approach is not changing any of the schema of the Log object which is currently used in the task_logs array. What if we introduced a new type for the tasks called TaskLog which adds a small amount more additional functionality borrowed from TES (without trying to reimplement TES and keeping the door open for using it).

We could then make the return type of GET /ga4gh/wes/v1/runs/{run_id}/tasks -> TaskListResponse with an array of the the modified TaskLog. If you retrieve the tasks from the GET /ga4gh/wes/v1/runs/{run_id} endpoint, the task list would stay as the Log type, although we are not changing how any of the existing keys work so TaskLog is a superset of Log

{
  "tasks": [
    {
      "id": "5216DEE8-4A79-438F-BD91-93FFAD052B7A",
      "state": "RUNNING",
      "name": "hello_world",
      "cmd": [
        "echo hello"
      ],
      "start_time": "2023-03-28T12:11:05.123Z",
      "end_time": "2023-03-28T12:11:05.123Z",
      "stderr": "/executions/5216DEE8-4A79-438F-BD91-93FFAD052B7A/logs/stderr.txt",
      "stdout": "/executions/5216DEE8-4A79-438F-BD91-93FFAD052B7A/logs/stderr.txt",
      "system_logs": [
        "/executions/5216DEE8-4A79-438F-BD91-93FFAD052B7A/logs/backend.txt"
      ],
      "exit_code": 0
    }
  ]
}

You may noticed there are a few new fields I added here based on general feedback and existing issues.

  1. id... the whole purpose of this ticket
  2. state: this is a really really nice to have and without it there is a big gap in understanding of whether a task failed or succeeded according to the semantics of the language (ie WDL can set a valid non 0 return code)
  3. system_logs: is a request and may satisfy this ticket
uniqueg commented 1 year ago

I think this looks good, but I would include tes_uri (and maybe build the TES URI alongside the lines of DRS and TES URIs.

That is, just for the record, in spite of my general preference for a TES-based solution, because I think that everything that is relevant for TES is also of relevance for workflow tasks, even if engines don't make use of TES :) ...and acknowledging that there is room for improvement on the TES side in terms of schema simplification and better support for provenance (just like in WES).

patmagee commented 1 year ago

@uniqueg if we added tes_uri, as an optional field if the engine can return that level of information would that work?

It has the benefit of not pegging wes to a specific version of tes as well

uniqueg commented 1 year ago

Yeah, a coordination of WES and TES would require a lot more planning (and probably some externalization of schemas that both WES and TES depend on). Maybe something for an eventual, coordinated WES/TES 2.0.0 release way down the road.

Until then, I'll be more than happy with an optional tes_uri field expecting something like tes://my.tes.org/MyTa3k, with the TaskLog schema that extends the existing Log.