ga4gh / workflow-execution-service-schemas

The WES API is a standard way to run and manage portable workflows.
Apache License 2.0
82 stars 38 forks source link

feat: Add extra fields to RunStatus to provide additional information #172

Closed patmagee closed 1 year ago

patmagee commented 2 years ago

Inspired by this issue

The RunStatus object is quite minimal, and does not provide a lot of information to end users, requiring them to make additional requests to the /runs/{id} endpoint multiple times to find basic info about their runs. Additionally, when building tooling to support interaction with the WES API, it is very limiting to only be able to display two fields to a user.

This Pull request adds 4 fields to the RunStatus object which should all be readily available to the implementor. These fields were chosen because of their ability to help a user uniquely identify individual runs, and see how they progress. I attempted to strike a balance between response size and utility.

  1. start_time - Important for sorting, but also serves as a visual reminder to an end user to help provide context of
  2. end_time - Important for sorting. Helps quickly determine how long a run took which when viewed in relation to other runs is generally very helpful
  3. workflow_url - It is often very helpful to know what workflow is being run. it is challenging to correlate a run_id with a specific run without having the prompt of the workflow name
  4. tags - This is potentially something that we can exclude, however it is really really helpful to have additional context when listing runs. This parameter alone helps build powerful UI's and CLI's and will help with adoption
{
    "runs": [
        {
            "run_id": "01feb950-8e5e-405d-8804-43239171a52e",
            "state": "QUEUED",
            "workflow_url": "align.wdl",
            "tags": {
                "sample": "1"
            }
        },
        {
            "run_id": "47ec3102-5ca0-4aa8-bd24-a54cca629319",
            "state": "RUNNING",
            "workflow_url": "hello_world.wdl",
            "start_time": "2021-09-30T12:00:00Z",
            "tags": {
                "sample": "2"
            }
        },
        {
            "run_id": "c42c54fc-4de2-431d-a9b6-9584885f4f40",
            "state": "COMPLETED",
            "workflow_url": "hello_world.wdl",
            "start_time": "2021-09-29T12:00:00Z",
            "end_time": "2021-09-30T12:00:00Z",
            "tags": {
                "sample": "3"
            }
        }
    ]
}
uniqueg commented 2 years ago

In principle, I like this, but I think it would be a missed opportunity to harmonize WES and TES behavior. In the latter, there isn't an equivalent of /runs/{id}/status. Instead you can pass a view query param that can take on the values MINIMAL, BASIC and FULL, which return (more or less) the equivalents of, respectively, what the current /runs/{id}/status, the /runs/{id}/status described here and the /runs/{id} endpoints return.

Not saying that the TES solution is necessarily better, it was just always rubbing me a bit that these are implemented differently across WES and TES, and so before merging this we should perhaps have the discussion whether it wouldn't be better to make WES and TES behave similarly here, whatever the preferred solution may be.

patmagee commented 2 years ago

@uniqueg I like the idea around harmonization, esepcially as a way for making interop between the two API's easier. Another way to phrase this may be: What actually constitutes each of those levels. I suppose the current implementation could be considered MINIMAL, while the above may be BASIC. My one concern there is that if the MINIMAL view does not provide enough information to be useful to end users and API Consumers, is there a reason to keep it? If we went down that path, its possible that the default actually is BASIC?

uniqueg commented 2 years ago

@patmagee Yes, good point. I agree with you that the current implementation does not provide enough information for end users, but it may still be useful for client when they are just polling to get status updates on runs and tasks in WES and TES (especially since we don't have specified a callback mechanism yet, see #132 and there's also a corresponding issue in TES). In our use case, where we are polling TES every couple of seconds to see if the status changed, this happens a lot, and so a smaller response is preferable. In a more sophisticated client, we might query BASIC or FULL once at the beginning to populate a table of tasks/runs, then query just for the status, then query BASIC or FULL once again once a final state (error or success) has been reached.

patmagee commented 2 years ago

@uniqueg I can definitely appreciate the overhead that polling every few seconds would add on a system. However i think in this case the amount of data that we are returning, even with the additional fields is quite small. In the example above each object is approximately 125 bytes long, meaning 10,000 run status objects on a single page would only be 1.2mb. Since most systems return paginated responses, I doubt we ever would really see issues with performance from including multiple fields.

On the other hand, given how large a single workflow can actually be, I am not sure there would be a use case for the FULL query parameter on the List Run endpoint. You can have 1000's of tasks within a given workflow, so you would run into performance issues pretty quickly if you tried to use FULL when listing a large number of runs.

uniqueg commented 2 years ago

Fair points. I have no strong opinion on this except that I would really like to see the same solution for WES and TES. BTW, there is some discussion around filtering responses for listing tasks in TES, not sure if this discussion came up here as well. Generally I think that all of those discussions should be held between WES and TES champions and communities.

patmagee commented 2 years ago

@uniqueg not exactly the same use case, but I just created this pr, to handle large task lists for a specific workflow

patmagee commented 2 years ago

@uniqueg I think perfect consistency here between TES and WES will be a challenge. Implementations maybe have some but not all of the information available to them when listing all of the runs. In order to do the FULL view, you would need need to iterate over every single object in a list and get the expanded information for that object, which could be 1000's or 1,000,000's of tasks each.

IMO for WES, some sort of minimal representation that provides enough information to be useful, but not enough information to take the place of the GET /runs/{id} endpoint and be a hurtle to implementors would be ideal.

Maybe the current solution would be good without the tags since that is potentially an unbounded Map which could also be fairly substantial in size. That would mean we just have the right amount of information on start_time, end_time workflow_url, state and run_id. @wleepang wdyt?

uniqueg commented 2 years ago

@patmagee Fair enough. But I wasn't necessarily proposing that WES should adopt the solution currently implemented in TES (even though I tend to favor the way it's done in TES), but merely that TES people should join the discussion to come up with a common solution that works for both. Because I really don't see any fundamental differences in the problem space here: TES, just as WES, is potentially dealing with thousands or millions of jobs - in fact possibly more so than WES, given that - when TES is actually used - there is typically a one-to-many mapping of WES to TES jobs. Plus TES jobs that do not originate from workflow engines.

But to add, hopefully, something more constructive, here are a few points that I am considering:

Tagging the TES co-champions here: @kellrott @aniewielska

patmagee commented 1 year ago

@uniqueg these are all really great suggestions. I think looping in the TES folks makes a lot of sense here, even if we do not end up using the same approach, we should at least try to come to a common ground and have as much similarity between the two apis as possible.'

uniqueg commented 1 year ago

Agree with all, @patmagee - except perhaps the last one. Not good API design to use booleans unless you are really 100% sure there will never be a third option coming in - because then you'd invariably suffer a breaking change. So if we do introduce a query param, I would prefer it to be an enum, though I am certainly not particularly fond on the precise names (especially BASIC).

patmagee commented 1 year ago

@uniqueg maybe the simplest thing to do in this situation is choose the smallest subset of information and include that for now, and then Keep the option of the MINIMAL/ FULL view being added in the future? (this could be what is considered the BASIC view)

patmagee commented 1 year ago

@uniqueg @wleepang I am happy to get this looked at again. I have reduced the fields included to the bare minimum so as to be minimally impactful. That is not to say that implementors cannot choose to add whatever additional fields they seem fit, but for now I think less is more (for the sake of progressing the spec)

patmagee commented 1 year ago

@uniqueg Can you please look at this again. I have fixed the breaking change and fallout from the merging of the different definitions

uniqueg commented 1 year ago

Note also that I have created #206 so that we can quickly fix that regression before coming to a conclusion on this discussion (which may possibly still take a while).

patmagee commented 1 year ago

closed in favor of #207