ga4gh / task-execution-schemas

Apache License 2.0
80 stars 27 forks source link

Adding filtering features to task listing #170

Closed kellrott closed 1 year ago

kellrott commented 2 years ago

Responding to #57 and porting changes from #104

uniqueg commented 2 years ago

From @kellrott

One of the suggested features around TES 1.1 is the ability to filter job listings by tag key/value pairs. Each TES Task has a dictionary where arbitrary key/values can be stored for annotating jobs ( https://github.com/ga4gh/task-execution-schemas/blob/develop/openapi/task_execution_service.openapi.yaml#L666 ), it isn't used by the server, or expected to be seen by the backend. Rather it is simply a space for users to store annotations, like workflow, step or tool ids, that may help them when that are scanning the job log for downstream logic.

The ability to filter job listing, using these tags is proposed in PR 170: https://github.com/ga4gh/task-execution-schemas/pull/170 and specific format for these requests is done in https://github.com/ga4gh/task-execution-schemas/blob/issue/104/openapi/task_execution_service.openapi.yaml#L111

In this version, the schema uses the additionalProperties option to capture all additional non-schema flags from the URL query and stores them in an object. This doesn't seem like the best method. First, it means that there will be a subset of words that can't be used for tags, namely 'name_prefix', 'state', 'page_size' and 'page_token'. Not a big loss, but still a little less than neat. But more importantly, the 'additionalProperties' feature is one of those things that falls in the 'new and not full supported' category for a lot of code generation tools. I noticed this when trying to setup for OpenAPI-Generator based go-server code for Funnel. I worry that including this feature in the specification would break possible implementations, until the code generation gets a lot better.

Does anyone have any thoughts about a better way to support key/value filtering of tags for job listings? Is there a pattern that has been used in any of the other GA4GH APIs?

uniqueg commented 2 years ago

@kellrott: Do you have an example call with filters that highlights the namespace issue (i.e., not being able to use, e.g., state and page_size)? Just by looking at the proposed schema I would have guessed that the additional properties are nested under property tags - in which case there wouldn't be any namespace conflict.

As for wanting to avoid additionalProperties: I think that the other Cloud APIs either do not include arbitraty key-value pairs or don't support filtering on them. TRS does filtering based on pre-defined query params - it's not particularly nide either, IMO. I don't really see a way around using additionalProperties here (at least if we want to keep this a GET endpiont, which I think is a given). So even though I share your concern, in my opinion the outlined solution is the way to go.

uniqueg commented 1 year ago

From @kellrott and @aniewielska: It was suggested during recent Cloud WS meeting to check out how this is done in k8s.

Reference: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#list-and-watch-filtering (likely too complex, but possibly parts can be used)

lvarin commented 1 year ago

Another options would be to use annotations (https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/). I have seen several tools that integrate with others using annotations. For example the Let's Encrypt controller (https://github.com/tnozicka/openshift-acme) reads and writes annotations from the object it manage. I think the main advantage is that labels are often used ti link objects (Pods and services for example), and annotations are pure metadata.

(Just my two cents)