ga4gh / task-execution-schemas

Apache License 2.0
80 stars 28 forks source link

ListTasks filter: query language rough draft #92

Closed buchanae closed 6 years ago

buchanae commented 6 years ago

Here's a rough draft of a ListTasks filter query language. This is not complete. I'm not even sure this is a good idea, mainly I'm providing a starting point for discussion.

Pros:

Cons:

buchanae commented 6 years ago

This comment has a decent collection of query languages seen around the web: https://github.com/elastic/kibana/issues/12282#issuecomment-308831510

buchanae commented 6 years ago

An alternative would be to drop the query language and provide individual fields in the ListTasks message:

State state_filter = 6;

map<string,string> tag_filters = 7;

string created_after = 8;
string created_before = 9;

Or something like that. I'm not sure how this translates to a GET request's URL parameters yet.

adamstruck commented 6 years ago

Its not clear to me how either of these would be encoded in the URL for HTTP requests. Please provide some clarification in the spec.

adamstruck commented 6 years ago

We should document the order that the tasks are expected to be returned from this endpoint.

I suggest that ListTasks always returns the tasks in the order that they were created (descending order). This would depend on #90.

buchanae commented 6 years ago

I started working on a more complete definition of the schema started in https://github.com/ga4gh/task-execution-schemas/pull/92#issuecomment-338825476

  // Filters
  // =======
  // Filters are combined together using logical AND.
  // Logical OR and NOT are not supported.

  // OPTIONAL
  // 
  // Filter tasks based on the Task.state field.
  State state_filter = 6;

  // OPTIONAL
  // 
  // Filter tasks based on the Task.tags field.
  // Multiple entries are combined using logical AND.
  // A tag filter matches a Task tag if both the key and value are exact matches.
  // A "*" character allows matching any string. This does NOT allow partial string matching.
  //
  //   Filter                            Tags                          Match?
  //   ----------------------------------------------------------------------
  //   {"foo": "bar"}                    {"foo": "bar"}                Yes

  //   {"foo": ""}                       {"foo": ""}                   Yes
  //   {"foo": ""}                       {"foo": "bar"}                No

  //   {"foo": "bar", "baz": "bat"}      {"foo": "bar", "baz": "bat"}  Yes
  //   {"foo": "bar"}                    {"foo": "bar", "baz": "bat"}  Yes

  //   {"foo": "*"}                      {"foo": "bar"}                Yes
  //   {"*": "bar"}                      {"foo": "bar"}                Yes

  //   {"*": "*"}                        {"foo": "bar"}                Match everything?
  //   {"*": ""}                         {"foo": "bar"}                ?
  //   {"": "*"}                         {"foo": "bar"}                Invalid?
  //   {"": ""}                          {"foo": "bar"}                Invalid?
  //   {"": "bar"}                       {"foo": "bar"}                Invalid?

  //   {"foo": "*ar"}                    {"foo": "bar"}                No (not a partial match)
  //   {"fo*": "bar"}                    {"foo": "bar"}                No (not a partial match)
  //   {"foo": "ba"}                     {"foo": "bar"}                No
  //   {"foo": "bar", "baz": "bat"}      {"foo": "bar"}                No
  //   {"foo": "bar", "baz": ""}         {"foo": "bar"}                No
  //   {"foo": "bar", "baz": "*"}        {"foo": "bar"}                No
  map<string,string> tag_filter = 7;

  // OPTIONAL
  // 
  // Filter tasks based on the Task.creation_time field.
  // Date + time is formatted as RFC 3339.
  string created_after = 8;

  // OPTIONAL
  // 
  // 
  // Date + time is formatted as RFC 3339.
  string created_before = 9;

But, I've hit a roadblock. map<string,string> tag_filter does not easily map to URL parameters. The options I can think of:

  1. Stick with a full query string, as originally proposed. Write a reference parser to clearly define the details.
  2. Go with the protobuf filter fields, but use a repeated string tag_filter instead of a map<string,string>. We'd still need most of the parsing rules from option 1 though.
  3. Define an encoding of map<string,string> to URL params. For example, go-querystring defined this as Tags=map%5Bone%3Atwo+three%3Afour%5D which decodes to Tags=map[one:two three:four].
  4. Change ListTasks to be a POST request, use the request body for filter fields, don't support URL parameters. This is contrary to the Google Design guide we've been trying to follow.
  5. Change Task.tags to be a repeated string. The filter is then a repeated string as well, which easily maps to URL parameters.

Honestly, I'm not in love with any of those options.

buchanae commented 6 years ago

We've also discussed recently that having control over sort order would be very useful, e.g. order tasks by creation time ascending or descending.

adamstruck commented 6 years ago

UPDATE: map<string,string> tag_filter now maps to URL parameters in grpc-gateway. They take the form ?tags[key]=value. I propose we adopt this standard.

buchanae commented 6 years ago

We're experimenting with filtering in Funnel, because we needed to start writing code to see our way through this issue. This is the ListTasksRequest message we have so far: https://github.com/ohsu-comp-bio/funnel/blob/master/proto/tes/tes.proto#L422

That's working well I'd say. I think we can easily add a filter for creation_time and wrap this up if people agree with the approach.

As @adamstruck mentioned, a tag filter is serialized into URL params such as ?tags[tagkey]=tagval. To filter for tasks with tags project=example and user=buchanae, the URL would be ?tags[user]=buchanae&tags[project]=example (hope I got that right).

For simplicity, we require an exact match of key and value, i.e. you can't match on only the tag while ignoring the value, and vice versa.

buchanae commented 6 years ago

closed in favor of the cleaned up version in #104