elixir-cloud-aai / proTES

Proxy service for injecting middleware into GA4GH TES requests
Apache License 2.0

Tasks not filtered by name prefix #127

Closed uniqueg closed 1 year ago

uniqueg commented 1 year ago

A GET request to /tasks?name_prefix=foo&view=MINIMAL should return only tasks whose identifiers start with foo. However, it appears that all tasks are returned, not filtered at all.

To reproduce

Expected behavior

Should only list tasks whose IDs start with foo (probably none at all).

Actual behavior

Lists all tasks, including those whose identifiers do not start with foo.
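
A minimal sketch of how the check described above could be scripted. The base URL is a hypothetical local deployment, and the helper names `list_tasks_url` and `unexpected_tasks` are made up for illustration; neither is taken from proTES. It builds the request URL from the issue and, given a parsed task list, reports any task whose identifier does not start with the prefix.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a local deployment; adjust to your setup.
BASE_URL = "http://localhost:8080/ga4gh/tes/v1"


def list_tasks_url(name_prefix, view="MINIMAL", base_url=BASE_URL):
    """Build the GET /tasks URL with the query parameters from the issue."""
    query = urlencode({"name_prefix": name_prefix, "view": view})
    return f"{base_url}/tasks?{query}"


def unexpected_tasks(tasks, prefix, key="id"):
    """Return the tasks whose `key` value does NOT start with `prefix`.

    An empty result means the server filtered correctly.
    """
    return [t for t in tasks if not str(t.get(key, "")).startswith(prefix)]


# Made-up sample standing in for the "tasks" array of a response.
sample = [
    {"id": "foo-1", "state": "COMPLETE"},
    {"id": "bar-2", "state": "QUEUED"},
]
print(list_tasks_url("foo"))
print(unexpected_tasks(sample, "foo"))
```

With the behavior reported here, the second call would return a non-empty list for a real response.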

SohamRatnaparkhi commented 1 year ago

Hi @uniqueg, I would love to solve this issue. One possible brute-force approach would be to iterate through all the tasks, append those that begin with the specified string, such as 'foo', to a list, and return that list.

I have reviewed the code base, but I was unable to locate the controller for this request. Based on my analysis, the controller may be in either server.py or task_runs.py. If I am mistaken, please direct me to the correct controller. Otherwise, I am prepared to create a new function to address the issue.

Thank you!

uniqueg commented 1 year ago

Thanks @SohamRatnaparkhi, I have assigned you.

As stated in the issue, the endpoint is GET /tasks. You can find the name of its controller in the OpenAPI specification. I am not looking it up for you, so that you get a chance to explore how OpenAPI specs are structured ;-)

Note that there are two general approaches you could follow here:

  1. Only fetch tasks matching the prefix from the database. Here you would need to find out how to define and pass such a filter when making the database call. This solution is preferred because it will be much more performant, especially once we have thousands of tasks. I am not 100% sure it's possible with MongoDB, but https://www.mongodb.com/docs/manual/reference/operator/query/regex/ looks promising.
  2. Filter matching tasks after the database call. This is the general approach you describe. If you do not get anywhere with approach (1), make sure that you don't just use a plain for loop. Instead make use of map() or use a list comprehension. In fact, you should probably try both and see which approach is more performant. If you can think of any other solutions, include them in your benchmark.
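
Both approaches can be sketched roughly as follows. This is an illustration under assumptions, not the proTES implementation: the field name `name` and the helper names `build_prefix_filter` and `filter_by_prefix` are hypothetical, and the actual document layout in the database may differ. The first helper only builds a MongoDB `$regex` filter document; passing it to a PyMongo `find()` call is left as a comment so the snippet runs without a database.

```python
import re


def build_prefix_filter(name_prefix, field="name"):
    """Approach 1: build a MongoDB filter matching values that start with the prefix.

    The prefix is escaped so that characters such as "." are matched literally,
    and "^" anchors the pattern at the start of the string.
    """
    if not name_prefix:
        return {}  # an empty filter matches every document
    return {field: {"$regex": "^" + re.escape(name_prefix)}}


def filter_by_prefix(tasks, name_prefix, key="name"):
    """Approach 2: filter already-fetched task dicts with a list comprehension."""
    if not name_prefix:
        return list(tasks)
    return [t for t in tasks if str(t.get(key, "")).startswith(name_prefix)]


# Assumed usage of approach 1 with PyMongo (collection is hypothetical):
#   cursor = collection.find(build_prefix_filter("foo"))

# Made-up sample data for approach 2.
tasks = [{"name": "foo-align"}, {"name": "bar-sort"}, {"name": "foobar"}]
print(build_prefix_filter("foo"))
print(filter_by_prefix(tasks, "foo"))
```

Approach 1 keeps the filtering in the database, so only matching documents are transferred. Approach 2 is easy to time with `timeit` against a `map()`- or `filter()`-based variant for the benchmark suggested above.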

SohamRatnaparkhi commented 1 year ago

Yep. This works as well! I will try to come up with a solution soon and will draft a PR.