ga4gh-discovery / ga4gh-case-discovery

A framework for searching genomic data sharing services
Apache License 2.0

Should we allow services to respond with data if they don't include components required by the request? #23

Closed fschiettecatte closed 6 years ago

fschiettecatte commented 6 years ago

harindra-a commented 6 years ago

Hey François, here are some thoughts, others can jump in,

-Do we want to be able to return arbitrary collection components, for example mme was requested but we only support exists?

I think for predictability, and simplicity, let's simply return "not supported" if a specific return type was asked-for and not supported

-Do we want to require support for a minimal list of components?

I would think so. Like MME? at least gene level? Let's discuss.

-What is the process for defining new component/record types?

yeah, we need to make one up. My feeling was,

i. make a ticket

ii. voted on by implementers at the time ("steering committee")

iii. timeline set

-Do we want to support defined sets of components/records for certain communities (MME)?

I don't think so. Ideally we don't separate by community. But let the hosts in the network simply return "not supported" as appropriate, or simply ignore. Not sure which is better (noisy vs unpredictable)

-Do we allow private components?

My vote would be no, since it would add complexity?

We can go into all this in next meeting too

Relequestual commented 6 years ago

Do we want to be able to return arbitrary collection components, for example mme was requested but we only support exists?

I would say no. This will make it simpler, and make sure work is only done when the response will be useful to the requestor. We should use existing HTTP semantics, returning 417 in this situation.

Relequestual commented 6 years ago

Do we want to require support for a minimal list of components?

I think let's table this discussion till we have a few components nailed down.

Relequestual commented 6 years ago

Do we allow private components?

I'm not really sure what you mean by this. Can you expand with a use case? Probably worth creating as a new issue.

Relequestual commented 6 years ago

I'm indifferent regarding the other questions.

I would suggest moving forward that each question retains its own issue to avoid any cross discussion which can very quickly make for impossible reading of comments (see the case of 300 comment long issues... it happens, it's horrible).

fschiettecatte commented 6 years ago

I am pretty sure that all these questions already have their own issues, for example the one on private components is in issue #17

Relequestual commented 6 years ago

OK, great! Can you edit your first post to link to each issue, and then close this one please? Then I can move my comments to the associated issues.

Relequestual commented 6 years ago

Now the other questions have been migrated to individual issues, I've had some further thought on this specific question.

Relequestual commented 6 years ago

If a client makes a request which specifies it requires specific components in the response, it's likely it is only interested in the responses which include those components (say, must provide phenotypes).

It would be pointless for the server to do search processing to find results, if the client won't use them. It's better to save the time up front if the server knows it cannot fulfil the request in the way the client has asked, and so provides an appropriate "not supported" response. (This response may be a specific HTTP status code, and / or a JSON payload which represents the reasoning).

There may of course be cases where the server doesn't know up front that it won't be able to provide those components for the results of the search, for example, it may find a record with a given specific gene, stores phenotypes, but doesn't have any phenotypes for that specific record. This would result in the same response from the server (assuming it didn't find any other records for which it can represent a record in the requested components).
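The two checks described above (one before the search, one after it) can be sketched roughly as follows. This is only an illustration under assumptions: the component names, the `handle_request` function, and the shape of the response dict are all hypothetical and not taken from the spec, and the "not supported" response is left as a plain payload rather than tied to any particular HTTP status code.

```python
# Hypothetical component names; nothing here is defined by the spec.
SUPPORTED_COMPONENTS = frozenset({"exists", "gene", "phenotypes"})


def not_supported():
    """Build the (assumed) response for a request the server cannot fulfil."""
    return {"status": "not supported", "records": []}


def handle_request(required, search, supported=SUPPORTED_COMPONENTS):
    """Return usable records, or a "not supported" response.

    required -- component names the client says it must receive
    search   -- zero-argument callable that runs the expensive search and
                returns a list of dicts mapping component name -> value
    """
    required = set(required)

    # Up-front check: skip the search entirely when the client requires a
    # component this server never provides.
    if not required <= supported:
        return not_supported()

    # Post-search check: the server may store phenotypes in general but
    # have none for a particular record, so filter record by record.
    usable = [
        record
        for record in search()
        if all(record.get(name) is not None for name in required)
    ]
    if not usable:
        return not_supported()
    return {"status": "ok", "records": usable}
```

In this sketch a request requiring `mme` never triggers the search at all, while a request requiring `phenotypes` triggers it but still ends in the same "not supported" response if no matching record carries phenotypes.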

harindra-a commented 6 years ago

yeah good point(s). The patients stored in a database are not uniform collections as we know. Also, it might be useful from the querier side to get some information back, rather than none.

Wonder if this might be an implementation issue rather than a spec-related one, and better served in a "guidelines for implementation" doc. This resource saving thread here is good (with possible security implications), let's talk more about all this at a meeting (maybe with Dixie too at some point).

Relequestual commented 6 years ago

I agree, I think this is just a consideration for the implementation, but the question is, should a client be allowed to specify they require specific response content, and what should the server do if they cannot provide what's required.

If the requesting client wants to get some information back even if the server wants to return a record without a specific component, then the client doesn't specify that the component is required.

If client A simply must have phenotypes for it to be useful, then specifying so prevents a server from doing work which isn't going to be useful to the client.

If client B doesn't NEED phenotypes, then it simply doesn't say it requires them in the response.

Some clients may be only interested in aggregate data. If the server can't provide them with that information, there's no need to proceed.
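As a rough sketch of the client A / client B distinction: the only difference between the two requests is whether a list of required components is populated. The payload field names (`query`, `requiredComponents`) and both helper functions are assumptions made up for illustration, not part of any agreed schema.

```python
def build_request(gene, required_components=()):
    """Build a hypothetical query payload; field names are assumptions."""
    return {
        "query": {"gene": gene},
        "requiredComponents": sorted(set(required_components)),
    }


def should_search(request, supported):
    """True if a server supporting `supported` could satisfy the request."""
    return set(request["requiredComponents"]) <= set(supported)


# Client A must have phenotypes for a response to be useful.
client_a = build_request("BRCA2", required_components=["phenotypes"])

# Client B is happy with whatever the server can return.
client_b = build_request("BRCA2")
```

Against a server that only supports `exists` and `gene`, `should_search(client_a, ...)` is false, so no work is done for client A, while client B's request proceeds because it requires nothing.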

harindra-a commented 6 years ago

those questions came up in a few meetings, and I believe we backed off them to some extent given the complexity they involved when taken as a whole and went with the simpler, "ask a question from a network of nodes; and each responds as possible or comfortable". Let's bring this up again next meeting, and decide to either add them (or a subset of them) in now, or post-October after some testing. I do believe these are important and will tighten the network, but are possibly non-trivial and need some discussion, so my feeling might be post-Oct.

Relequestual commented 6 years ago

That's strange, I would have thought knowing what types of data (and as a result which components) you have is not only trivial but essential.

Still, at the implementation level, no client has to specify any required components, which creates the situation as you say. Equally I don't want us to be in a situation where a new major version of the API has to be released for such to be possible.

I notice that I didn't actually explicitly explain the expected workflow in my proposal, so it may just be a case of this needs the clarity of explanation.

Relequestual commented 6 years ago

Consensus on call today: No. The reason for this and better explanation will be in the first release.