ga4gh / mme-apis

Documentation for the MatchmakerExchange APIs
https://github.com/ga4gh/mme-apis
34 stars 19 forks source link

Result paging/max number of requests #36

Open allasm opened 9 years ago

allasm commented 9 years ago

Add support for either paging or a "maxNumReplies" field

Relequestual commented 9 years ago

+1

fschiettecatte commented 9 years ago

+1

Relequestual commented 8 years ago

With reference to https://github.com/ga4gh/mme-apis/issues/86#issuecomment-77477745, we should also add totalResults. We will need to work out what level these elements should be at.

Relequestual commented 8 years ago

This would be a breaking change. Removing from 1.1 milestone.

DECIPHER currently limits the number of results, following how Phenome Central works. Given that, we could close this issue in favour of suggesting that systems only return the top 20 matches. Thoughts?

fschiettecatte commented 8 years ago

I am fine with the top 20, seems like a defensible number to me.

buske commented 8 years ago

I still think we should allow specification, with size/offset or page/pageSize parameters, but I'm fine with removing this from v1.1.

harindra-a commented 8 years ago

Hi guys, Sorry, still catching up to these older issues on github,

If I am reading correctly: the ticket was for setting a hard limit on the number of results returned? I assume this top 20 is defined via the home centers match and scoring criteria?

I would vote against such a limitation being put in since it would have the potential to not share certain good matches. I think what Orion is suggesting was doing a similar (but less restrictive) thing via paging? I think that is a good idea.

On chatting with our analysts, their view was that they would like to see all hits (irrespective of score) that share some gene (a gene that a shared variant happens) on.

Or I am reading the ticket/idea wrong?

Best, Harindra

On Thu, Aug 4, 2016 at 3:03 PM, Orion Buske notifications@github.com wrote:

I still think we should allow specification, with size/offset or page/pageSize parameters, but I'm fine with removing this from v1.1.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ga4gh/mme-apis/issues/36#issuecomment-237651328, or mute the thread https://github.com/notifications/unsubscribe-auth/AOf2_E9bCe5D2zcMSF5zvRu3BYI5ZRNWks5qcjeCgaJpZM4DU0qc .

buske commented 8 years ago

Hey @harindra-a, you're right. The ticket was created to support specifying paging or response size control in the query. @Relequestual was proposing instead to set a hard limit.

fschiettecatte commented 8 years ago

We should discuss this in Toronto, my vote is to keep the 1.0-1.5 as simple as possible, and pushing the start/limit to 2.0 because that is a more natural fit with what Orion is proposing for 2.0, namely a 'search' paradigm rather than a 'matching' paradigm.

harindra-a commented 8 years ago

OK great, thanks Orion, I definitely vote paging, and we would need it either way at some point I think.

On Thu, Aug 4, 2016 at 3:55 PM, François Schiettecatte < notifications@github.com> wrote:

We should discuss this in Toronto, my vote is to keep the 1.0-1.5 as simple as possible, and pushing the start/limit to 2.0 because that is a more natural fit with what Orion is proposing for 2.0, namely a 'search' paradigm rather than a 'matching' paradigm.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ga4gh/mme-apis/issues/36#issuecomment-237664820, or mute the thread https://github.com/notifications/unsubscribe-auth/AOf2_NaUzqRJFRBuBbS4uZH7LdpdmWGSks5qckOUgaJpZM4DU0qc .

Relequestual commented 8 years ago

A lot of feedback we have gotten from people using the MME, including internally within MME from users in the policy group, is too many results are bad and unuseful. They want only a few results, they don't, ideally, want more than a handfull.

Showing ALL results can sometimes, in our case, result in thousands of patient records. If you're getting thousands of genuin matches, then your probably would have found the same result in population varaiation databases already. Otherwise, we need to allow the user to specify further the matching cirteria (coming in 2.0), reducing the number of results.

When searching on the DECIPHER website, if any query has too many results, we simply tell the user their search is too broad, and they need to be more specific. I think that encouridging this in the MME network also has value, hopefully encouridging the users to included more detailed genotypic and phenotypic data, making the record more useful.

buske commented 8 years ago

I completely agree. I think it's useful to separate:

  1. How many matches a site should show a user on the site (I agree, a pretty small number)
  2. How many matches a site should actually notify a user of (e.g., via email) (I think a VERY small number)
  3. How many matches the API should return (I don't see any reason to limit this, as long as the server can threshold/filter however it likes)
harindra-a commented 8 years ago

I agree with Orion's points. 1,2 deals with user-experience, and 3 is a backend/server concern.

That separation allows the host to filter/control the user experience (1,2) via some front-end/process and do what Ben wanted to do(?) and still not limit the API with an arbitrary cutoff (which we also think is not a good idea).

On Mon, Aug 15, 2016 at 3:58 AM, Orion Buske notifications@github.com wrote:

I completely agree. I think it's useful to separate:

  1. How many matches a site should show a user on the site (I agree, a pretty small number)
  2. How many matches a site should actually notify a user of (e.g., via email) (I think a VERY small number)
  3. How many matches the API should return (I don't see any reason to limit this, as long as the server can threshold/filter however it likes)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ga4gh/mme-apis/issues/36#issuecomment-239748099, or mute the thread https://github.com/notifications/unsubscribe-auth/AOf2_F_RYtgWhnrl6tisy36uC47Jjbwiks5qgBwQgaJpZM4DU0qc .

Relequestual commented 8 years ago

We arrived at the need to set an arbitrary cut off because there were some cases where hundreds or even over a thousand results were returned. No system was setup to handle processing and displaying that number of results.

Say the data was paged at 50, and you had 500 results in a random order. You would have to retrieve and store all that data, process it to work out which was the best results to display, filter out othe results, then display to the user. That situation creates a huge overhead and adds complexity for each system wanting to implement MME.

Also there's the risk (from the responders perspective) of divulging too much data in one go. We do not want to expose such a large amount of data. Making the request more specific will give the system (and the user) access to what they want without such a risk.

I can't support adding pagination of results, but I think returning a number of additional results makes sense, allowing the user / system to make a more specific request. That is something that should come with a newer "2.0" (but not semver 2.0) version of the API.

fschiettecatte commented 8 years ago

I'll wade into this discussion. I agree with @Relequestual on all counts.

The current (first) version of the protocol is a best-effort match and present those matches to the user. In this context it makes sense to cap the number of matches, at both ends of the process.

The model for the next version migrates the relevance assessment from the server to the client, in that context it makes perfect sense to allow the client to request as much data as it wants/needs and to paginate through that data in manageable chunks (whatever that means). Though that still does not prevent the server from capping the maximum number of results it is willing to return.

Relequestual commented 8 years ago

In the early days before we implemented a limit, we returned a huge chunk of our database based on some queries. This is not desierable and so a limit was implemented. If someone wishes to access all our data, we have seperate agreements, documents, and procedures for that access. For MME, it makes sense we (DECIPHER) to limit the number of results, report there are potentially more, and allow the other system (and their user) to refine the query to be more specific.

antbro commented 8 years ago

For what its worth...

As you know, I have always argued for putting more 'control' in the hands of the person doing the query. And given the objective of MME it makes no sense to execute a query that return very large numbers of patients. So I tend to agree with Ben on this one.

But surely the API can be designed to 'allow' a server to return an unlimited number of results (in case some server wants to do that) and also include the potential for the server to limit the number returned according to its own policy.

Cheers

Tony


From: Ben Hutton [notifications@github.com] Sent: 16 August 2016 09:13 To: ga4gh/mme-apis Subject: Re: [ga4gh/mme-apis] Result paging/max number of requests (#36)

In the early days before we implemented a limit, we returned a huge chunk of our database based on some queries. This is not desierable and so a limit was implemented. If someone wishes to access all our data, we have seperate agreements, documents, and procedures for that access. For MME, it makes sense we (DECIPHER) to limit the number of results, report there are potentially more, and allow the other system (and their user) to refine the query to be more specific.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/ga4gh/mme-apis/issues/36#issuecomment-240033381, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AI_EVHqwuCCEjet80ICUtHA_dHPaO0PFks5qgXFGgaJpZM4DU0qc.

buske commented 8 years ago

Sorry, I didn't mean to imply that every server must support returning an arbitrary number of responses, and indeed many servers will likely want to cap the number of results that they return. Other servers will absolutely want to limit the number of responses they get, but we should recognize that the API does not currently constrain the order in which matches are returned. I don't think we should agree to a fixed global limit for the MME though.

This is the proposal I would make:

I suggest we consider:

antbro commented 8 years ago

+1 T

Orion Buske wrote:

Sorry, I didn't mean to imply that every server must support returning an arbitrary number of responses, and indeed many servers will likely want to cap the number of results that they return. Other servers will absolutely want to limit the number of responses they get, but we should recognize that the API does not currently constrain the order in which matches are returned. I don't think we should agree to a fixed global limit for the MME though.

This is the proposal I would make:

* for the current API, we add a |limit| field (name based on the
  common way to paginate APIs using |limit| and |offset| parameters)
* the responding server MUST return at most this number of
  matches, but MAY return fewer

I suggest we consider:

* adding a |count| field (or some other name) to the response to
  indicate the total number of matches that were found
* adopting |limit|/|offset| pagination for exome matching or if
  things move forward with the symmetric data sharing proposal

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ga4gh/mme-apis/issues/36#issuecomment-240492026, or mute the thread https://github.com/notifications/unsubscribe-auth/AI_EVAGA6pn20E0JROirVPvag4sa5HQhks5qg0pDgaJpZM4DU0qc.

harindra-a commented 8 years ago

all great points Orion, agreed.

And that is more or less the common practice as well in other production grade APIs, so even better.

Best, Harindra

On Wed, Aug 17, 2016 at 1:55 PM, antbro notifications@github.com wrote:

+1 T

Orion Buske wrote:

Sorry, I didn't mean to imply that every server must support returning an arbitrary number of responses, and indeed many servers will likely want to cap the number of results that they return. Other servers will absolutely want to limit the number of responses they get, but we should recognize that the API does not currently constrain the order in which matches are returned. I don't think we should agree to a fixed global limit for the MME though.

This is the proposal I would make:

  • for the current API, we add a |limit| field (name based on the common way to paginate APIs using |limit| and |offset| parameters)
  • the responding server MUST return at most this number of matches, but MAY return fewer

I suggest we consider:

  • adding a |count| field (or some other name) to the response to indicate the total number of matches that were found
  • adopting |limit|/|offset| pagination for exome matching or if things move forward with the symmetric data sharing proposal

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ga4gh/mme-apis/issues/36#issuecomment-240492026, or mute the thread https://github.com/notifications/unsubscribe-auth/AI_ EVAGA6pn20E0JROirVPvag4sa5HQhks5qg0pDgaJpZM4DU0qc.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ga4gh/mme-apis/issues/36#issuecomment-240492985, or mute the thread https://github.com/notifications/unsubscribe-auth/AOf2_G8lLZYaHQKvllOEAQyqgUNiaVstks5qg0sGgaJpZM4DU0qc .

Relequestual commented 8 years ago

A note on order of results. JSON is unordered, so it is expected that the client making the request, orders the results based on score.

buske commented 8 years ago

@Relequestual JSON lists are ordered. :)

Relequestual commented 8 years ago

@buske ah good point. A LIST is ordered, just not key/value pairs in an object. My mistake.

Relequestual commented 7 years ago

Revitalising this issue, at the very least we could add a "number of total records that are a potential match" field, which will indicate to the user if their results are beyond 20 or not when 20 are returned. This is informative in terms of knowing that their patient record probably doesn't have enough discriminatory features, the case is common, or the candidate gene could be wrong (Maybe, I don't know in practice).

Relequestual commented 7 years ago

I was thinking for simple queries, it could be possible to include a link to view additional results on the returning website. At least in DECIPHERS case, and probably also myGene2, it's totally possible to search for all patients with a given gene and a set of phenotypes. If others agree they can see the use case, I'll create a new issue. @jxchong thoughts?

fschiettecatte commented 7 years ago

I can see a use for this but I wonder if this rightly belongs to the next major release of the protocol? Another approach would to use some sort of pagination system with a start and a limit which would allow to jump around in a result set (like a search engine). Your approach is better suited if you want to pull all the data in a results set (the Open Archive Initiative uses this approach).

harindra-a commented 7 years ago

I agree with François that a pagination system is a better solution if you want to go this route, I also agree with François that this probably belongs with the next major release where we can design it in better.

Best, Harindra

On Mon, Dec 5, 2016 at 9:21 AM, François Schiettecatte < notifications@github.com> wrote:

I can see a use for this but I wonder if this rightly belongs to the next major release of the protocol? Another approach would to use some sort of pagination system with a start and a limit which would allow to jump around in a result set (like a search engine). Your approach is better suited if you want to pull all the data in a results set (the Open Archive Initiative uses this approach).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ga4gh/mme-apis/issues/36#issuecomment-264865786, or mute the thread https://github.com/notifications/unsubscribe-auth/AOf2_LcyyFKKOR6rKraVy6BG6PX0pILhks5rFB31gaJpZM4DU0qc .

Relequestual commented 7 years ago

Pagination would be a really hard, possibly impossible sell, for us. We don't want to expose too much data at the same time. It may all be open-access, but that doesn't mean we should make accessing all of it too easy. We have instances where people have tried to scrape our data, which we can then identify and block.