ga4gh-discovery / ga4gh-case-discovery

A framework for searching genomic data sharing services
Apache License 2.0
8 stars 5 forks source link

Format restriction in the`queryID` #18

Closed fschiettecatte closed 6 years ago

fschiettecatte commented 6 years ago

Do we want to constrain the format of the query identifier (length, characters allowed, integer, uuid)?

The reason this is a question is that it was an issue in MME v1, some used an integer, others a mix of characters and integer.

Relequestual commented 6 years ago

I would say, String or Number is fine.

fschiettecatte commented 6 years ago

With a length limit? For example GeneMatcher uses integers, whereas mygene2 uses urls which can get quite long. I need to store other people's identifiers in a table to track prior matches and I expect that others will too. It would be good to know up front what to expect so as not to break size limits on tables.

Relequestual commented 6 years ago

I don't see any need to define a limit, but if we must, VARCHAR(65535) and TEXT are basically the same for MySQL, and is better in most cases for PostgreSQL.

Do you invisage doing any queries using this ID? 65535 is plenty for URLs, even for http://about.bitty.site =]

fschiettecatte commented 6 years ago

I don't envisage doing any queries with these IDs, but I do need to store them so that I can track prior matches so as not to re-notify people when a new search comes in.

Relequestual commented 6 years ago

When MyGene2 used URLs, that was to represent the case being used. It looks like this ID is for the "query event" should referencing it be required. It sounds like you're refurring to an ID which represents the case. Is this right? If so, I think it should be a different component.

fschiettecatte commented 6 years ago

Maybe, I would like it is to be explicit for the ID do that there are no nasty surprises down the road.

Relequestual commented 6 years ago

OK, so it sounds like you want this ID to represent "the specific query made by the client to the server".

Should this ID be the same if the request is repeated by the client?

When you say "I do need to store them so that I can track prior matches so as not to re-notify people when a new search comes in.", could you expand on what this workflow looks like please? I'm not sure I fully understand the workflow, and I want to make sure we're properly capturing your requirements.

fschiettecatte commented 6 years ago

I am not sure that expanding the workflow here is actually relevant to MME, this is internal GeneMatcher operations, separate from the protocol.

Relequestual commented 6 years ago

Well, if I can't understand how you intend to use this ID, as in, I can't understand the user story, then I can't properly descirbe its purpose. Currently I find your explanation of what you want to do with the ID, and what your expectations are of the ID, ambigious. I thought it would be easier for you to explain the process you intend to rely on it for, rather than for me to guess and list out the possible ambiguities.

fschiettecatte commented 6 years ago

How GeneMatcher uses the ID is orthogonal to how the ID is specified. All I am asking is that we specify what this ID looks like so that I can store it adequately. Specification is independent of implementation.

Relequestual commented 6 years ago

Yes and no. Currently, as the spec stands, the queryID should have no meaning or use to you, as it's generated by the client, for the clients reference. The client may create it at random, relate it to the content of the query, or relate it to the specific query, or any combination of things.

If you're intending it to have any specific meaning to the server, then the client needs to know that, and it needs to be specified.

In your earlier comment, you implied it had meaning to the server (you), and you intended to do or not do something based on its value.

Relequestual commented 6 years ago

Given there's no response on this, I'll write up the details as I understand them in the field defintion. If there are any objections, those can be raised when that work is done.

Relequestual commented 6 years ago

Attempt to resolve with https://github.com/ga4gh-discovery/ga4gh-discovery-search/pull/30/commits/854e4c12eab2beeb7f90717a00c5c8563be15f20

ID must a be a string of max 2000 characters.

Defined as:

This component contains a unique identifier for the query provided by the client. This ID may or may not be time bound, and may be only related to the JSON payload or combination of other information the only the client knows, for example the user creating the query.

2000 characters was chosen as is often taken as a reasonable length limit for a URL given various browsers support slightly more than that.

I don't see any reason for the response to include the request ID, given this is an HTTP request and response process, so have not included that.

Anything I've missed or can we close this issue assuming the above mentioned commit gets merged?