requestParameters.json for each entryType

Tom-Shorter commented 2 years ago

GenomicVariants has a requestParameters.json file which (as far as I know) describes all possible request parameters which can be used and there is a flat hierarchy for the request, e.g. start is a direct child of the query but in the response it is moved to be a child of position.

Can files such as these be created for each of the models?

Whilst implementing the dataset model I was left wondering "how do I search for more than one dataset, but not all" and I can't see a solution to this within the current spec.

if I've missed something which does resolve this then please let me know.

Sorry for not bringing this up sooner, I've just realised I was using an old workaround which is basically me creating a "datasetIDs" request parameter!

jrambla commented 2 years ago

The requestParameters included in the model are the ones that seemed "minimal" to us. I would not consider any of them mandatory as they depend on the Beacon instance to support that parameter or not. I would not say they are not comprehensive but extensible. For the dataset Ids we had a solution in Beacon v1, but that approach doesn't makes sense in Beacon v2, as "dataset" is an entry type (as a Collection), but not defined in the Framework as something static. I need to refresh myself on if we eventually had found a solution (maybe only implemented in the very basic example model). One option would be to add the DatasetId and Cohort Ids to all entry types' requestParameters.json. Makes sense?

mbaudis commented 2 years ago

So there could be a requestParameters schema, with datasetIds, cohortIds, filters, "additionalParameters": True and endpoints should be required to have an instance of a compatible document?

jrambla commented 2 years ago

The requestParameters.json is an informative document for the Beacon client to know which parameters are available in a given Beacon instance. So every entry type in every Beacon instance should have a requestParameters.json associated to it (or not if no parameters are supported).

mbaudis commented 2 years ago

Agree; but is there a schema to check the file format? Or is it documented that this is mandatory?

Tom-Shorter commented 2 years ago

The requestParameters included in the model are the ones that seemed "minimal" to us. I would not consider any of them mandatory as they depend on the Beacon instance to support that parameter or not. I would not say they are not comprehensive but extensible. For the dataset Ids we had a solution in Beacon v1, but that approach doesn't makes sense in Beacon v2, as "dataset" is an entry type (as a Collection), but not defined in the Framework as something static. I need to refresh myself on if we eventually had found a solution (maybe only implemented in the very basic example model). One option would be to add the DatasetId and Cohort Ids to all entry types' requestParameters.json. Makes sense?

I'm not sure this would be a reasonable solution, if I wanted to query a beacon with 4 datasets but only query 2 of these then there's no current method to do that. Another example is for ageOfOnset in the individuals model, if I want to search for individuals with any disease/condition which started after the age of 40 then I'm not sure how to complete that query currently as ageOfOnset is nested within the diseases section. I could leave the fields for disease empty but then thediseaseCode would be blank("") so how would a beacon handle this? do they assume that blank means "all" or must match "" exactly, it;s the same kind of question around empty queries.

For me the requestParameters.json file allows to define queries which fall inline with the data contained within the model response but in a format which allows for flexible queries, i.e. flat structure, ranges instead of exact values and possible use of operators, text value like/exact the users input value.

mbaudis commented 2 years ago

The issue with the datasets had been brought up before, with respect to the id paths (e.g. biosamples/{id}/variants) where you have to loop over all datasets to match.

I'm +1 for having a set of standard parameters (e.g. datasetIds ...) specified. For the more elaborate use case (ageOfOnset ... there is the current concept of having this specified through filters which themselves are exposed through the filteringTerms endpoint. IMO for 2.n versions of Beacon we may go the way of providing query terms/prototypes for parameters matching the default data model.

mbaudis commented 2 years ago

Bumping this up. All entities in the default model should have requestParameters (and corresponding endpoints).

jrambla commented 2 years ago

As explained above, originally, the Model was not defined to be too prescriptive but giving flexibility to allow implementers or instances to declare which entry types are included in such Beacon instance. So one Beacon instance could implement only genomicVariations and another only individuals w/o any genomic information attached to it. One consequence of this is that endpoints like /individuals/{id}/g_variants should not be mandatory to be Beacon v2 Model compliant. A similar consideration applies to relationships (~foreing keys) in the schemas, as the other end of the relationship could be not implemented. E.g. a Beacon with genomicVariants only would not have runs, analysis, biosamples or individuals, hence, such relationships should not be mandatory either. Again, this applies to request parameters, that should not be mandatory for stuff that could not be present in a given instance. Initially, I'd applied that principle to datasets and cohorts too. Meaning that we cannot enforce individuals to have a mandatory cohort_ids as cohorts could not be part of that Beacon instance. In a extreme case, a Beacon could have not implement datasets. Some Beacon implementers don't understand why they should define a dataset when the information is "already" in Beacon info or in the name of the Beacon... This point is debatable, but also has a point in the sake of simplicity, which includes not adding artificially elements that are not required for a working solution.

So, if you can't have mandatory cohort_ids or dataset_ids, which solutions are still there:

Add them but don't make it mandatory
Leverage REST and rely on endpoints like datasets/{id}/individuals?filters=... to get the entry types associated with a given dataset.

Option 1 requires that Beacon instances are declarative on if they support datasets and Cohorts entry types, and the client to poll that before assuming they are there. Option 2 implies the client posting one independent query for every dataset it is interested in. In Tom's example, doing one query for dataset1 and another for dataset2

Both of them are valid and complementary, as a Beacon instance can implement datasets and the dataset/{id}/whatever endpoints and not implement requestParameters for picking the datasets.

So, at design time, given that the described flexibility is pervasive in the Framework and the Model, and does not only affect collection entry types, the solution just above seemed a good compromise to me, something that offers the best of both worlds.

jrambla commented 2 years ago

For the question of AgeOfOnset, the current approach is:

If only AgeOfOnset is declared that states any disease that individual could have
If both AgeOfOnset and Disease are declared, given that all filters are connected by AND boolean operator, the query must return individuals associated with the requested disease which onset happened at that age

mbaudis commented 2 years ago

One consequence of this is that endpoints like /individuals/{id}/g_variants should not be mandatory to be Beacon v2 Model compliant.

But AFAIK the Beacon default models should be extensive (pre-defining anything that "makes sense"), with each implementation then subtracting from it, as long as it is compatible w/ the framework - i.e. your option 1.

And it may then be good to add the template model, where only necessary/dummy entities & schemas exist.

And back to the original topic: A requestParameters document should be there for each entry type, with a minimum definition of id (and filterIds / filters).

jrambla commented 2 years ago

If we take the route of a "minimally comprehensive" model description, I'm fine with that. I didn't want to "impose" an extensive model to the implementers, but happy to embrace it if they are willing of that. I've did a TEMPLATE Model some months ago exactly with that purpose, that can start a new Model from scratch or that you can start minimal and add as much as you want to implement the desired part of the Model v2.

mbaudis commented 2 years ago

@jrambla O.k. - see #112 And It would be great if you (or @MrRobb?) could then at some point revive the template model (I'd suggest doing this in this repo, parallel to the BEACON-V2-Model directory or after move to the unified one).

Tom-Shorter commented 2 years ago

Meaning that we cannot enforce individuals to have a mandatory cohort_ids as cohorts could not be part of that Beacon instance.

I completely agree with this but I don't think this is an actual issue. If you want to find individuals which match certain parameters found within the individuals model, which are also part of cohorts A, B or C then you would form a query that contains both the individual model and the cohort model and send it to /individuals. The individual model does not need to contain the cohort_ids parameter as the cohort model will cover that.

mbaudis commented 2 years ago

Closing here since #112.

ga4gh-beacon / beacon-v2-Models

requestParameters.json for each entryType #50