ga4gh-beacon / beacon-v2-Models

Models that leverage the Beacon Framework v2
Apache License 2.0

How to handle unsupported query parameters clarification/improvements #80

Closed Tom-Shorter closed 2 years ago

Tom-Shorter commented 2 years ago

When a beacon receives a query, it is not clear how it should handle unsupported query parameters.

If a query is received which uses the individuals.geographicOrigin parameter, how would we expect a beacon to respond when the beacon/dataset has no data for geographicOrigin?

I think there are 3 main options here:

Personally, I would like to see beacon recommend the use of the third option.

I should make it clear at this point that IMO a query parameter should only be ignored if the dataset as a whole doesn't support the parameter. This shouldn't be used to enable fuzzy searching on the parameter values, i.e. the case where the dataset does support a parameter but a specific data entry doesn't have a value for it.
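The distinction above can be sketched in a few lines of Python. This is only an illustration, not anything defined by the Beacon v2 specification: the function name `run_query`, the flat dict record layout, and the `applied_parameters` / `ignored_parameters` keys are all assumptions made for the example.

```python
def run_query(records, supported_fields, query):
    """Apply only the parameters the dataset supports and report the rest.

    Hypothetical helper: none of these names come from the Beacon v2 spec.
    """
    # A parameter is ignored only when the dataset as a whole lacks the field.
    applied = {k: v for k, v in query.items() if k in supported_fields}
    ignored = sorted(k for k in query if k not in supported_fields)

    # For supported fields, an entry with no value is simply a non-match,
    # so ignoring parameters never becomes fuzzy matching on single entries.
    matches = [
        r for r in records
        if all(r.get(k) == v for k, v in applied.items())
    ]
    return {
        "count": len(matches),
        "applied_parameters": sorted(applied),
        "ignored_parameters": ignored,
    }


records = [
    {"sex": "female", "geographicOrigin": "England"},
    {"sex": "female"},  # field is supported, but this entry has no value
]
query = {"sex": "female", "geographicOrigin": "England"}

# Dataset supports geographicOrigin: the entry without a value does not match.
with_origin = run_query(records, {"sex", "geographicOrigin"}, query)

# Dataset does not support geographicOrigin at all: the parameter is ignored.
without_origin = run_query(records, {"sex"}, query)
```

In the first call the entry lacking a value is excluded; in the second the parameter is dropped for the whole dataset and listed in the response, so the caller can see how the query was relaxed.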

This could be very powerful when people are looking to set up beacon networks between different organisations. It is highly unlikely that each organisation would hold the same data, so people will naturally send queries to the network with parameters not supported by all network members. If those queries simply return nothing, people would likely fall back to using beacon for very simple queries, avoiding any parameters which aren't universally supported in order to get at least some response back from everyone, and then contact the data owners for more specific queries. The worst-case scenario would be that people either abandon their beacon implementation in favour of a simpler data discovery layer, or decide against using beacon in the first place because a simple Excel sheet containing metadata would do the same job.

Users could also create far more complex queries, knowing that even though they expect very few data sources, if any, to match all of the query, they will still get informative results back from all queried beacons, along with details about what was and wasn't matched. The user can then make a more informed decision about which beacons to look into further.
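As a rough sketch of how a network client might use that information, the snippet below orders beacon responses by how much of the query each beacon actually applied. The response shape (`count`, `ignored_parameters`), the beacon names, and the function `rank_beacons` are hypothetical and chosen purely for illustration.

```python
def rank_beacons(responses, query_parameters):
    """Order beacons by the share of query parameters they applied.

    Hypothetical helper: assumes each response lists the parameters it ignored.
    """
    total = len(query_parameters)
    ranked = []
    for name, resp in responses.items():
        ignored = set(resp["ignored_parameters"]) & set(query_parameters)
        coverage = (total - len(ignored)) / total if total else 1.0
        ranked.append((name, coverage, resp["count"]))
    # Highest coverage first; ties are broken by the larger match count.
    ranked.sort(key=lambda item: (-item[1], -item[2]))
    return ranked


responses = {
    "beacon_a": {"count": 3, "ignored_parameters": []},
    "beacon_b": {"count": 40, "ignored_parameters": ["geographicOrigin"]},
    "beacon_c": {"count": 0, "ignored_parameters": []},
}
ranking = rank_beacons(responses, ["sex", "geographicOrigin"])
```

Here `beacon_b` reports the most matches but only applied half of the query, so it sorts below the two beacons that applied all of it.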

This idea could be expanded further: the user could be given the option of choosing whether they allow "partial query matches" or require "exact query matches" at a query-wide level. Another step further would be to set this at the parameter level. I don't think either of these options should be looked at for now, but they might be interesting in the future. For now, having a beacon recommendation for how implementers handle unsupported query parameters is plenty.
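Purely to illustrate what such a switch might look like, here is a small sketch with a query-wide default and an optional per-parameter override. `MatchMode`, `resolve_parameters`, and the `per_parameter` argument are invented for this example and do not exist in Beacon v2.

```python
from enum import Enum


class MatchMode(Enum):
    EXACT = "exact"      # an unsupported parameter fails the whole query
    PARTIAL = "partial"  # an unsupported parameter is ignored and reported


def resolve_parameters(query, supported_fields,
                       default_mode=MatchMode.EXACT, per_parameter=None):
    """Return (applied, ignored), or None if a required parameter is unsupported."""
    per_parameter = per_parameter or {}
    applied, ignored = {}, []
    for name, value in query.items():
        if name in supported_fields:
            applied[name] = value
            continue
        # A per-parameter setting overrides the query-wide default.
        mode = per_parameter.get(name, default_mode)
        if mode is MatchMode.EXACT:
            return None
        ignored.append(name)
    return applied, ignored


query = {"sex": "female", "geographicOrigin": "England"}
supported = {"sex"}

exact = resolve_parameters(query, supported)
partial = resolve_parameters(query, supported, default_mode=MatchMode.PARTIAL)
override = resolve_parameters(
    query, supported, per_parameter={"geographicOrigin": MatchMode.PARTIAL}
)
```

With the exact default the beacon declines the query outright, while either the query-wide or the per-parameter partial setting lets it answer using only the parameters it supports.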

jrambla commented 2 years ago

IMHO, we should go for option 1 (returning no results if the query parameters are not matched). As you said, this is more intuitive. If I query for "green" "apple", I assume that there is a reason for that selection and I should only get that in return, or nothing; not "green things" and not "red apples". If the query to the network returns nothing, then I could relax my query and I, as the user, would decide whether being an apple is more relevant than being green or the opposite. I don't fully follow your reasoning on Beacon instances going simpler because of that, as I guess that queries should follow the actual user questions. Whether they start broad and narrow down, or go the other way around, is a matter of the user's choice. Does any of this make sense to you, @Tom-Shorter?
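To make the green-apple example concrete, here is a minimal sketch of strict matching with the relaxation left to the user. The item list and the `strict_query` helper are made up for illustration only.

```python
def strict_query(items, **criteria):
    """Return only the items that satisfy every criterion (option 1)."""
    return [
        item for item in items
        if all(item.get(key) == value for key, value in criteria.items())
    ]


items = [
    {"kind": "apple", "colour": "red"},
    {"kind": "pear", "colour": "green"},
]

# The full query matches nothing, so nothing is returned.
green_apples = strict_query(items, kind="apple", colour="green")

# The user decides which constraint matters more and sends a relaxed query.
apples = strict_query(items, kind="apple")
green_things = strict_query(items, colour="green")
```

The strict query returns an empty list rather than red apples or green pears; each relaxed alternative is a separate query that the user has to choose and send.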

Tom-Shorter commented 2 years ago

Thanks @jrambla, option 1 does make the most sense when interacting with a single beacon for sure and is the easiest to implement.

If the query to the network returns nothing, then I could relax my query and I, as the user, would decide whether being an apple is more relevant than being green or the opposite.

This is likely what would happen. However, option 3 somewhat improves the user's experience when relaxing queries, as a) the beacon itself relaxes the query and tells you exactly how it was relaxed, and b) the user doesn't need to create a new query, or possibly multiple queries.

My reasoning behind things being simplified over time follows on from the above. If a user consistently has to simplify queries to return all the results they want, they will likely start sending the simplified queries in the first place and then filter the data further once they have access to it. Unless all beacons within a beacon network use the same parameters and filtering terms, the full power of a beacon query cannot be used within the network. For single beacon instances this shouldn't be a problem, though.

I'll close this issue with these comments. Whether or not a change from option 1 to something more like option 3 is needed will become clearer as people use Beacon. For now I'm just imagining what is likely a pretty rare situation.