Closed d-callan closed 1 year ago
This is an interesting problem. My initial thought is for the API to return a list of ranges for which data is present. The problem is that the response size could be proportional to the range of the variable values. That worst case would arise if, for example, every other time unit has data and we return each individual data point as a discrete range. In that scenario, the desired behavior is likely to return the entire range instead.
This suggests the need for some kind of threshold to indicate the maximum time gap before breaking up an interval. We could set this threshold based on the range of the temporal variable values to ensure we don't have an unreasonable number of intervals in our response (e.g. if the study is over 5 years, only break up an interval if 5 days are missing).
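To make the threshold idea concrete, here is a minimal sketch of how days with data could be merged into ranges. The function names and the `max_intervals` default are hypothetical and not part of any existing API. An interval is split only when the difference between consecutive days with data exceeds the threshold, so a threshold of 5 splits once 5 or more days are missing.

```python
from datetime import date, timedelta


def merge_days_into_ranges(days, max_gap_days):
    """Group dates into (start, end) ranges.

    A new range starts only when the difference between consecutive
    dates with data is greater than max_gap_days.
    """
    ranges = []
    for day in sorted(set(days)):
        if ranges and (day - ranges[-1][1]).days <= max_gap_days:
            # Close enough to the previous range: extend it.
            ranges[-1] = (ranges[-1][0], day)
        else:
            ranges.append((day, day))
    return ranges


def gap_threshold_days(range_start, range_end, max_intervals=365):
    """Hypothetical rule: scale the threshold with the variable's full range.

    max_intervals is an assumed tuning knob, not a value from this thread.
    """
    span_days = (range_end - range_start).days
    return max(1, span_days // max_intervals)
```

With this rule a 5 year study gets a threshold of 5 days, and data on every other day collapses into a single range rather than one range per data point.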
Have we decided the granularity of ranges for the time picker? Is it ever going to be more granular than days? I'm guessing not, because I'm not aware of any studies that have datetimes.
another possible complication is that presence or absence of data depends in part on how the map markers are configured, and configuration can vary per map type. i'll put some thought into these things and hopefully comment again later.
i was thinking about this some more, and it's possible the concern about different map types having different configuration can be handled using filters? like presumably the things that impact whether there is data visible on the map are always: 1) marker variable selections 2) values selected for those variables and 3) the viewport.
i'm also currently inclined for dates to maybe assume day granularity? so the response would literally just list each day and a value of 0 or 1.
As you suggested, it seems like the distributions endpoint could be pretty serviceable for your requirements listed above. If we wanted to save some data transmission, we could add an option to omit the zero values. I agree that all the marker types should have enough in common to provide:
Here's an example request to distributions:
{
  "valueSpec": "count",
  "filters": [
    {
      "entityId": "GAZ_00000448",
      "type": "longitudeRange",
      "variableId": "OBI_0001621",
      "left": -88.681640625,
      "right": -70.30664062
    }
  ],
  "binSpec": {
    "displayRangeMin": "2020-05-19T00:00:00Z",
    "displayRangeMax": "2022-12-17T00:00:00Z",
    "binWidth": 1,
    "binUnits": "day"
  }
}
nice. then if the markers were configured w varX, values 1,2,3 those could be passed as part of the filters as well. so the question is to the frontend: @bobular @moontrip , are you happy to use the distributions endpoint (passing the viewport and marker vars as filters) and collapse any count > 1 to 1?
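A rough sketch of the collapsing step described above, for illustration only. The response shape is an assumption: each bin is taken to be a dict with a `binStart` timestamp (same format as `displayRangeMin` in the request) and an integer `value`. The real distributions response may differ. Days missing from the response are treated as 0, which would also cover an option to omit zero values.

```python
from datetime import date, timedelta


def presence_by_day(bins, start, end):
    """Map each day in [start, end] to 1 if it has any data, else 0.

    bins: list of {"binStart": "YYYY-MM-DDTHH:MM:SSZ", "value": int}
    (assumed shape, not confirmed against the real endpoint).
    """
    # Key counts by the date part of the timestamp.
    counts = {b["binStart"][:10]: b["value"] for b in bins}
    presence = {}
    day = start
    while day <= end:
        key = day.isoformat()
        # Any positive count collapses to 1; absent days default to 0.
        presence[key] = 1 if counts.get(key, 0) > 0 else 0
        day += timedelta(days=1)
    return presence
```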
Yes, I think this would work, thanks. I think we might add a client-side heuristic to bin at week or month as required. We have some 40+ year datasets.
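One possible shape for that client-side heuristic, sketched in Python for brevity. The `max_bins` cutoff is a made-up number, and the `"week"` and `"month"` strings are assumed to be accepted `binUnits` values alongside the `"day"` used in the example request.

```python
from datetime import date


def choose_bin_units(start, end, max_bins=1000):
    """Pick the finest bin unit that keeps the bin count under max_bins.

    max_bins is a hypothetical cutoff chosen for illustration.
    """
    span_days = (end - start).days + 1
    if span_days <= max_bins:
        return "day"
    if span_days / 7 <= max_bins:
        return "week"
    return "month"
```

Under these assumptions the range in the example request stays at day granularity, a 10 year dataset bins by week, and a 40 year dataset bins by month.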
@d-callan @dmgaldi
Oh, but we wouldn't know what the full date range was until we had already made the request?
ranges should be in the study metadata for that variable. that, plus any filters applied to that variable, should tell you what range to show for the xaxis of the ez time slider.
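As a sketch of that calculation: the x-axis range would be the metadata range for the variable, narrowed by any filter on the same variable. The function and parameter names here are hypothetical, and how the metadata and filter bounds are actually obtained is left out.

```python
from datetime import date


def slider_range(metadata_min, metadata_max, filter_min=None, filter_max=None):
    """Intersect the variable's metadata range with an optional filter range.

    Returns (low, high), or None if the filter excludes the whole range.
    """
    low = metadata_min if filter_min is None else max(metadata_min, filter_min)
    high = metadata_max if filter_max is None else min(metadata_max, filter_max)
    if low > high:
        return None
    return (low, high)
```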
i'm going to close this. we can reopen if it turns out this plan doesn't work out as intended for some reason.
we are introducing an ez time filter, where the user can choose a temporal variable and define a range (or maybe ranges) to filter over without having to enter the menu. for that we are introducing a new component (see https://github.com/VEuPathDB/web-monorepo/issues/245) which among other things needs to indicate for which values that temporal variable has data visible on the map.
this endpoint needs to know which variable is the temporal variable of primary interest, as well as how the map markers are configured, in order to do this. exact api tbd.