e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
15 stars 32 forks source link

Fix local_date rollover queries #181

Closed shankari closed 5 years ago

shankari commented 8 years ago

Back in 6753c223c7084f88c9bebdbc6aa9b9e5a8a6ec7a, we implemented the Localdate date data structure and queries on it. The queries allow users to fill in fields of a start and end local dt, and we look for entries that are between the start and the end date.

This works well for standard queries e.g. 2015-08-01T08:18 -> 2015-08-02T08:20

but it does not work well for rollover queries e.g. 2015-08-01T08:18 -> 2015-08-02T09:08 because when we query for entries where 18 <= minute <= 8, we don't find any

One option is to convert them to ranges instead, but also that the queries support wildcards, which makes converting them to ranges difficult. So for example, all of the following are valid:

Similarly, the following rollover queries are also valid and must be supported:

shankari commented 8 years ago

So the mapping for standard queries is start_val <= field <= end_val so the queries for the examples above will be:

For rollover queries, we need to query across two disjoint sets, so start_val <= field <= range_max | range_min <= field <= end_val So this would become:

Note that all the components except the year can roll over. Years can never roll over - they are monotonically increasing.

We can also use negation of the non-matched set not end_val < field < start_val So this would become:

shankari commented 8 years ago

The formulation with the negation exposes an important issue with the standard queries as well. They are only applicable for the ends of the range - the middle of the range must not be filtered.

To pick some concrete examples:

This will not be matched even by the standard query since the queries will be:

This will only match 5 days (10-15) in Feb, Mar, Apr, May, even in the year 2015

shankari commented 8 years ago

wrt the previous comment, it is actually unclear whether the middle of the range should be filtered or not. Basically, if I see something like 2014-02-10 -> 2016-05-15, it can be interpreted either as a set of filters on the various year(s) (year = 2014-2015, month = 2-5, day = 10-15) or as a range (10 Feb 2014 to 15 May 2016).

The notion of treating it as filters instead of a range is particularly attractive/important from the aggregate urban planning perspective so that I can look at changes in particular time slices over longer time frames.

The real problem is that we do not have a clear separation between range-based and filter-based queries. Years can only really be range-based, but the other components can be either range-based or filter based as we saw above.

Let's define that the semantics favor filtering since that makes our life easier. We should fix the filtering for rollover as described earlier in this issue.

Later, we can switch to a better design that allows flexibility in choosing range or filter queries

shankari commented 8 years ago

This focus on filtering has some weird implications 2015-08-01T08:18 -> 2015-08-02T09:08 will match not just {'hour': 8, 'minute': 18} but also {'hour': 9, 'minute': 57}

shankari commented 8 years ago

but for the currently most broken case of the change of month, when the default search is 2016-03-31 to 2016-04-01 we will be fine because in general 2016-04-31 will not have occurred at the time of the query

shankari commented 8 years ago

query is:

DEBUG:root:In get_range_query
 returning query {'data.local_dt.minute': {'$not': {'$gt': 8, '$lt': 18}}
 'data.local_dt.hour': {'$lte': 9, '$gte': 8}
 'data.local_dt.year': {'$lte': 2015, '$gte': 2015}
 'data.local_dt.month': {'$lte': 8, '$gte': 8}}
shankari commented 7 years ago

It is not clear that there are any bugs left. We could enhance/change the way the filters actually work, but that's an enhancement, not a bug. Closing this issue for now.