Open nadove-ucsc opened 4 years ago
This behavior would seem to be due to each range-based filter being split into multiple distinct filters, one for each provided range. And separate filters are (intentionally) combined using conjunction. https://github.com/DataBiosphere/azul/blob/2bc22c62f39067524870ae37a14d8f8b1891c907/src/azul/service/elasticsearch_service.py#L128
Let D be a range in the document and R₁ and R₂ be ranges in the filter. A conjunction can expressed by combining the filter ranges:
D ⊆ R₁ ∧ D ⊆ R₂ ⟺ D ⊆ R₁ ∩ R₂
Therefore, within
should be disjunctive.
I'm unsure about contains
and intersects
.
Exactly the same logic applies to contains
, just swapping subset for superset and intersection for union. This is most easily demonstrated visually using a number line:
Obviously, the document range (in red) contains all of the filter ranges (blue) iff it contains the lowest of the lower filter bounds and the highest of the upper filter bounds.
And a similar argument applies for intersects as well: D intersects [a, b], [c, d], and [e,f] iff it intersects [b, e] (taking the lowest of the upper bounds and the highest of the lower bounds).
So I believe they should all be disjunctive.
The
contains
,intersects
, andwithin
filtering operators currently combine multiple ranges using logical conjunction, whereas theis
operator uses disjunction, which would appear more useful and intuitive.This means that, for example, the filter
Will return nothing, since the two ranges do not overlap.
Hannes: "And I think this [use of conjunction] is a bug. For "within" this really should be "or" logic. For "intersects" and "contains" I'm not sure."