Open rabbull opened 6 days ago
When implementing this operator on the SQLite backend, I gained a clearer understanding of its semantics, especially in the context of nested arrays. It's overly simplistic to treat this operation as a purely mathematical "is-subset" or "contains" comparator. Instead, if we conceptualize JSON objects as trees, this contains
operator effectively checks whether the operand is a connected subgraph of the tree.
For instance, consider the JSON arrays [[1, 2], 3]
and [[2]]
. If we visualize them as trees (as shown below):
[[1, 2], 3]
:[[2]]
:It becomes evident that [[2]]
is a sub-tree of [[1, 2], 3]
. Consequently, the operation [[1, 2], 3] @> [[2]]
returns true
.
In contrast, if you examine [2]
:
[2]
:You’ll notice it lacks an intermediate "ARRAY" node, so [[1, 2], 3]
do not contains
[2]
.
This becomes even more apparent in cases like [[3]]
and other examples, so I would prefer not to elaborate too far.
This tree-based perspective also applies to JSON objects (dictionaries in Python). For instance:
{"data": {"name": "Alice", "gender": "female"}, "time_modified": "1732231289"}
:
{"data": {"gender": "female"}}
:
Here, {"data": {"gender": "female"}}
is a sub-graph of the larger JSON object. Accordingly, PostgreSQL evaluates {"data": {"name": "Alice", "gender": "female"}, "time_modified": "1732231289"} @> {"data": {"gender": "female"}}
as true
.
Additionally, this tree-based interpretation clarifies why the order of elements within an array doesn’t matter: the structure is what’s being compared, not the sequence.
Another noteworthy property of this containment operator is that duplicate elements are ignored. By combining these two characteristics, the behavior of the operator becomes more intuitive and easier to understand.
cc @GeigerJ2
Thanks a lot for this deep-dive and taking the time to write down the explanation here, @rabbull! That actually makes a lot of sense. Feel free to add a pointer to this issue to the discourse thread you linked.
Describe the bug
The
contains
operator on the PostgreSQL backend lacks comprehensive documentation. To better understand its exact behavior, I conducted additional tests in #6617. However, some of the observ /ed behaviors are counterintuitive, and these are detailed below:1. Non-Existent
attr_key
In a test where the
attributes
column does not contain a key namedoops
, the query executes successfully. However, the results are confusing: neither the affirmation nor the negation of thecontains
operation matches the entry. This behavior is unexpected and counterintuitive. What best fits my expectation is to fail loudly, e.g. to raise an Exception.2. Nested Array
This is a known issue discussed in a previous thread on discourse.
When testing with nested arrays, the
contains
operation unexpectedly matches entries even when the contained elements do not strictly align with the expected structure. For example, a query attempting to match[2]
against an array[[1, 2], [3]]
may return a match, even though[2]
is not directly an element of the array.Note that this actually complies with PostgreSQL's native JSONB containment semantics:
However, this behavior contracts with the intution quite a lot, and consequently makes the abstraction hard to be understood well.
In addition, it should also be nice to mention in the documentation that
contains
doesn't care the order of arrays. See example below:Steps to reproduce
See above.
Expected behavior
See above.
Your environment
Other relevant software versions, e.g. Postres & RabbitMQ
Additional context
This might be issues of SQLAlchemy and need further investigation.