Open danhhz opened 6 years ago
Assigning to @awoods187 for prioritization
@RaduBerinde is this related to the computed index ideas we've been discussing?
@RaduBerinde is this related to the computed index ideas we've been discussing?
I think so, we could have an index on a computed value that extracts the json field.
A similar use case is for prefix matches on inverted index string columns:
create table a (id int primary key, s string[], inverted index(s));
insert into a values(1, array['big', 'indexed', 'sentence']);
Today, you can do exact matches:
select * from a where s @> array['blah']
With some way to do range scans, you could do something like (syntax doesn't work, but this is the intent):
select * from a where any(s) like 'prefix%'
We also got bitten by this, a filter over an JSON column with an inverted index uses it, but if you add another field to the game, a PK for example, it actually ignores the inverted index.
Thanks for raising this @danhhz
DROP table foo;
CREATE TABLE foo (A INT PRIMARY KEY, B jsonb, C VARCHAR);
INSERT INTO foo (A, B, C) SELECT generate_series(1,100) AS A, '{"values": ["foo", "bar", "baz"]}' AS B, md5(random()::text) AS C;
CREATE INVERTED INDEX foo_inv ON foo(B);
CREATE INDEX foo_idx ON foo(C);
EXPLAIN SELECT * FROM foo WHERE B @> '{"values": ["baz"]}';
Looks good:
[
{
"tree": "",
"field": "distributed",
"description": "false"
},
{
"tree": "",
"field": "vectorized",
"description": "false"
},
{
"tree": "index-join",
"field": "",
"description": ""
},
{
"tree": " │",
"field": "table",
"description": "foo@primary"
},
{
"tree": " │",
"field": "key columns",
"description": "a"
},
{
"tree": " └── scan",
"field": "",
"description": ""
},
{
"tree": "",
"field": "table",
"description": "foo@foo_inv"
},
{
"tree": "",
"field": "spans",
"description": "/\"values\"/Arr/\"baz\"-/\"values\"/Arr/\"baz\"/PrefixEnd"
}
]
But:
EXPLAIN SELECT * FROM foo WHERE B @> '{"values": ["baz"]}' AND C = 'someAutogenID';
Doesn't:
[
{
"tree": "",
"field": "distributed",
"description": "false"
},
{
"tree": "",
"field": "vectorized",
"description": "false"
},
{
"tree": "filter",
"field": "",
"description": ""
},
{
"tree": " │",
"field": "filter",
"description": "b @> '{\"values\": [\"baz\"]}'"
},
{
"tree": " └── index-join",
"field": "",
"description": ""
},
{
"tree": " │",
"field": "table",
"description": "foo@primary"
},
{
"tree": " │",
"field": "key columns",
"description": "a"
},
{
"tree": " └── scan",
"field": "",
"description": ""
},
{
"tree": "",
"field": "table",
"description": "foo@foo_idx"
},
{
"tree": "",
"field": "spans",
"description": "/\"78455d02293f0f16ab5e519c244a70dc\"-/\"78455d02293f0f16ab5e519c244a70dc\"/PrefixEnd"
}
]
@lopezator - in the second case, it's much better to use the primary index since we scan at most one row (for a=3). Using the inverted index would be worse in most cases.
@RaduBerinde you are right, bad example. I've update the example above to be more clear.
CC @mgartner @rytaft
@lopezator Your updated example is scanning foo_idx
, which is the better query plan assuming that the filters on C
is more selective than the filter on B
. Currently, stats for JSON columns are not as precise as stats for other data types, so it's possible that a scan on foo_idx
would be preferred even if the filter on B
was more selective.
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
a comment will keep it active
From the forum: https://forum.cockroachlabs.com/t/multi-tenant-custom-fields-saas-app/1565/2
So, let’s say all tenants’ products are in one table and there’s foreign key tenant_id. Then we have a json field, custom_data. A tenant might have a custom field price. Then the tenant wants to search all his products where price > 100. An index on the foreign key, possibly compound with other “static” fields will speed up the query. But an index on json field will not be useful in this case, right?
Jira issue: CRDB-5739