Open knz opened 5 years ago
Hello, my team are using CockroachDB in production and are finding it great so far. Our current use-case for wanting to index array fields is fairly simple. We have a store of personal information where each row has a TEXT[]
type for storing phone numbers.
When we add new people to this table we want to check in advance if one of their phone numbers is already in use by someone else. So we have a query like:
SELECT id FROM person WHERE $1::TEXT = ANY(phone_numbers)
Without indexing it takes around 3 seconds per phone number we search where there are ~450k rows. In some instances, someone could be added who has many phone numbers, or many people who have many phone numbers, which we would individually check, leading to a linear increase in query time.
Perhaps we are approaching this problem incorrectly, but I suspect being able to index this field would yield faster queries.
cc @awoods187 @andy-kimball for triage
Hi Team, We are using it for tagging/labelling purpose inorder to send emails/push notifications for only those whoare tagged with a particular group and in our usecase it's primarily Customer segmentation for sending messages.Now the data has grown large ,we are really in need of an index on the simple array [INT/STRING] asap.
cc @jordanlewis
The two requests in this thread are for inverted indexes on arrays. Filed #43199 to track this.
We use an internal unique id for a user. For transactions in the system can affect multiple users, so each transaction logged may have multiple users affected and we look which ones the transaction is applied to. (Usually less than 10). We have millions of users and billions of transactions to log. Not bring able to index on an array has kept me from trying cockroach.
Arrays are now indexable with inverted indexes.
I ran into this today for VARCHAR and updated the issue at the top to track
Hi,
Arrays are now indexable with inverted indexes, and will be indexable with forward indexes in 20.2.
Is it still planned for 20.2? I'm getting
SQL State : 0A000 Error Code : 0 Message : ERROR: unimplemented: column verified_mobiles is of type varchar[] and thus is not indexable Hint: You have attempted to use a feature that is not yet implemented. See: https://go.crdb.dev/issue-v/35730/v20.2 Location : migrations/postgres/user/V1__user.sql (/file:/app.jar!/migrations/postgres/user/V1__user.sql) Line : 91 Statement : CREATE INDEX IF NOT EXISTS users_verified_mobiles_equals ON users(verified_mobiles)
against cockroachdb/cockroach-unstable:v20.2.0-rc.1
@jordanlewis I think you know most about inverted indexes on arrays, can you take a look at the question above>
@RaduBerinde we do support inverted indexes on arrays.
@jdabrowski correctly points out that we do not support forward indexes on arrays, despite what I said above. We ran into some issues that prevented us from getting them into 20.2 after all.
See: #50662 #50656 #50659 #17154
@jordanlewis thanks for the answer any chances for it to fit into 21 release? If not is it on roadmap for 2021?
We have an issue here with an optimised data storage for placing trades that contain several sub-components.
There need to be two arrays on the object.
The query is made for a market event and corresponding correct outcome.
SELECT from positions p where p.mkt @> [10000] and p.pos @> ARRAY['10000:ln-std:1,10,30']
10000 is the id of the market
ln-std:1,10,30
is the participant's position concerning the market.
@bryanhuntesl you should be able to create two inverted indexes on the two arrays (or only one, for the more restrictive condition).
@bryanhuntesl you should be able to create two inverted indexes on the two arrays (or only one, for the more restrictive condition).
Thanks, @RaduBerinde. I have verified the scenario works with the latest pre-release version (v21.1.0-beta.2).
Executing the following :
CREATE TABLE positions (pos VARCHAR[]);
CREATE INDEX pos_index ON positions USING GIN (pos);
INSERT INTO positions VALUES (ARRAY['ln-std:1,10,30' ]);
EXPLAIN SELECT * FROM positions WHERE pos @> ARRAY['ln-std:1,10,30'];
On Cockroach cloud (free), produces the following results:
"",distribution,local
"",vectorized,false
index join,"",""
│,table,positions@primary
└── scan,"",""
"",missing stats,""
"",table,positions@pos_index
"",spans,"[/ARRAY['ln-std:1,10,30'] - /ARRAY['ln-std:1,10,30']]"
Running locally ...
docker run -ti -p26257:26257 cockroachdb/cockroach-unstable:v21.1.0-beta.2 start-single-node --insecuredocker run -ti -p26257:26257 cockroachdb/cockroach-unstable:v21.1.0-beta.2 start-single-node --insecure
distribution: local
vectorized: true
• index join
│ estimated row count: 0
│ table: positions@primary
│
└── • scan
estimated row count: 0 (11% of the table; stats collected 57 seconds ago)
table: positions@pos_index
spans: 1 span
any chance to support for varchar inverted indexes?
any chance to support for varchar inverted indexes?
I think now you can do some form of this with a computed expression over the varchar field which you can get with either a virtual computed column in 21.1 or an expression based index in the upcoming 21.2.
After #79705 was merged, it's now possible to create trigram indexes on string data that accelerate LIKE
, equality, and similarity queries (compatible with pg_trgm
).
See #41285 if you would like to vote on the remainder of support necessary for full compatibility with pg_trgm
.
Looks like only inverted indexes on VARCHAR is left in this meta issue. We're going to check if there is any code pointing to this issue number, and potentially close this issue.
We still use this issue for telemetry. Moving to backlog.
The following combinations are not supported:
This is a meta issue filed for tracking and telemetry purposes.
To users who are inconvenience: please leave comments with an explanation of your use case.
Jira issue: CRDB-4560