cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.14k stars 3.81k forks source link

sql: support more index-type combinations #35730

Open knz opened 5 years ago

knz commented 5 years ago

The following combinations are not supported:

Type Regular index Inverted index
JSON maybe #35706 already supported
VARCHAR already supported #41285
composite types #120531 (no issue yet)
other types already supported (no issue yet)

This is a meta issue filed for tracking and telemetry purposes.

To users who are inconvenience: please leave comments with an explanation of your use case.

Jira issue: CRDB-4560

davidsbond commented 4 years ago

Hello, my team are using CockroachDB in production and are finding it great so far. Our current use-case for wanting to index array fields is fairly simple. We have a store of personal information where each row has a TEXT[] type for storing phone numbers.

When we add new people to this table we want to check in advance if one of their phone numbers is already in use by someone else. So we have a query like:

SELECT id FROM person WHERE $1::TEXT = ANY(phone_numbers)

Without indexing it takes around 3 seconds per phone number we search where there are ~450k rows. In some instances, someone could be added who has many phone numbers, or many people who have many phone numbers, which we would individually check, leading to a linear increase in query time.

Perhaps we are approaching this problem incorrectly, but I suspect being able to index this field would yield faster queries.

knz commented 4 years ago

cc @awoods187 @andy-kimball for triage

mkharibalaji commented 4 years ago

Hi Team, We are using it for tagging/labelling purpose inorder to send emails/push notifications for only those whoare tagged with a particular group and in our usecase it's primarily Customer segmentation for sending messages.Now the data has grown large ,we are really in need of an index on the simple array [INT/STRING] asap.

andy-kimball commented 4 years ago

cc @jordanlewis

jordanlewis commented 4 years ago

The two requests in this thread are for inverted indexes on arrays. Filed #43199 to track this.

mzimmerman commented 4 years ago

We use an internal unique id for a user. For transactions in the system can affect multiple users, so each transaction logged may have multiple users affected and we look which ones the transaction is applied to. (Usually less than 10). We have millions of users and billions of transactions to log. Not bring able to index on an array has kept me from trying cockroach.

jordanlewis commented 4 years ago

Arrays are now indexable with inverted indexes.

awoods187 commented 4 years ago

I ran into this today for VARCHAR and updated the issue at the top to track

jdabrowski commented 4 years ago

Hi,

Arrays are now indexable with inverted indexes, and will be indexable with forward indexes in 20.2.

Is it still planned for 20.2? I'm getting

SQL State : 0A000 Error Code : 0 Message : ERROR: unimplemented: column verified_mobiles is of type varchar[] and thus is not indexable Hint: You have attempted to use a feature that is not yet implemented. See: https://go.crdb.dev/issue-v/35730/v20.2 Location : migrations/postgres/user/V1__user.sql (/file:/app.jar!/migrations/postgres/user/V1__user.sql) Line : 91 Statement : CREATE INDEX IF NOT EXISTS users_verified_mobiles_equals ON users(verified_mobiles)

against cockroachdb/cockroach-unstable:v20.2.0-rc.1

RaduBerinde commented 4 years ago

@jordanlewis I think you know most about inverted indexes on arrays, can you take a look at the question above>

jordanlewis commented 4 years ago

@RaduBerinde we do support inverted indexes on arrays.

@jdabrowski correctly points out that we do not support forward indexes on arrays, despite what I said above. We ran into some issues that prevented us from getting them into 20.2 after all.

See: #50662 #50656 #50659 #17154

jdabrowski commented 3 years ago

@jordanlewis thanks for the answer any chances for it to fit into 21 release? If not is it on roadmap for 2021?

ghost commented 3 years ago

We have an issue here with an optimised data storage for placing trades that contain several sub-components.

There need to be two arrays on the object.

  1. Contains the list of markets we are matching for outcomes.
  2. Contains the corresponding array of event outcomes (the specific market position taken by the participant).

The query is made for a market event and corresponding correct outcome.

SELECT from positions p where p.mkt @> [10000] and p.pos @> ARRAY['10000:ln-std:1,10,30']

10000 is the id of the market ln-std:1,10,30 is the participant's position concerning the market.

RaduBerinde commented 3 years ago

@bryanhuntesl you should be able to create two inverted indexes on the two arrays (or only one, for the more restrictive condition).

ghost commented 3 years ago

@bryanhuntesl you should be able to create two inverted indexes on the two arrays (or only one, for the more restrictive condition).

Thanks, @RaduBerinde. I have verified the scenario works with the latest pre-release version (v21.1.0-beta.2).

Executing the following : ​

CREATE TABLE positions (pos VARCHAR[]);
CREATE INDEX pos_index ON positions USING GIN (pos);
INSERT INTO positions VALUES (ARRAY['ln-std:1,10,30' ]);
EXPLAIN SELECT * FROM positions WHERE pos  @> ARRAY['ln-std:1,10,30'];

​ On Cockroach cloud (free), produces the following results: ​


"",distribution,local
"",vectorized,false
index join,"",""
 │,table,positions@primary
 └── scan,"",""
"",missing stats,""
"",table,positions@pos_index
"",spans,"[/ARRAY['ln-std:1,10,30'] - /ARRAY['ln-std:1,10,30']]"
​

​ Running locally ... ​

docker run -ti -p26257:26257 cockroachdb/cockroach-unstable:v21.1.0-beta.2 start-single-node --insecuredocker run -ti -p26257:26257 cockroachdb/cockroach-unstable:v21.1.0-beta.2 start-single-node --insecure

distribution: local
vectorized: true
​
• index join
│ estimated row count: 0
│ table: positions@primary
│
└── • scan
      estimated row count: 0 (11% of the table; stats collected 57 seconds ago)
      table: positions@pos_index
      spans: 1 span
z0mb1ek commented 3 years ago

any chance to support for varchar inverted indexes?

ajwerner commented 3 years ago

any chance to support for varchar inverted indexes?

I think now you can do some form of this with a computed expression over the varchar field which you can get with either a virtual computed column in 21.1 or an expression based index in the upcoming 21.2.

jordanlewis commented 2 years ago

After #79705 was merged, it's now possible to create trigram indexes on string data that accelerate LIKE, equality, and similarity queries (compatible with pg_trgm).

See #41285 if you would like to vote on the remainder of support necessary for full compatibility with pg_trgm.

rharding6373 commented 1 year ago

Looks like only inverted indexes on VARCHAR is left in this meta issue. We're going to check if there is any code pointing to this issue number, and potentially close this issue.

rharding6373 commented 1 year ago

We still use this issue for telemetry. Moving to backlog.