cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30k stars 3.79k forks source link

Support for creating index with text_pattern_ops #57285

Open ebner opened 3 years ago

ebner commented 3 years ago

We are using text pattern indexes in jsonb like this:

CREATE INDEX ON data_29cca9f2 ((data->>'json123') text_pattern_ops)

I already found #9682 that tracks support for expression-based index columns, but I failed to find an issue for support of text_pattern_ops. Are there any plans for supporting any of the *_pattern_ops available in PostgreSQL?

Jira issue: CRDB-2839

blathers-crl[bot] commented 3 years ago

Hello, I am Blathers. I am here to help you get the issue triaged.

It looks like you have not filled out the issue in the format of any of our templates. To best assist you, we advise you to use one of these templates.

I have CC'd a few people who may be able to assist you:

If we have not gotten back to your issue within a few business days, you can try the following:

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

RaduBerinde commented 3 years ago

Can you explain why you need text_pattern_ops and what doesn't work if you don't use it? I may be wrong, but I think CRDB's indexes on string work character by character, so you might not need it. CC @jordanlewis who might know more.

ebner commented 3 years ago

We use text_pattern_ops to index arbitrary text values of JSON objects (jsonb type) in order to improve query performance (regex-based queries may happen). In fact, the lack of text_pattern_ops is only one showstopper, we also need the index to be expression-based which is another showstopper. We'd have to work around #9682 by using stored generated columns for every JSON property.

jordanlewis commented 3 years ago

@ebner, would you be able to offer an example query that is only satisfiable with text_pattern_ops? It would help us understand your use case better.

Expression-based indexes are likely to be released in our next major release, this Spring.

ebner commented 3 years ago

The queries are not different from any queries against an index without opclass and our application would not break if we removed the text_pattern_ops, but we want to achieve the best possible performance with our queries. An example query using a regex (data is of type jsonb) is as follows:

SELECT data FROM data_25275cd3 WHERE data->>'prop1' ~ '^För.*'

We chose the opclass because of the statement "The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular expressions) when the database does not use the standard “C” locale." in the PostgreSQL documentation on Operator Classes. We do not use the standard C locale and we use regular expressions.

Glad to hear about the upcoming expression-based indexes!

jordanlewis commented 3 years ago

Got it. We will support this kind of query with and without text_pattern_ops once #50345 is finished. Today, we still don't support regex or like matches on inverted indexes, though our index structure supports them.

ebner commented 3 years ago

Great, thanks for the pointer to #50345!

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!