dapr / components-contrib

Community driven, reusable components for distributed apps
Apache License 2.0
534 stars 466 forks source link

Postgres State Store - Support GIN indexes in Query API #2524

Open joshuadmatthews opened 1 year ago

joshuadmatthews commented 1 year ago

Describe the feature

Certain databases like CosmosDB automatically index every key of a document, and benefit from improved query times on larger tables because of it

It would be nice if we could take advantage of the similar GIN index in Postgres.

From what I can tell, it's just a matter of exposing a toggle and a slightly different translateFieldToFilter function.

The current syntax for queries generated something like

SELECT * FROM public.state WHERE value->>'somefield' = 'somevalue'

Taking advantage of a GIN index is as simple as

SELECT * FROM public.state WHERE value @> '{"somefield": "somevalue"}';

I can contribute to this, but looking for some feedback on whether we have already considered something like this, and if no is it a good idea? Seems like a simple enough change with a big enough payday to be worthwhile. In some local testing I saw upwards of 7000x better performance on the above queries and a table with 10M record.

Release Note

Enable it on the State Store component, it would be nice if we could create the GIN index as well while creating the table if the setting is enabled.

RELEASE NOTE:

ItalyPaleAle commented 1 year ago

I can totally see the value of using GIN indexes, but I am not sure we want to enable this by default. Two reasons:

  1. The Query APIs in the state store are going to be revamped and moved to a separate building block. The current state store building block is very generic and needs to support a lot of use cases.
  2. An index on the value is not free and it does cost a lot in terms of memory, storage, and write speed. Precisely because this building block is generic, adding an index by default would be a "tax" on everyone using the component.

You're obviously more than welcome to add the index to your own tables. At the end of the day, they're just tables in Postgres and you can add your indexes as well. In fact, in Dapr 1.10 we are adding support for TTLs in the Postgres state store, but not adding an index by default, and instead we encourage users to add it manually if they need it.

joshuadmatthews commented 1 year ago

To clarify I wasn’t suggesting changing the default behavior, I am proposing a setting on the state store component to allow opting into GIN indexing.

Setting a GIN index manually would be a fruitless effort because, as I pointed out, the query format of the where condition needs to change to take advantage of containment queries to leverage the index properly on a jsonb column. See https://pganalyze.com/blog/gin-index

berndverst commented 1 year ago

To clarify I wasn’t suggesting changing the default behavior, I am proposing a setting on the state store component to allow opting into GIN indexing.

Setting a GIN index manually would be a fruitless effort because, as I pointed out, the query format of the where condition needs to change to take advantage of containment queries to leverage the index properly on a jsonb column. See https://pganalyze.com/blog/gin-index

PRs are welcome but keep in mind that the Query API in State Store will not be made Stable.

Once the Document Store Building Block proposal is completed (no ETA right now) the deprecation of Query API in State Store will be announced. Eventually there may be a Document Store implementation for Postgres - but this is not interchangeable with the State Store, and not intended to be able to read/write the same keys (across state store and document store implementations)

joshuadmatthews commented 1 year ago

@berndverst I understand, this isn't something that needs to be rushed, maybe just something we could include in the DocumentStore Postgres component whenever it comes along. The syntax for both types of where condition are so similar it seems like a toggle would be relatively easy to implement.

The performance benefits are substantial, and because of the fastupdate mechanism of postgres GIN index there is very little effect on write speed until you get to really high throughput write tables where the fastupdate mechanism can't keep up.