PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License
23.24k stars 1.02k forks source link

Request to support order-by with COLLATE #3683

Open Murray-LIANG opened 1 month ago

Murray-LIANG commented 1 month ago

Requirement

I have a DB table city with columns - id and name. The name values are input by the end users of my application. So, it would be in different locales depending on the end user's locale.

I want to provide a feature for my application to return the cities sorted by name in the end-user's locale. That is, the application would prepare the request GET /city?order=name.collate.zh_cn and send it to PostgREST service, and PostgREST service would return the cities sorted by name in Chinese.

It would be great if PostgREST could support order=name.collate.zh_cn.

What I've tried

Create a View for city By creating a view city_view and adding a new column name_zh COLLATE "zh_CN" in the view, I can achieve the goal by sending GET /city_view?order=name_zh. However, I have lots of similar tables to support COLLATE sorting. There would be huge changes to creating views and adding columns.

Add computed column name_zh I tried to create a DB function to add a computed column name_zh to city. But it doesn't work.

Any advice would be appreciated.

steve-chavez commented 1 month ago

I see, so you need to be able to do:

SELECT * FROM projects ORDER BY name COLLATE "zh_CN";

And that's not possible with computed columns. The only way are views or functions.

GET /city_view?order=name.collate.zh_cn.

It's not hard to enable it on the syntax but AFAICT doing an order with a different collation invalidates existing indexes (this is why the CREATE INDEX foo_idx ON foo (t COLLATE "zh_CN") way exists) . So this means that clients could force slow queries.

So we would need a way to restrict which collations can be applied. This is related to https://github.com/PostgREST/postgrest/issues/2442 and https://github.com/PostgREST/postgrest/issues/2805.

Murray-LIANG commented 1 month ago

Thanks @steve-chavez for the quick response. Yes, exactly ORDER BY name COLLATE "zh_CN" is what I need.

And that's not possible with computed columns.

I tried this in DB. It works. select id, name_l10n(city.*) AS name_l10n from city order by name_l10n(city.*) COLLATE "zh_CN";

but AFAICT doing an order with a different collation invalidates existing indexes

Yes. I need to create indexes beforehand. The case is that I plan to support like 3 languages. And for those columns that support ordering by locale, I'll create indexes per locale.

So we would need a way to restrict which collations can be applied.

In general, we could support all collations from pg_collation.

wolfgangwalther commented 1 month ago

It's not hard to enable it on the syntax but AFAICT doing an order with a different collation invalidates existing indexes (this is why the CREATE INDEX foo_idx ON foo (t COLLATE "zh_CN") way exists) . So this means that clients could force slow queries.

We are using this argument ("potentially forcing slow queries") in a lot of cases. Until now all ideas to improve the situation were along the lines of either estimating the cost associated to the query and have a threshold, or to be able to block specific operations explicitly via config.

How about we try to turn this around and enable additional operations only when a certain index is present? In the end, this might not be practical to do, especially with views involved, and we'll end up with looking at the cost estimation for a query again, but I think we should try to change our perspective here a bit.

steve-chavez commented 1 month ago

Yeah, I agree. If the index is present then we could enable the order collate.