Open alamb opened 7 months ago
cc @gruuya as I believe you mentioned you might also be interested in this feature
I think a first step would be to identify a mature Rust library for supporting collations, as I suspect this is not something we wish to implement ourselves, much like we use chrono for temporal functionality.
I also wonder if there might be a middle ground where we provide specialised UDFs or similar for manipulating collations, as full native support would be a very substantial undertaking. This would also provide a good story for making this functionality optional
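To make the UDF idea a bit more concrete, here is a minimal sketch of the "sort key" approach in plain Rust: a function maps each string to a key whose byte order matches the desired collation, so the engine itself only ever compares bytes. Everything here is hypothetical (`ToyCollation`, `collation_key`, `sort_with_collation` are made-up names, not DataFusion or arrow-rs APIs), and lowercasing is only a stand-in for what a real collation library would do.

```rust
// Hypothetical sketch only: none of these names exist in DataFusion or arrow-rs.

/// Collations this toy function understands (made up for illustration).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ToyCollation {
    /// Plain byte order, i.e. what comparing UTF-8 strings gives by default.
    Binary,
    /// Case-insensitive order via Unicode lowercasing (stand-in for a real collator).
    CaseInsensitive,
}

/// Maps a string to a key whose byte order matches the chosen collation.
pub fn collation_key(value: &str, collation: ToyCollation) -> Vec<u8> {
    match collation {
        ToyCollation::Binary => value.as_bytes().to_vec(),
        ToyCollation::CaseInsensitive => value.to_lowercase().into_bytes(),
    }
}

/// Sorts values by their collation key, roughly what
/// `ORDER BY collation_key(col, ...)` could do if such a UDF existed.
pub fn sort_with_collation(values: &mut [String], collation: ToyCollation) {
    values.sort_by_key(|v| collation_key(v, collation));
}
```

Because the key is just bytes, a function like this could sit entirely outside the core comparison kernels, which is what would make it easy to ship as an optional feature.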
I believe collation is an important feature for Postgres compatibility, but it is not as widely used in other databases.
I agree having something that was optional would be ideal
Yeah, for reference DuckDB also provides optional general collations, using an extension for the ICU project: https://duckdb.org/docs/sql/expressions/collations.html#icu-collations
There's a corresponding Rust crate as well: https://docs.rs/icu/latest/icu/collator/index.html
Is your feature request related to a problem or challenge?
"Collation" generically means how to compare and sort string values.
Some databases, most notably Postgres, allow you to change the default collation so that comparison and sort order match what the user expects, rather than whatever the standard sort order produces.
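As a small illustration of why this matters, the sketch below compares the same pair of strings under two orderings in plain Rust. The byte-wise comparison is what `str` does by default; the case-insensitive comparator is a toy assumption for illustration only and is not how Postgres or ICU implement collations.

```rust
use std::cmp::Ordering;

/// Default comparison: Rust compares `str` values byte by byte (UTF-8 order),
/// so every uppercase ASCII letter sorts before every lowercase one.
pub fn compare_binary(a: &str, b: &str) -> Ordering {
    a.cmp(b)
}

/// Toy alternative collation (an assumption for illustration, not ICU):
/// compare the lowercased forms first, then fall back to byte order to break ties.
pub fn compare_case_insensitive(a: &str, b: &str) -> Ordering {
    a.to_lowercase()
        .cmp(&b.to_lowercase())
        .then_with(|| a.cmp(b))
}
```

With these definitions, "Zebra" sorts before "apple" under `compare_binary` but after it under `compare_case_insensitive`, which is the kind of difference a configurable collation is meant to control.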
Someone asked about this on discord: https://discord.com/channels/885562378132000778/1166447479609376850/1205554368292855868
Here are some details on how this works in Postgres: https://www.postgresql.org/docs/current/collation.html
Describe the solution you'd like
Someone to design and implement COLLATION
This probably looks like a SessionConfig setting to control collation at the session level, and possibly some way to define it as part of the table definition.
Describe alternatives you've considered
No response
Additional context
This would likely require adding collation support to arrow-rs as well, though I am not 100% sure