cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

sql: support database-level collations #16618

Open eisenstatdavid opened 7 years ago

eisenstatdavid commented 7 years ago

Feature request broken off from #2473.

root@:26257/> create database foo lc_collate = de;
pq: unsupported collation: de

Jira issue: CRDB-6061

Enver-Yilmaz commented 7 years ago

I suggest looking at DUCET, the "Default Unicode Collation Element Table": https://en.wikipedia.org/wiki/Unicode_collation_algorithm. With this algorithm you can get a sensible sort order that covers all languages. It isn't perfect, but it is acceptable for many. You should do this in a case-insensitive and accent-sensitive manner.
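To make "case-insensitive and accent-sensitive" concrete, here is a minimal Python sketch of a two-level sort key. It is only a toy approximation of the idea behind multi-level collation, not DUCET or the Unicode Collation Algorithm itself: it assumes that stripping combining marks after NFD normalization is a good enough stand-in for a primary weight, and the name `toy_collation_key` is made up for illustration.

```python
import unicodedata


def toy_collation_key(s: str):
    """Two-level key: base letters first, accents second, case ignored.

    Toy stand-in for real collation weights (an assumption, not DUCET).
    """
    decomposed = unicodedata.normalize("NFD", s)
    primary = []    # base characters, case-folded
    secondary = []  # combining marks attached to each base character
    for ch in decomposed:
        if unicodedata.combining(ch):
            if secondary:  # ignore a stray leading combining mark
                secondary[-1] += ch
        else:
            primary.append(ch.casefold())
            secondary.append("")
    # Tuples compare element by element, so accents only break ties
    # between strings whose base letters are identical.
    return (tuple(primary), tuple(secondary))


words = ["Zebra", "äpfel", "apfel", "Apfel", "zebra"]
print(sorted(words, key=toy_collation_key))
```

Under this key "Apfel" and "apfel" compare equal (case is ignored), while "äpfel" sorts after both but still before "zebra" (the accent only matters as a tie-breaker).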

The ICU library has this as the "root" collation, and I guess Go has support for this in the https://github.com/golang/text repository. ICU has many options: for example, you can collate with the German locale, case- and accent-insensitively, using phonebook order, a special ordering used only in German phonebooks.

PostgreSQL has had this since version 10, but without support for choosing case and accent sensitivity. The main problem with PostgreSQL was that it used OS libraries for collation handling. This is problematic in many ways: the collation rules could be updated outside the database's control, and indexes became corrupted when the rules changed. So they adopted the ICU library in order to be able to version the collation algorithm on an index; with glibc this is impossible. But they still don't support case-insensitive collations, and this is a no-go for many users who use ORM tools to manage schema and access.
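The index corruption problem can be reproduced with a toy model. The Python sketch below is not how PostgreSQL stores or checks anything; `ToyIndex`, `collate_v1`, `collate_v2`, and the version strings are all hypothetical names chosen for illustration. It only shows why a sorted structure built under one set of rules gives wrong answers when searched under another, and how recording a collation version lets you detect that.

```python
import bisect


def collate_v1(s: str) -> str:
    # Hypothetical "old" rules: plain code point order.
    return s


def collate_v2(s: str) -> str:
    # Hypothetical "new" rules after a library update: case-insensitive.
    return s.casefold()


class ToyIndex:
    """Sorted list standing in for an index; remembers its collation version."""

    def __init__(self, values, key, version):
        self.version = version
        self.entries = sorted(values, key=key)

    def contains(self, value, key, version, check_version=False):
        if check_version and version != self.version:
            raise RuntimeError("collation version mismatch: rebuild the index")
        # Binary search assumes entries are ordered by `key`.
        keys = [key(v) for v in self.entries]
        i = bisect.bisect_left(keys, key(value))
        return i < len(self.entries) and self.entries[i] == value


index = ToyIndex(["Zebra", "apple", "banana"], collate_v1, version="v1")

# Same rules as at build time: the lookup works.
print(index.contains("apple", collate_v1, "v1"))
# Rules changed underneath the index: an existing entry is silently missed.
print(index.contains("apple", collate_v2, "v2"))
```

Passing `check_version=True` in the second call raises an error instead of silently returning a wrong answer, which is the kind of safeguard a recorded collation version makes possible.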

MySQL has had this since version 5.5, under what they call utf8mb4. You can look at http://mysqlserverteam.com/new-collations-in-mysql-8-0-0 for what's coming in 8.0. They made it the default for new databases.

eisenstatdavid commented 7 years ago

Thanks, we already implemented support for collation at the column level via golang/text.