Open eisenstatdavid opened 7 years ago
I suggest you look at DUCET, the "Default Unicode Collation Element Table": https://en.wikipedia.org/wiki/Unicode_collation_algorithm With this algorithm you can get a sensible sort order that covers all languages. It isn't perfect, but it is acceptable for many. You should do this in a case-insensitive and accent-sensitive manner.
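To illustrate what "case insensitive, accent sensitive" means in terms of multi-level comparison, here is a minimal sketch in Go. It is not DUCET and not any library's API: `toyTable`, `sortKey`, and `compare` are hypothetical names, and the weights are hand-written for a handful of letters only. The idea is that base letters are compared first, accents second, and case is only consulted when asked for.

```go
package main

import (
	"fmt"
	"sort"
)

// weights holds toy collation weights for one rune, loosely modeled on the
// first three comparison levels of the Unicode Collation Algorithm.
type weights struct {
	primary   int // base letter
	secondary int // accent
	tertiary  int // case
}

// toyTable is a hypothetical, hand-written table (an assumption for this
// sketch); the real DUCET covers all of Unicode.
var toyTable = map[rune]weights{
	'a': {1, 0, 0}, 'A': {1, 0, 1},
	'á': {1, 1, 0}, 'Á': {1, 1, 1},
	'b': {2, 0, 0}, 'B': {2, 0, 1},
}

// sortKey concatenates all primary weights, then all secondary weights, and
// the tertiary weights only when caseSensitive is true. The separator -1 is
// lower than any weight, so a shorter string sorts before its extensions.
func sortKey(s string, caseSensitive bool) []int {
	var prim, sec, ter []int
	for _, r := range s {
		w, ok := toyTable[r]
		if !ok {
			// Runes outside the toy table sort after known letters, by code point.
			w = weights{primary: 1000 + int(r)}
		}
		prim = append(prim, w.primary)
		sec = append(sec, w.secondary)
		ter = append(ter, w.tertiary)
	}
	key := append(prim, -1)
	key = append(key, sec...)
	if caseSensitive {
		key = append(key, -1)
		key = append(key, ter...)
	}
	return key
}

// compare returns -1, 0, or 1 by comparing the two sort keys element-wise.
func compare(a, b string, caseSensitive bool) int {
	ka, kb := sortKey(a, caseSensitive), sortKey(b, caseSensitive)
	for i := 0; i < len(ka) && i < len(kb); i++ {
		if ka[i] < kb[i] {
			return -1
		}
		if ka[i] > kb[i] {
			return 1
		}
	}
	switch {
	case len(ka) < len(kb):
		return -1
	case len(ka) > len(kb):
		return 1
	}
	return 0
}

func main() {
	words := []string{"b", "Á", "a", "B", "á", "A"}
	// Case-insensitive, accent-sensitive: "a" and "A" compare equal,
	// but both sort before "á" and "Á".
	sort.SliceStable(words, func(i, j int) bool {
		return compare(words[i], words[j], false) < 0
	})
	fmt.Println(words)
}
```

Because primary weights for the whole string come before any secondary weight, a difference in base letters anywhere in the string outranks an accent difference earlier in it, which is the behaviour that makes this kind of ordering feel natural to readers.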
The ICU library has this as its "root" collation, and I guess Go has support for it in the https://github.com/golang/text repository. ICU has many options: for example, you can collate with the German locale, case- and accent-insensitively, using phonebook sort, a special ordering used only in German phonebooks.
PostgreSQL has it as of version 10, but they don't support case and accent insensitivity. The main problem with PostgreSQL was that it used the OS libraries for collation handling. This is problematic in many ways, because the collation rules could be updated outside the database's control, and indexes became corrupted when the rules changed. So they adopted the ICU library to be able to version the collation algorithm per index. With glibc this is impossible. But they still don't support case-insensitive collations, and this is a no-go for many users who use ORM tools to manage schemas and access.
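The versioning idea can be sketched roughly as follows. This is not PostgreSQL's actual implementation: `indexMeta`, `needsRebuild`, the index name, and the version strings are all made up for illustration. The point is only that an index remembers which version of the collation rules built it, so a later change in the rules can be detected instead of silently corrupting the ordering.

```go
package main

import "fmt"

// indexMeta is a hypothetical record stored alongside an index.
type indexMeta struct {
	Name             string
	Collation        string
	CollationVersion string // version of the collation rules used to build the index
}

// needsRebuild reports whether the index was built with collation rules
// that differ from the ones the collation library currently provides.
func needsRebuild(idx indexMeta, currentVersion string) bool {
	return idx.CollationVersion != currentVersion
}

func main() {
	// Illustrative values only; the version strings are invented.
	idx := indexMeta{
		Name:             "users_name_idx",
		Collation:        "de-u-co-phonebk",
		CollationVersion: "153.14",
	}
	for _, current := range []string{"153.14", "153.80"} {
		fmt.Printf("library version %s: rebuild needed = %v\n",
			current, needsRebuild(idx, current))
	}
}
```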
MySQL has had this since version 5.5; they call it utf8mb4. See http://mysqlserverteam.com/new-collations-in-mysql-8-0-0 for what's coming in 8. They made it the default for new databases.
Thanks, we already implemented support for collation at the column level via golang/text.
Feature request broken off from #2473.
Jira issue: CRDB-6061