Tecnativa / doodba-scaffolding

Officially supported scaffolding template for your Doodba projects
Other
11 stars 38 forks source link

UTF-8 sorting not working correctly with used postgres image #54

Closed ap-wtioit closed 5 years ago

ap-wtioit commented 5 years ago

I'm not sure how we should treat this. I think i should at least be a known issue. Maybe we should switch the used postgres image.

When using the default doodba postgres alpine image, sorting does not correctly work for UTF-8.

bigbear3001@wt-io-it-bigbear3001:~$ docker run --rm --name=pg_test_alpine -d postgres:10-alpine
0edf0f7ee9b25c051fb028c13d73fd6d5d4822d81afc88ce54d2f0d0230a3c42
bigbear3001@wt-io-it-bigbear3001:~$ docker exec pg_test_alpine psql -U postgres -c "SELECT unnest(array['a','ä','b']) ORDER BY 1"
 unnest 
--------
 a
 b
 ä
(3 rows)

bigbear3001@wt-io-it-bigbear3001:~$ docker stop pg_test_alpine
pg_test_alpine

vs the non alpine image

bigbear3001@wt-io-it-bigbear3001:~$ docker run --rm --name=pg_test -d postgres:10
0144a66be1c49ebf23612f287bbebeeadf6318566e530d695194237853c644a3
bigbear3001@wt-io-it-bigbear3001:~$ docker exec pg_test psql -U postgres -c "SELECT unnest(array['a','ä','b']) ORDER BY 1"
 unnest 
--------
 a
 ä
 b
(3 rows)

bigbear3001@wt-io-it-bigbear3001:~$ docker stop pg_test
pg_test

Only in the non alpine version the sorting of characters is working with what i think is called icu.

Should we switch to the non alpine version as default?

Test code is from https://github.com/PostgresApp/PostgresApp/issues/216#issuecomment-64157627

yajo commented 5 years ago

Try this query:

SELECT foo FROM (SELECT unnest(array['a','ä','b']) AS foo) AS bar
ORDER BY foo COLLATE "es-ES-x-icu";

It should work just fine.

You probably discovered an error in Odoo itself, which doesn't add the COLLATE sentence. It should add it depending on the current user in the system, because depending on his current settings he'd expect some collation or another. See this from runbot v13:

2019-11-15_08-53

ap-wtioit commented 5 years ago

Well i tested the same doodba instances with the same odoo installation script, and different databases (postgres 10 from debian vs postgres 10 from alpine). And the debian postgres image behaves like our prod databases (which we are also running on debian ;-) ) so we are gonna stick with the non alpine version.

I think the current argument for this on Odoo is that they are using the sorting setup in the DB (which should be en_US.utf8 (but that's not working)). And the argument on Postgres seems to be that they are using the sorting of the OS (which should be en_US.utf8 (but that's not working on 10-alpine)).

Maybe my sample is bad (and a bit austrian focused). The Sorting is also not working for HELLO vs broom (other product names used on the customers instance) essentially the sorting available on the 10-alpine image is a very bad one that just uses the codepoint of the characters so every uppercase character comes before a lowercase character.

As i wrote i don't know how to handle it i think the arguments of Odoo and Postgres are understandable. But your argument implementing it in Odoo for the users setting would habe the benefit that the sorting could be different for Users from Finnland and from Austria (the ä comes in a different place in the two languages).

So i guess its fine to close the issue here as well, i just thought maybe we could document in case somebody else hits the same problem.

yajo commented 5 years ago

Indeed it's weird that nobody noticed this before, it's a crazy behavior. I reported it in https://github.com/odoo/odoo/issues/40318. Feel free to subscribe there and let's see if we can have a definitive fix on the matter.

Thank you for the report.

yajo commented 5 years ago

@ap-wtioit Just in case you missed it, there you have the real fix to your problem: https://github.com/odoo/odoo/issues/40318#issuecomment-555534007

ap-wtioit commented 5 years ago

for me it looks like the solution then is:

See also 276ea81 and the full discussion at #25196 using C by default and the commit message:

Admins who want to apply a special collation can still do so by creating the database manually, instead of letting the system do it.

which the non alpine postgres image does as default for us, and the alpine one doesn't. Output from the non alpine image:

db_1_5a78890843e2 | 
db_1_5a78890843e2 | The database cluster will be initialized with locale "en_US.utf8".
db_1_5a78890843e2 | The default database encoding has accordingly been set to "UTF8".
db_1_5a78890843e2 | The default text search configuration will be set to "english".
db_1_5a78890843e2 |