cap-js / cds-dbs

Monorepo for SQL Database Services for CAP
https://cap.cloud.sap/docs/
Apache License 2.0
34 stars 10 forks source link

postgres - simplified chinese collation on BTP Hyperscaler Option #747

Open martin-kl opened 2 months ago

martin-kl commented 2 months ago

Description of erroneous behaviour

Hello, when using "Simplified Chinese" in SAP Build Work Zone, requests use the "zh_CN" locale. On queries with orderBy, the method orderByICU in PostgresService.js translates "zh_CN" to "zh-CN-x-icu". Those queries fail on the PostgreSQL Hyperscaler Option (AWS, PostgreSQL version 14.9) with:

collation "zh-CN-x-icu" for encoding "UTF8" does not exist
Turns out that the following locales beginning with "zh" are available: collname collcollate collctype collversion
zh-Hans-CN-x-icu zh-Hans-CN zh-Hans-CN 153.80.32.1
zh-Hans-HK-x-icu zh-Hans-HK zh-Hans-HK 153.80.32.1
zh-Hans-MO-x-icu zh-Hans-MO zh-Hans-MO 153.80.32.1
zh-Hans-SG-x-icu zh-Hans-SG zh-Hans-SG 153.80.32.1
zh-Hans-x-icu zh-Hans zh-Hans 153.80.32.1
zh-Hant-HK-x-icu zh-Hant-HK zh-Hant-HK 153.80.32.1
zh-Hant-MO-x-icu zh-Hant-MO zh-Hant-MO 153.80.32.1
zh-Hant-TW-x-icu zh-Hant-TW zh-Hant-TW 153.80.32.1
zh-Hant-x-icu zh-Hant zh-Hant 153.80.32.1
zh-x-icu zh zh 153.80.32.1
zh_CN zh_CN zh_CN 2.26-59.amzn2
zh_CN zh_CN.utf8 zh_CN.utf8 2.26-59.amzn2
zh_CN.gb2312 zh_CN.gb2312 zh_CN.gb2312 2.26-59.amzn2
zh_CN.utf8 zh_CN.utf8 zh_CN.utf8 2.26-59.amzn2
zh_HK zh_HK.utf8 zh_HK.utf8 2.26-59.amzn2
zh_HK.utf8 zh_HK.utf8 zh_HK.utf8 2.26-59.amzn2
zh_SG zh_SG.utf8 zh_SG.utf8 2.26-59.amzn2
zh_SG zh_SG zh_SG 2.26-59.amzn2
zh_SG.gb2312 zh_SG.gb2312 zh_SG.gb2312 2.26-59.amzn2
zh_SG.utf8 zh_SG.utf8 zh_SG.utf8 2.26-59.amzn2
zh_TW zh_TW.utf8 zh_TW.utf8 2.26-59.amzn2
zh_TW zh_TW.euctw zh_TW.euctw 2.26-59.amzn2
zh_TW.euctw zh_TW.euctw zh_TW.euctw 2.26-59.amzn2
zh_TW.utf8 zh_TW.utf8 zh_TW.utf8 2.26-59.amzn2

where "zh-CN-x-icu" is missing but "zh-Hans-CN-x-icu" would be available - "Hans" for "Han (simplified variant)" Source

We managed to get it to work in our project by creating the missing collation via

CREATE COLLATION "zh-CN-x-icu" from "zh-Hans-CN-x-icu";

but perhaps this could be integrated natively into @cap-js/postgres?

Detailed steps to reproduce

execute query containing "orderBy" on locale "zh_CN" on PostgreSQL Hyperscaler Option

Details about your project

customer-project
OData version v4
Node.js version v18.19.1
@sap/cds 7.9.3
@sap/cds-compiler 4.9.2
@sap/cds-dk 7.9.4
@cap-js/postgres 1.8.0
@cap-js/sqlite 1.7.0
BobdenOs commented 2 months ago

@martin-kl Thank you for reporting this productive behavior.

As we don't have access to all deployments of Postgres out there. We had to follow and trust the documentation of Postgres here. Which states that the icu localization auto generates C equivalent collations with the suffix -x-icu. It seems that zh_CN is an exception as the icu definition has colliding definitions with the traditional and simplified versions being of equal weight.

Currently @cap-js/postgres does an initial check whether the database instance has icu collations defined (code). If that is the case it will use icu as collation, but when the collation en-x-icu does not exist it will use the C collations. It will also check that when a locale comes in that the database knows the C collation. If not it fallbacks to using no collation in queries in that unknown locale.

@martin-kl would it have an impact whether zh_CN or zh-Hans-CN-x-icu is used ? Or would it be sufficient if we provide an option to pick the C collation over the icu collations ?

martin-kl commented 2 months ago

@BobdenOs thanks for the quick reply. As far as I can see it doesn't make a difference between zh_CN and zh-Hans-CN-x-icu in our application - so both would be fine for us.

Generally speaking as CAP developer I'd like it to work automatically, meaning that I don't need an option if it works with either the C or the icu collation out-of-the-box.