Charset UTF8 with collate UNICODE: ORDER BY works wrong for polish characters [CORE3738]

firebird-automations commented 12 years ago

Submitted by: Mariusz Nogala (mnogala)

I have a varchar column defined as charset UTF8 with collate UNICODE. When I use ORDER BY the result set is:

a, ą, ąb, ac

and should be:

a, ac, ą, ąb

The problem is with the polish 'ą' character. It should just follow the ascii 'a' character. So why the 'ac' text is older than 'ą' or 'ąb' in the ORDER BY ??? If you sort single character texts there is no problem (as seen in the above example: 'ą' follows 'a').

The same problem is with other polish characters: ć, ś, ę, ń, ó, ż, ź. They should just follow: c, s, e, n, o, z respectively. For example: ORDER BY gives: s, ś, śb, sc

and should be: s, sc, ś, śb

What is also strange: there is no problem with the polish characters: 'Ł' and 'ł'. They correctly always follow: 'L' and 'l' respectively.

Summarizing: I think that charset UTF8 with collate UNICODE should work with polish characters in the same way as charset WIN1250 with collate pxw_plk. System Windows also works in this way.

firebird-automations commented 12 years ago

Commented by: @mrotteveel

The UNICODE collation is a very basic collation, the Polish sort order is a very specific sort order (almost every country or language has its own sort order for character. You will need to create your own collation using CREATE COLLATION http://www.firebirdsql.org/file/documentation/reference_manuals/reference_material/html/langrefupd25-ddl-collation.html#langrefupd25-ddl-collation-create

This would be something like the following (NOTE: I am not 100% sure this will achieve the effect you desire). CREATE COLLATION UNICODE_PL FOR UTF8 FROM UNICODE NO PAD -- not sure about this one CASE SENSITIVE ACCENT SENSITIVE 'LOCALE=pl_PL'

firebird-automations commented 12 years ago

Commented by: Sean Leyne (seanleyne)

The problem is in your choice of the UNICODE collation, not in the collation itself or a specific engine deficiency. The collation was not designed/intended for the purpose you are using it for.

As Mark has replied, by creating and using an appropriate collation, you will see/get the your expected result.

firebird-automations commented 12 years ago

Modified by: Sean Leyne (seanleyne)

status: Open \[ 1 \] =\> Resolved \[ 5 \] resolution: Won't Fix \[ 2 \]

firebird-automations commented 12 years ago

Modified by: @pcisar

status: Resolved \[ 5 \] =\> Closed \[ 6 \]

FirebirdSQL / firebird

Charset UTF8 with collate UNICODE: ORDER BY works wrong for polish characters [CORE3738] #1402