FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/
UNICODE collations does not work with ICU 49 [CORE3946] #4279

Closed firebird-automations closed 12 years ago

firebird-automations commented 12 years ago

Submitted by: @mkubecek

Attachments: collation-gdb.txt

On a system with ICU 4.9, the command

create collation test_1 for UTF8 from UNICODE;

fails with

Statement failed, SQLSTATE = 42000
unsuccessful metadata update
-Invalid collation attributes

With ICU 4.4, the same command succeeds. With ICU 4.9, it fails with UTF8 and any collation but succeeds with ISO8859_1 or ISO8859_2 charset (tested all ISO8859_1 and about half of ISO8859_2 collations).

Commits: FirebirdSQL/firebird@36dcd8e561fb9d6b4e53c6d749d3e6977dc5a512 FirebirdSQL/firebird@8ce4b582f4b3bade63705cb8e55ad5fc0da3cb2e

firebird-automations commented 12 years ago
Modified by: @asfernandes assignee: Adriano dos Santos Fernandes [ asfernandes ]
firebird-automations commented 12 years ago

Commented by: @asfernandes

Does the built-in UNICODE collation work?

What is the Linux distro?

firebird-automations commented 12 years ago

Commented by: @mkubecek

It doesn't seem to work:

SQL> create database 'localhost:test' default character set UTF8;
SQL> create table TBL(S varchar(32) collate UNICODE);
Statement failed, SQLSTATE = 22021
unsuccessful metadata update
-TBL
-COLLATION UNICODE for CHARACTER SET UTF8 is not installed

Distribution is OpenSuSE 12.2. Tested with distribution package (2.5) and 3.0 package from

http://download.opensuse.org/repositories/home:/mkubecek:/firebird30/openSUSE_12.2/

Successful tests were on OpenSuSE 11.1 and 11.4 with 2.5 packages from

http://download.opensuse.org/repositories/home:/mkubecek:/firebird25/

firebird-automations commented 12 years ago

Commented by: @asfernandes

Is there anything in firebird.log?

firebird-automations commented 12 years ago

Commented by: @mkubecek

Nothing at all, neither for "create collation" nor for "create table".

firebird-automations commented 12 years ago

Commented by: @asfernandes

Are you using 32 or 64 bit version?

Please paste the result of: find /usr/ /lib* -name 'libicu*'

firebird-automations commented 12 years ago

Commented by: @mkubecek

It is the 64-bit version.

unicorn:~ # find /usr/ /lib* -name 'libicu*'
/usr/share/susehelp/meta/Development/Libraries/libicu-doc.desktop
/usr/share/susehelp/meta/Development/Libraries/libicu17.desktop
/usr/share/susehelp/meta/Development/Libraries/libicu-devel.desktop
/usr/lib64/libicule.so.49.1
/usr/lib64/libicui18n.so
/usr/lib64/libicutu.so.49
/usr/lib64/libiculx.so.49
/usr/lib64/libicudata.so
/usr/lib64/libiculx.so
/usr/lib64/libicuuc.so.49
/usr/lib64/libicule.so.49
/usr/lib64/libicuuc.so.49.1
/usr/lib64/libicui18n.so.49.1
/usr/lib64/libicuio.so.49
/usr/lib64/libicudata.so.49
/usr/lib64/libicutest.so.49
/usr/lib64/libicudata.so.49.1
/usr/lib64/libicuio.so.49.1
/usr/lib64/libicutu.so.49.1
/usr/lib64/libicutu.so
/usr/lib64/libiculx.so.49.1
/usr/lib64/libicule.so
/usr/lib64/libicui18n.so.49
/usr/lib64/libicuuc.so
/usr/lib64/libicutest.so
/usr/lib64/libicutest.so.49.1
/usr/lib64/libicuio.so

firebird-automations commented 12 years ago

Commented by: @asfernandes

What's the result of the commands below?

objdump -T /usr/lib64/libicuuc.so.49 |grep 'u_init\|u_versionToString\|uloc_countAvailable\|uloc_getAvailable\|uset_close\|uset_getItem\|uset_getItemCount\|uset_open'

objdump -T /usr/lib64/libicuuc.so.49 |grep 'ucnv_open\|ucnv_close\|ucnv_fromUChars\|u_tolower\|u_toupper\|u_strCompare\|u_countChar32\|utf8_nextCharSafeBody\|UCNV_FROM_U_CALLBACK_STOP\|UCNV_TO_U_CALLBACK_STOP\|ucnv_fromUnicode'

objdump -T /usr/lib64/libicuuc.so.49 |grep 'ucnv_toUnicode\|ucnv_getInvalidChars\|ucnv_getMaxCharSize\|ucnv_getMinCharSize\|ucnv_setFromUCallBack\|ucnv_setToUCallBack'

objdump -T /usr/lib64/libicui18n.so.49 |grep 'ucol_close\|ucol_getContractions\|ucol_getSortKey\|ucol_open\|ucol_setAttribute\|ucol_strcoll\|ucol_getVersion\|utrans_open\|utrans_close\|utrans_transUChars'

firebird-automations commented 12 years ago

Commented by: @mkubecek

mike@unicorn:~> objdump -T /usr/lib64/libicuuc.so.49 |grep 'u_init\|u_versionToString\|uloc_countAvailable\|uloc_getAvailable\|uset_close\|uset_getItem\|uset_getItemCount\|uset_open'
000000000005b3d0 g DF .text 0000000000000155 Base u_versionToString_49
00000000000e22b0 g DF .text 0000000000000012 Base uset_close_49
000000000009aae0 g DF .text 000000000000004f Base uloc_getAvailable_49
00000000000debf0 g DF .text 0000000000000094 Base uset_openPattern_49
00000000000df030 g DF .text 0000000000000005 Base uset_closeOver_49
00000000000e2200 g DF .text 000000000000004d Base uset_openEmpty_49
000000000009ab30 g DF .text 0000000000000034 Base uloc_countAvailable_49
00000000000dec90 g DF .text 00000000000000bc Base uset_openPatternOptions_49
000000000005c840 g DF .text 000000000000006f Base u_init_49
00000000000e2250 g DF .text 0000000000000060 Base uset_open_49
00000000000e26d0 g DF .text 000000000000013a Base uset_getItem_49
00000000000e2690 g DF .text 0000000000000035 Base uset_getItemCount_49

mike@unicorn:~> objdump -T /usr/lib64/libicuuc.so.49 |grep 'ucnv_open\|ucnv_close\|ucnv_fromUChars\|u_tolower\|u_toupper\|u_strCompare\|u_countChar32\|utf8_nextCharSafeBody\|UCNV_FROM_U_CALLBACK_STOP\|UCNV_TO_U_CALLBACK_STOP\|ucnv_fromUnicode'
00000000000667c0 g DF .text 000000000000014e Base ucnv_close_49
0000000000070dd0 g DF .text 0000000000000284 Base ucnv_fromUnicode_UTF8_49
0000000000066250 g DF .text 00000000000000b1 Base ucnv_openU_49
00000000000ab280 g DF .text 00000000000000d3 Base u_countChar32_49
000000000006d860 g DF .text 0000000000000002 Base UCNV_FROM_U_CALLBACK_STOP_49
0000000000066310 g DF .text 0000000000000084 Base ucnv_openCCSID_49
00000000000aa900 g DF .text 0000000000000029 Base u_strCompare_49
0000000000066240 g DF .text 0000000000000005 Base ucnv_openPackage_49
00000000000a9830 g DF .text 000000000000021a Base utf8_nextCharSafeBody_49
000000000006cc60 g DF .text 00000000000000d2 Base ucnv_openAllNames_49
00000000000caaf0 g DF .text 000000000000000e Base u_toupper_49
000000000006c060 g DF .text 0000000000000122 Base ucnv_openStandardNames_49
00000000000aa2f0 g DF .text 0000000000000158 Base u_strCompareIter_49
000000000006d870 g DF .text 0000000000000002 Base UCNV_TO_U_CALLBACK_STOP_49
0000000000066210 g DF .text 0000000000000023 Base ucnv_open_49
00000000000670e0 g DF .text 000000000000021a Base ucnv_fromUChars_49
00000000000caae0 g DF .text 000000000000000e Base u_tolower_49
0000000000066cc0 g DF .text 00000000000001d3 Base ucnv_fromUnicode_49
0000000000071060 g DF .text 0000000000000367 Base ucnv_fromUnicode_UTF8_OFFSETS_LOGIC_49

mike@unicorn:~> objdump -T /usr/lib64/libicuuc.so.49 |grep 'ucnv_toUnicode\|ucnv_getInvalidChars\|ucnv_getMaxCharSize\|ucnv_getMinCharSize\|ucnv_setFromUCallBack\|ucnv_setToUCallBack'
0000000000066ea0 g DF .text 0000000000000231 Base ucnv_toUnicode_49
0000000000068cb0 g DF .text 0000000000000057 Base ucnv_getInvalidChars_49
0000000000066aa0 g DF .text 000000000000000d Base ucnv_getMinCharSize_49
0000000000066c90 g DF .text 000000000000002f Base ucnv_setFromUCallBack_49
0000000000066c50 g DF .text 0000000000000031 Base ucnv_setToUCallBack_49
0000000000066a90 g DF .text 0000000000000005 Base ucnv_getMaxCharSize_49

mike@unicorn:~> objdump -T /usr/lib64/libicui18n.so.49 |grep 'ucol_close\|ucol_getContractions\|ucol_getSortKey\|ucol_open\|ucol_setAttribute\|ucol_strcoll\|ucol_getVersion\|utrans_open\|utrans_close\|utrans_transUChars'
00000000001191a0 g DF .text 0000000000000449 Base ucol_setAttribute_49
000000000011e090 g DF .text 0000000000000022 Base ucol_openRules_49
000000000010a6f0 g DF .text 00000000000000d4 Base ucol_openElements_49
000000000010a7d0 g DF .text 0000000000000072 Base ucol_closeElements_49
000000000011e110 g DF .text 0000000000000648 Base ucol_open_internal_49
0000000000115510 g DF .text 00000000000000dc Base ucol_getSortKey_49
0000000000138e30 g DF .text 0000000000000012 Base utrans_close_49
0000000000138dd0 g DF .text 0000000000000015 Base utrans_openInverse_49
000000000011b1a0 g DF .text 000000000000006d Base ucol_getVersion_49
00000000001227c0 g DF .text 0000000000000019 Base ucol_getContractions_49
0000000000121d40 g DF .text 0000000000000215 Base ucol_openFromShortString_49
0000000000118f60 g DF .text 000000000000000a Base ucol_openBinary_49
000000000011e760 g DF .text 0000000000000049 Base ucol_open_49
000000000011cc90 g DF .text 0000000000000034 Base ucol_openAvailableLocales_49
000000000011db90 g DF .text 00000000000004f9 Base ucol_openRulesForImport_49
000000000011b6e0 g DF .text 00000000000008dd Base ucol_strcoll_49
00000000001390c0 g DF .text 00000000000000a5 Base utrans_openIDs_49
0000000000138d00 g DF .text 00000000000000c1 Base utrans_open_49
000000000011b2e0 g DF .text 00000000000003f7 Base ucol_strcollIter_49
00000000001392d0 g DF .text 000000000000012e Base utrans_transUChars_49
000000000010d630 g DF .text 000000000000015a Base ucol_close_49
0000000000117f70 g DF .text 00000000000000fb Base ucol_getSortKeyWithAllocation_49
0000000000138ba0 g DF .text 0000000000000160 Base utrans_openU_49
0000000000122660 g DF .text 000000000000015c Base ucol_getContractionsAndExpansions_49
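
Editorial note: the objdump listings above are being read to confirm that every entry point named in the grep patterns is exported with the `_49` suffix. A minimal sketch of that check follows, assuming the exported names have already been collected into a set. The helper `missingEntryPoints` is hypothetical and is not part of Firebird or ICU.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the required entry points that are NOT exported under the
// versioned name "<base>_<suffix>" (for example "ucol_open_49").
// Hypothetical helper for illustration only; not Firebird or ICU code.
std::vector<std::string> missingEntryPoints(
    const std::vector<std::string>& required,
    const std::set<std::string>& exported,
    const std::string& suffix)
{
    std::vector<std::string> missing;
    for (const std::string& base : required)
    {
        // Exact match on the versioned name, unlike grep's substring match.
        const std::string versioned = base + "_" + suffix;
        if (exported.count(versioned) == 0)
            missing.push_back(base);
    }
    return missing;
}
```

Calling it with the names from the listings (for example `ucol_open`, `ucol_close`, `ucol_strcoll` and suffix `"49"`) should return an empty vector, which matches the conclusion that nothing is missing from the libraries themselves.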

firebird-automations commented 12 years ago

Commented by: @asfernandes

I do not see anything problematic.

I need you to send a backtrace of the problem, specifying the exact version/build number you're using (I'd prefer it be done with 3.0).

gdb -args isql -ch utf8
(gdb) run
create database 'test.fdb';
<ctrl c>
(gdb) catch throw
(gdb) cont
select 1 from rdb$database where 'a' = 'a' collate unicode;
-- gdb must catch an exception
(gdb) bt

firebird-automations commented 12 years ago

Commented by: @mkubecek

This output was created with 3.0.0.30084 (svn revision 57178).

The exception is caught in src/jrd/intl.cpp, line 398, CharSetContainer::lookupCollation():

if (!lookup_texttype(tt, &info))
{
    delete tt;
    ERR_post(Arg::Gds(isc_collation_not_installed) << Arg::Str(info.collationName) <<
        Arg::Str(info.charsetName));
}

Value of info can be found in the attachment.

firebird-automations commented 12 years ago
Modified by: @mkubecek Attachment: collation-gdb.txt [ 12238 ]
firebird-automations commented 12 years ago

Commented by: @asfernandes

Please run in ISQL:

show collation unicode;

firebird-automations commented 12 years ago

Commented by: @mkubecek

SQL> show collation unicode;
UNICODE, CHARACTER SET UTF8, PAD SPACE, SYSTEM

I also played a bit more with gdb and got to this stack:

#0 Jrd::UnicodeUtil::Utf16Collation::loadICU src/common/unicode_util.cpp:1463
#1 Jrd::UnicodeUtil::Utf16Collation::create src/common/unicode_util.cpp:1143
#2 Firebird::IntlUtil::initUnicodeCollation src/common/IntlUtil.cpp:528
#3 ttype_unicode8_init src/jrd/intl_builtin.cpp:1081
#4 Jrd::IntlManager::lookupCollation src/jrd/IntlManager.cpp:636
#5 lookup_texttype src/jrd/intl.cpp:497
#6 CharSetContainer::lookupCollation src/jrd/intl.cpp:394
...

where loadICU("41.128.4.4", "", "icu_versions=default") fails.

firebird-automations commented 12 years ago

Commented by: @asfernandes

Don't know why, but the collation on your database is initialized incorrectly.

Please locate fbintl.conf and set icu_versions to 4.9: icu_versions 4.9

Then retry. Create a new database, show the collation and test to see what happens.

firebird-automations commented 12 years ago

Commented by: @mkubecek

Now it works (with newly created database) and output of 'show collation' is different:

SQL> create database '/srv/firebird/test3.fdb';
SQL> show collation UNICODE;
UNICODE, CHARACTER SET UTF8, PAD SPACE, 'COLL-VERSION=58.0.6.49', SYSTEM

SQL> select 1 from rdb$database where 'a' = 'a' collate unicode;

    CONSTANT
============
           1

firebird-automations commented 12 years ago

Commented by: @asfernandes

Was the test with "icu_versions default" done with a fresh new database too?

firebird-automations commented 12 years ago

Commented by: @mkubecek

Yes (I checked again now to be sure). Could the problem be caused by some part of ICU (or something else) missing during the build?

firebird-automations commented 12 years ago

Commented by: @mkubecek

I did the same test on OpenSuSE 12.1 with ICU 4.6 and 4.8.1 (used both for build and test) and the same version of Firebird. In both cases the collation works even with 'icu_versions = default'. So it looks like some incompatibility was introduced between ICU 4.8 and 4.9.

firebird-automations commented 12 years ago

Commented by: @asfernandes

"default" means the version present at build time. Looks like you have a dev package installed without the actual runtime package, or there is some problem in the build include path.

Locate these lines (here 887) in src/common/unicode_util.cpp:

string version = icuVersion.isEmpty() ? versions[0] : icuVersion;
if (version == "default")
    version.printf("%d.%d", U_ICU_VERSION_MAJOR_NUM, U_ICU_VERSION_MINOR_NUM);

for (ObjectsArray<string>::const_iterator i(versions.begin()); i != versions.end(); ++i)

Put a breakpoint on the last (for) line at the gdb prompt:

(gdb) b unicode_util.cpp:887

Once the breakpoint is reached, print version:

(gdb) print version.stringBuffer

Or experiment at compile time and check where U_ICU_VERSION_MAJOR_NUM and U_ICU_VERSION_MINOR_NUM are coming from and what their values are.
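
Editorial note: one way to do the compile-time check is a tiny translation unit that reports the two macros as the compiler sees them. This is only a sketch. It assumes the macros are reachable through `unicode/uversion.h` and that the compiler supports `__has_include`; the function name `icuBuildTimeVersion` is hypothetical.

```cpp
#include <string>

// Pull in the ICU version macros only if the header is on the include path,
// so the file still compiles on a machine without ICU development files.
#if defined(__has_include)
#  if __has_include(<unicode/uversion.h>)
#    include <unicode/uversion.h>
#  endif
#endif

// Reports the ICU version seen at build time as "major.minor", using the
// same "%d.%d" layout as the snippet quoted from unicode_util.cpp.
std::string icuBuildTimeVersion()
{
#if defined(U_ICU_VERSION_MAJOR_NUM) && defined(U_ICU_VERSION_MINOR_NUM)
    return std::to_string(U_ICU_VERSION_MAJOR_NUM) + "." +
           std::to_string(U_ICU_VERSION_MINOR_NUM);
#else
    return "ICU headers not found";
#endif
}
```

Printing the result from a small `main` and compiling with the same include path as the Firebird build should show which header set is actually being picked up.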

firebird-automations commented 12 years ago

Commented by: @mkubecek

I get majorVersion = 49, minorVersion = 1, which after

filename.printf(ucTemplate, majorVersion, minorVersion);

gives filename = "libicuuc.so.491". This fails to load as the name should probably be "libicuuc.so.49". I checked libicu header files and indeed, version 49 (4.9) defines U_ICU_VERSION_MAJOR_NUM=49, U_ICU_VERSION_MINOR_NUM=1 while version 48 (4.8) defined U_ICU_VERSION_MAJOR_NUM=4, U_ICU_VERSION_MINOR_NUM=8.

Looking at the version macros defined by 49 (4.9) and 48 (4.8), it seems U_ICU_VERSION_SHORT might be the right one but I'm not sure it will work correctly with older versions as well. Or maybe we could just distinguish cases U_ICU_VERSION_MAJOR_NUM>4 and U_ICU_VERSION_MAJOR_NUM<=4 (and hope they won't change the scheme again).
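
Editorial note: the second idea above (distinguishing major versions above 4 from the older scheme) could look roughly like the sketch below. It is not the fix that was committed to Firebird. The helper names `icuLibrarySuffix` and `icuucFileName` are hypothetical, and the sketch assumes only the two numbering schemes described in this thread.

```cpp
#include <string>

// Builds the numeric suffix used in ICU shared-library names
// (e.g. "libicuuc.so.<suffix>") from the header version macros.
// Assumed schemes, taken from this thread:
//   ICU 4.8 -> major=4,  minor=8 -> "48"
//   ICU 49  -> major=49, minor=1 -> "49"
std::string icuLibrarySuffix(int majorVersion, int minorVersion)
{
    // New scheme: the major number alone identifies the library.
    if (majorVersion > 4)
        return std::to_string(majorVersion);

    // Old scheme: major and minor digits are concatenated.
    return std::to_string(majorVersion) + std::to_string(minorVersion);
}

// Hypothetical convenience wrapper for the common library.
std::string icuucFileName(int majorVersion, int minorVersion)
{
    return "libicuuc.so." + icuLibrarySuffix(majorVersion, minorVersion);
}
```

With these assumptions, `icuucFileName(49, 1)` yields "libicuuc.so.49" rather than "libicuuc.so.491", while `icuucFileName(4, 8)` still yields "libicuuc.so.48".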

firebird-automations commented 12 years ago

Commented by: @asfernandes

Problem is that this is not ICU 4.9, it's ICU 49 really, but they changed how this is encoded in the filename.

Looks like these people have nothing else to do!

firebird-automations commented 12 years ago
Modified by: @asfernandes summary: create collation for UTF8 from UNICODE fails with ICU 4.9 => UNICODE collations does not work with ICU 49
firebird-automations commented 12 years ago

Commented by: @asfernandes

I committed a fix for FB 3.0, without testing with ICU 49. Please test it.

firebird-automations commented 12 years ago

Commented by: @pmakowski

I hope you will backport it to 2.5, since Mageia, Fedora and certainly other distributions are using ICU 49 in their upcoming releases.

firebird-automations commented 12 years ago

Commented by: @mkubecek

I confirm that both standard UNICODE collation and custom collation created from it work as expected now. Thank you.

firebird-automations commented 12 years ago

Commented by: @asfernandes

Committed to 2.5 branch. Please test it.

firebird-automations commented 12 years ago
Modified by: @asfernandes status: Open [ 1 ] => Resolved [ 5 ] resolution: Fixed [ 1 ] Fix Version: 2.5.2 [ 10450 ] Fix Version: 3.0 Alpha 1 [ 10331 ]
firebird-automations commented 12 years ago

Commented by: @mkubecek

Current 2.5 from subversion works for me. Thank you.

firebird-automations commented 12 years ago

Commented by: @pmakowski

this is marked as fixed in 2.5.2, but it is not the case

firebird-automations commented 12 years ago
Modified by: @pmakowski Fix Version: 2.5.3 [ 10461 ] Fix Version: 2.5.2 [ 10450 ] =>
firebird-automations commented 11 years ago
Modified by: @pcisar status: Resolved [ 5 ] => Closed [ 6 ]
firebird-automations commented 8 years ago
Modified by: @pavel-zotov status: Closed [ 6 ] => Closed [ 6 ] QA Status: No test