FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/
1.23k stars 212 forks source link

Wrong conversion of mutation charcters/umlauts & GBAK (FIX_FSS) failed on multilingual databases [CORE2909] #3293

Closed firebird-automations closed 14 years ago

firebird-automations commented 14 years ago

Submitted by: Alexander Pierer (pierer)

Hello,

refering to CORE2885, I still have some problems by restoring a Firebird-1.5-backupfile under Firebird 2.5 RC 2. As you recommended, I used the FIX_FSS_DATA- and FIX_FSS_METADATA-switches with the charactersets ISO8859_1 (German) or WIN1252 (Western Europe). With these settings the restore process completes without any errors or warnings.

But when reading strings from the restored database, I was wondering about wrong converted german umlauts/mutation characters. For example with FIX_FSS_M/D ISO8859_1 the german "ü" (unicode = 0x00FC, UTF8 = 0xC3BC) is converted to "ü" (unicode = 0x00C3 + 0x00BC). You can have a look on this problem in the database I attached to CORE2885 (table: FORMULARLABELS, field: NAME_49). For solving this problem I wrote an UDF that reconverts byte-by-byte the wrong transliterations to UTF8.

But now I have a great problem with an other multilingual database that contains english, german and russian strings. Wihout FIX_FSS-switches GBAK returns the well known error message: "gbak: ERROR:Malformed string gbak:Invalid metadata detected. Use -FIX_FSS_METADATA option". After that I attempt to run the GBAK-restore with several charsets for the FIX_FSS_DATA- and FIX_FSS_METADATA-switches. But I always received the error message:

"gbak:restoring data for table PARAMETERPOOL gbak: ERROR:arithmetic exception, numeric overflow, or string truncation gbak: ERROR: string right truncation gbak: ERROR:gds_$send failed"

I tried all WIN125x- and ISO8859-x character sets without any success.

Is there any switch option that let GBAK ignore malformed strings or continues the restore process after such an error occured? (In this case I could correct the malformed strings with an UDF after restoring on my own.)

Sorry for may long report and thank you very much!

Best regards Alexander

P.S. Could you please remove the ZIP-file I attached to CORE2885 from your server?

firebird-automations commented 14 years ago

Commented by: Alexander Pierer (pierer)

I attach the affected Firebird-1.5-backupfile. Could you please remove this file from the server when you are closing the thread?

firebird-automations commented 14 years ago
Modified by: Alexander Pierer (pierer) Attachment: EnglishRussianGerman\.gbk \[ 11590 \]
firebird-automations commented 14 years ago

Commented by: @mrotteveel

Have you tried specifying UTF8 as the character set? I haven't looked at your database, but the actual problem looks like UTF8 data was read as if it was WIN1252/ISO8859-1 and then stored as UTF8.

firebird-automations commented 14 years ago

Commented by: Alexander Pierer (pierer)

I also tried it with UTF8 and UNICODE_FSS as FIX_FSS character sets without success.

firebird-automations commented 14 years ago

Commented by: @asfernandes

No, there is no option to let you restore the "broken" data. And FIX_FSS_DATA will not do a magic and discover the encoding you use to store each record/field.

You could restore it in 2.1, fix the data to match the used (UNICODE_FSS) encoding, backup again and restore.

firebird-automations commented 14 years ago

Commented by: Alexander Pierer (pierer)

I guess I found a solution for my dedicated case. I will describe my solution in the hope that this may help other users to migrate from Firebird 1.5 to Firebird 2.5. Maybe it is not the efficientes way and some of my step can be skipped or reordered.

1. I install firebird 1.5 and restore the backupfile I orginally created under Firebird 1.5 into a new database

2. I correct the database's metadata as described in .\Firebird\Firebird_2_1\misc\upgrade\metadata\metadata_charset.txt (using "select * from rdb$fix_metadata('ISO8859_1')" for correcting the "german" metadata.

3. I create a new backup of this database

4. I install Firebird 2.1 and restore the backup

5. I alter the domains of all affected "VARCHAR UNICODE_FSS"-fields to character set & collate UTF8 in the system table RDB$FIELDS, reminding that for UTF8 RDB$FIELD_LENGTH = RDB$CHARACTER_LENGTH*3.

6. I create a new backup of this database

7. I install Firebird 2.5.

8. I restore the backup under Firebird 2.5 and all foreign characters are shown probably

firebird-automations commented 14 years ago
Modified by: @asfernandes status: Open \[ 1 \] =\> Resolved \[ 5 \] resolution: Won't Fix \[ 2 \]
firebird-automations commented 14 years ago

Commented by: Alexander Pierer (pierer)

P.S. Could you please remove the files I attached to this thread (CORE 2909) and to the thread CORE2885 from your server.

Thank you very much.

firebird-automations commented 14 years ago
Modified by: @pcisar Attachment: EnglishRussianGerman\.gbk \[ 11590 \] =\>
firebird-automations commented 13 years ago
Modified by: @pcisar status: Resolved \[ 5 \] =\> Closed \[ 6 \]
firebird-automations commented 11 years ago

Commented by: Marc Fox (mastersoda)

I found two migration tools (Firebird 1.5 to 2.5):

->http://gsbelarus.com/gs/fdbconvert/

->https://docs.google.com/file/d/0BwW3Ccp8dA43Y2dLQzZOZHpqeWM/edit?usp=sharing