FirebirdSQL / jaybird

JDBC driver for Firebird
https://www.firebirdsql.org/en/jdbc-driver/
GNU Lesser General Public License v2.1
91 stars 23 forks source link

Encoding differs when reading TEXT-Blobs in Windows/Linux [JDBC454] #493

Closed firebird-automations closed 8 years ago

firebird-automations commented 8 years ago

Submitted by: Benjamin Judas (benjamin.judas)

Attachments: FBTest.java BSAKTX.sql

- I am building a web-application to retrieve data from a Firebird 2.5 database. - The Database is using ISO8859-1. - The data is a Text-BLOB. - The data is accessed via JPA (Eclipselink bundled with Glassfish 4.1.1).

The problem is related to special characters (Trademark-Sign, german Umlauts, ß). I've been trying for two days now but I wasn't able to find a solution. Now I tried to circle the problem and analyze the raw data retrieved from the database.

An Example: Imagine the following text:

EasyLan® Dualboot® Patchkabel RJ45 Kat.5 geschirmt 1:1 grau 2,00m

On a Glassfish running on Windows I get the following bytes:

45 61 73 79 4c 61 6e ffffffae 20 44 75 61 6c 62 6f 6f 74 ffffffae 20 50 61 74 63 68 6b 61 62 65 6c 20 52 4a 34 35 20 4b 61 74 2e 35 20 67 65 73 63 68 69 72 6d 74 20 31 3a 31 20 67 72 61 75 20 32 2c 30 30 6d

On a Glassfish running on Linux I get the following bytes:

45 61 73 79 4c 61 6e ffffffef ffffffbf ffffffbd 20 44 75 61 6c 62 6f 6f 74 ffffffef ffffffbf ffffffbd 20 50 61 74 63 68 6b 61 62 65 6c 20 52 4a 34 35 20 4b 61 74 2e 35 20 67 65 73 63 68 69 72 6d 74 20 31 3a 31 20 67 72 61 75 20 32 2c 30 30 6d

As you can see, the Registered Trademark (®) seems to be messed up.

My connection pools in both servers use the exact same configuration and I added a charset-parameter for both (ISO8859-1). I cannot update to Firebird 3 since the database in question is one of our ERP-System.

I tried the following connection charsets: ISO8859-1, UTF8, Windows1252, NONE

I also did another test and wrote a small program using a "plain" JDBCConnection: I get the same results as with Glassfish + JPA. I think it's a Jaybird issue in relation to the operating system it's being used on.

firebird-automations commented 8 years ago

Commented by: @mrotteveel

Can you post all that is necessary to reproduce this. Preferably a simple executable that demonstrates storing the data and retrieval of data and a backup of a database with existing data that exhibits the problem

As I also commented on stackoverflow, this can happen when there is a mismatch between character sets when storing and retrieving data.

What I forgot to mention there is that blobs in Jaybird 2.2 and earlier are even more a special case, as we only use the connection character set to determine conversion, and not the actual character set of the blob. So if you stored data from windows with connection character set NONE (which is the default if nothing is specified!), than chance are high it was stored as Cp1252 (if you are in Western Europe; behavior depending on Windows and Java version), and when you retrieve under Linux with NONE it will usually be retrieved as UTF-8.

firebird-automations commented 8 years ago

Commented by: Benjamin Judas (benjamin.judas)

Simple Test case

firebird-automations commented 8 years ago
Modified by: Benjamin Judas (benjamin.judas) Attachment: FBTest\.java \[ 13013 \]
firebird-automations commented 8 years ago

Commented by: Benjamin Judas (benjamin.judas)

I added a simple testcase which demonstrates the problem. I cannot offer you the executable (proprietary, non free) for storing the data or a copy of the database (9GiB).

firebird-automations commented 8 years ago

Commented by: Benjamin Judas (benjamin.judas)

DDL-Script for Table creation (Not my business - created by our ERP-provider)

firebird-automations commented 8 years ago
Modified by: Benjamin Judas (benjamin.judas) Attachment: BSAKTX\.sql \[ 13014 \]
firebird-automations commented 8 years ago

Commented by: Benjamin Judas (benjamin.judas)

Your hints about the encoding of BLOBs and about how data is stored in them lead me to the solution. It's not a driver bug -- Shame on me!

The solution is a simple one-liner (example): String text = new String(rawByteArrayFromBlob[], "CP1252");

This is of course no universal solution, but I will open a ticket for our ERP-Provider to check their user-inputs and probably transcode the input.

firebird-automations commented 8 years ago

Commented by: @mrotteveel

Good to hear you managed to fix this. I will close the ticket.

firebird-automations commented 8 years ago
Modified by: @mrotteveel status: Open \[ 1 \] =\> Closed \[ 6 \] resolution: Cannot Reproduce \[ 5 \]