Closed firebird-automations closed 8 years ago
Commented by: @mrotteveel
Can you post all that is necessary to reproduce this. Preferably a simple executable that demonstrates storing the data and retrieval of data and a backup of a database with existing data that exhibits the problem
As I also commented on stackoverflow, this can happen when there is a mismatch between character sets when storing and retrieving data.
What I forgot to mention there is that blobs in Jaybird 2.2 and earlier are even more a special case, as we only use the connection character set to determine conversion, and not the actual character set of the blob. So if you stored data from windows with connection character set NONE (which is the default if nothing is specified!), than chance are high it was stored as Cp1252 (if you are in Western Europe; behavior depending on Windows and Java version), and when you retrieve under Linux with NONE it will usually be retrieved as UTF-8.
Commented by: Benjamin Judas (benjamin.judas)
Simple Test case
Commented by: Benjamin Judas (benjamin.judas)
I added a simple testcase which demonstrates the problem. I cannot offer you the executable (proprietary, non free) for storing the data or a copy of the database (9GiB).
Commented by: Benjamin Judas (benjamin.judas)
DDL-Script for Table creation (Not my business - created by our ERP-provider)
Commented by: Benjamin Judas (benjamin.judas)
Your hints about the encoding of BLOBs and about how data is stored in them lead me to the solution. It's not a driver bug -- Shame on me!
The solution is a simple one-liner (example): String text = new String(rawByteArrayFromBlob[], "CP1252");
This is of course no universal solution, but I will open a ticket for our ERP-Provider to check their user-inputs and probably transcode the input.
Commented by: @mrotteveel
Good to hear you managed to fix this. I will close the ticket.
Submitted by: Benjamin Judas (benjamin.judas)
Attachments: FBTest.java BSAKTX.sql
- I am building a web-application to retrieve data from a Firebird 2.5 database. - The Database is using ISO8859-1. - The data is a Text-BLOB. - The data is accessed via JPA (Eclipselink bundled with Glassfish 4.1.1).
The problem is related to special characters (Trademark-Sign, german Umlauts, ß). I've been trying for two days now but I wasn't able to find a solution. Now I tried to circle the problem and analyze the raw data retrieved from the database.
An Example: Imagine the following text:
EasyLan® Dualboot® Patchkabel RJ45 Kat.5 geschirmt 1:1 grau 2,00m
On a Glassfish running on Windows I get the following bytes:
45 61 73 79 4c 61 6e ffffffae 20 44 75 61 6c 62 6f 6f 74 ffffffae 20 50 61 74 63 68 6b 61 62 65 6c 20 52 4a 34 35 20 4b 61 74 2e 35 20 67 65 73 63 68 69 72 6d 74 20 31 3a 31 20 67 72 61 75 20 32 2c 30 30 6d
On a Glassfish running on Linux I get the following bytes:
45 61 73 79 4c 61 6e ffffffef ffffffbf ffffffbd 20 44 75 61 6c 62 6f 6f 74 ffffffef ffffffbf ffffffbd 20 50 61 74 63 68 6b 61 62 65 6c 20 52 4a 34 35 20 4b 61 74 2e 35 20 67 65 73 63 68 69 72 6d 74 20 31 3a 31 20 67 72 61 75 20 32 2c 30 30 6d
As you can see, the Registered Trademark (®) seems to be messed up.
My connection pools in both servers use the exact same configuration and I added a charset-parameter for both (ISO8859-1). I cannot update to Firebird 3 since the database in question is one of our ERP-System.
I tried the following connection charsets: ISO8859-1, UTF8, Windows1252, NONE
I also did another test and wrote a small program using a "plain" JDBCConnection: I get the same results as with Glassfish + JPA. I think it's a Jaybird issue in relation to the operating system it's being used on.