fernandobatels / rsfbclient

Rust Firebird Client
MIT License
76 stars 11 forks source link

Encoding #45

Closed AndreiTS closed 4 years ago

AndreiTS commented 4 years ago

How can I specify a encoding to connect to a database?

Example in python:

conn  = fdb.connect(
    user='SYSDBA',
    password='masterkey',
    database='/firebird/data/BASE.FDB',
    charset='win1252'
)
jairinhohw commented 4 years ago

Rust String type is UTF-8 encoded, so the connection needs to be UTF-8 encoded to allow to store the data from the database correctly in rust String. As such, this library only connects using UTF-8 charset (It can connect to databases with data stored in other charsets, but they are converted to be sent / received in UTF-8)

AndreiTS commented 4 years ago

I don't think it's working:

Table: image

Output: image

But, if I change "PAO" to "PÃO" it stops when it reaches this record:

image

Output: image

Code: https://gist.github.com/AndreiTS/cadb9475a8ac2065ca31def23d7fde05

jairinhohw commented 4 years ago

Can you try to print the error message? I don't see any in the output.

Edit: Or provide the database file for me to test

AndreiTS commented 4 years ago

image

jairinhohw commented 4 years ago

What firebird version are you using? Didi you set the encoding in the DBeaver configuration? image

I've tried writing Pão teste PÃO áã ç using dbeaver with win1252 encoding and was able to read it in rust.

AndreiTS commented 4 years ago

Firebird 2.5.9 image

AndreiTS commented 4 years ago

I don't know maybe is something wrong with the database, I will try with a new database tomorrow

AndreiTS commented 4 years ago

win1252 is different from WIN1252?

jairinhohw commented 4 years ago

Should be the same, not really sure what is wrong

AndreiTS commented 4 years ago

I tested with another database (driver: win1252) and the result is the same:

Table: image image

It only works if a set the driver to UTF8 and change the text again:

Driver set to UTF8: image Fixed text: image image

AndreiTS commented 4 years ago

The charset of "EMPCADASTRO" is None. If the charset is set to win1252, like this: image It woks. (encoding is specified to win1252)

I tried to change the charset of other columns but it's not working:

I tried this: https://firebirdsql.org/refdocs/langrefupd25-ddl-charset.html and this command:

ALTER TABLE TB_CLIENTE ALTER COLUMN EMPCADASTRO TYPE VARCHAR(20) CHARACTER SET WIN_1252;

But the charset still is None.

Do you know if there is another way to get this working?

AndreiTS commented 4 years ago

I made some changes to the crate rsfbclient-native (rows.rs) in order to work with columns that have win1252 text but are set to the charset none:

image

image

I learned rust last week and this is a "gambiarra", maybe you can do something better with it

AndreiTS commented 4 years ago

Result:

rsfbclient 0.9.0: image

rsfbclient 0.8.0 with changes above: image

fernandobatels commented 4 years ago

The charset of "EMPCADASTRO" is None. If the charset is set to win1252, like this: image It woks. (encoding is specified to win1252)

I tried to change the charset of other columns but it's not working:

I tried this: https://firebirdsql.org/refdocs/langrefupd25-ddl-charset.html and this command:

ALTER TABLE TB_CLIENTE ALTER COLUMN EMPCADASTRO TYPE VARCHAR(20) CHARACTER SET WIN_1252;

But the charset still is None.

Do you know if there is another way to get this working?

You cannot change a the charset of numeric columns, because they don't have it. So the charset always will be 'None'

AndreiTS commented 4 years ago

I was trying to change the charset of a varchar column: image

fernandobatels commented 4 years ago

@AndreiTS , maybe the problem is your terminal encoding. You can make a test writing this data directly to a text file?

AndreiTS commented 4 years ago

rsfbclient 0.8.0 with my changes: https://gist.github.com/AndreiTS/afc2938868a4e657da2cc92f1c450605 rsfbclient 0.9.0: image

The code is the same, I just changes the dependencies

fernandobatels commented 4 years ago

@AndreiTS , I make this test for check some charsets. You can run in your environment?

fernandobatels commented 4 years ago

rsfbclient 0.8.0 with my changes: https://gist.github.com/AndreiTS/afc2938868a4e657da2cc92f1c450605 rsfbclient 0.9.0: image

The code is the same, I just changes the dependencies

Please, post here the backtrace(RUST_BACKTRACE=1 cargo run). I think the problem is in the column name and not in the content.

AndreiTS commented 4 years ago

I modified the test to connect to the same database: image

Test with rsfbclient 0.9.0 (no changes): https://gist.github.com/AndreiTS/5bc866b90d9bdff178ff33ce08fe7f7e Test with rsfbclient 0.9.0 (with the changes that I did, they break utf8 text :D): https://gist.github.com/AndreiTS/f9628c0f50092f555825b6e270647ae6

AndreiTS commented 4 years ago

rsfbclient 0.8.0 with my changes: https://gist.github.com/AndreiTS/afc2938868a4e657da2cc92f1c450605 rsfbclient 0.9.0: image The code is the same, I just changes the dependencies

Please, post here the backtrace(RUST_BACKTRACE=1 cargo run). I think the problem is in the column name and not in the content.

https://gist.github.com/AndreiTS/6f2b9aadf1824b212021798cdf19aa31

fernandobatels commented 4 years ago

I think the problem is in the column name and not in the content.

Forget. The problem is even with the content.

fernandobatels commented 4 years ago

Test with rsfbclient 0.9.0 (no changes): https://gist.github.com/AndreiTS/5bc866b90d9bdff178ff33ce08fe7f7e

So, with cast works fine.

You can share your database(not all tables) + table schema/DDL?

AndreiTS commented 4 years ago

Test with rsfbclient 0.9.0 (no changes): https://gist.github.com/AndreiTS/5bc866b90d9bdff178ff33ce08fe7f7e

So, with cast works fine.

You can share your database(not all tables) + table schema/DDL?

https://gist.github.com/AndreiTS/51cc172e5b4484b3e7886829b057b271

fernandobatels commented 4 years ago

Test with rsfbclient 0.9.0 (no changes): https://gist.github.com/AndreiTS/5bc866b90d9bdff178ff33ce08fe7f7e

So, with cast works fine. You can share your database(not all tables) + table schema/DDL?

https://gist.github.com/AndreiTS/51cc172e5b4484b3e7886829b057b271

I runned this tests and works fine. You have access to a firebird 3 server?

AndreiTS commented 4 years ago

Did you run the test or the code in the gist? Yes, I have access to a fb3 server

fernandobatels commented 4 years ago

Did you run the test or the code in the gist? Yes, I have access to a fb3 server

Yes. I'm test your gist, but I'm running into a linux environment

AndreiTS commented 4 years ago

I'm using linux too. Distro: Pop OS 20.04

fernandobatels commented 4 years ago

I'm too, Pop OS 20.04

I thought it was a windows environment, because of "CREATE DATABASE 'localhost:D:\usuarios\Administrador5\Desktop\BASE.FDB''.

You can test your gist into a recreated(whit your sql script) database?

AndreiTS commented 4 years ago

I copied the database to a Windows Server to export it using IBExpert, but all the code I ran in linux

fernandobatels commented 4 years ago

If you are running the firebird into a windows server and you client into a linux environment, maybe they have some of conflicts. Some people are recommending to use the ISO88591 instead: https://www.projetoacbr.com.br/forum/topic/29433-lazarus-firebird-caracteres-estranhos/

You can make a test with a full linux environment ?

AndreiTS commented 4 years ago

I'm running firebird inside a docker container:

image

I uploaded the database and the code to google drive to make things easier: https://drive.google.com/drive/folders/1rt1OQcg28cYY6tr5VDtybqQm4eYP6b4H

fernandobatels commented 4 years ago

I'm running firebird inside a docker container:

image

I uploaded the database and the code to google drive to make things easier: https://drive.google.com/drive/folders/1rt1OQcg28cYY6tr5VDtybqQm4eYP6b4H

Now i got it, hehe

fernandobatels commented 4 years ago

I'm running firebird inside a docker container:

image

I uploaded the database and the code to google drive to make things easier: https://drive.google.com/drive/folders/1rt1OQcg28cYY6tr5VDtybqQm4eYP6b4H

For now, you can cast the columns to a UTF8 and this will work. I will investigate what we can do

fernandobatels commented 4 years ago

I think this is a charset/collation problem of database. I made a new column with UTF8 and when I updated the firebird return an error:

ALTER TABLE C000017 ADD GRUPOALT2 Varchar(30) CHARACTER SET UTF8;
commit;
UPDATE C000017 a SET a.GRUPOALT2 = CAST(a.GRUPO as varchar(30) CHARACTER SET utf8);

The error:

SQL Message : -104
Invalid token

Engine Code    : 335544849
Engine Message :
Malformed string

Has anyone ever experienced this? @jairinhohw @juarezr

jairinhohw commented 4 years ago

So, i think i understood the problem. The problem is that the column you are trying to access has a charset set no 'NONE', so to the firebird database, the column is basically the same as a binary column, it does not know that the column is win1252 encoded, so it cannot convert to utf8 to send correctly to the client. This can be seen when you set the utf8 encoding in the dbeaver and got nonsensical data in the print, this would not happend if the firebird knew the charset of the column, as it would convert it to utf8and would display fine in dbeaver and rust.

AndreiTS commented 4 years ago

So, i think i understood the problem. The problem is that the column you are trying to access has a charset set no 'NONE', so to the firebird database, the column is basically the same as a binary column, it does not know that the column is win1252 encoded, so it cannot convert to utf8 to send correctly to the client. This can be seen when you set the utf8 encoding in the dbeaver and got nonsensical data in the print, this would not happend if the firebird knew the charset of the column, as it would convert it to utf8and would display fine in dbeaver and rust.

Yes exactly. In python with the "fdb" package you can force a charset by passing .charset() in the connection property. I think this crate could have something similar

AndreiTS commented 4 years ago

https://fdb.readthedocs.io/en/v2.0/reference.html#fdb.connect

source: https://github.com/FirebirdSQL/fdb

jairinhohw commented 4 years ago

I think this is a charset/collation problem of database. I made a new column with UTF8 and when I updated the firebird return an error:

ALTER TABLE C000017 ADD GRUPOALT2 Varchar(30) CHARACTER SET UTF8;
commit;
UPDATE C000017 a SET a.GRUPOALT2 = CAST(a.GRUPO as varchar(30) CHARACTER SET utf8);

The error:

SQL Message : -104
Invalid token

Engine Code    : 335544849
Engine Message :
Malformed string

Has anyone ever experienced this? @jairinhohw @juarezr

If the GRUPO column was not UTF8 You need to convert to the original charset (probably win1252 in this case) in the cast, not to UTF8.

jairinhohw commented 4 years ago

~But the easier solution would be to use dbeaver to set the column charset to win1252 for all text columns that need it.~ Edit: It seems you cannot change a column with 'NONE' charset, as the dababase does not know how to convert to the new charset.

jairinhohw commented 4 years ago

You select like this too: SELECT cast(grupo AS varchar(30) CHARACTER SET WIN_1252) FROM C000017;

jairinhohw commented 4 years ago

If you want to convert the column to a new format:

ALTER TABLE C000017 ADD GRUPOALT2 Varchar(30) CHARACTER SET UTF8;
UPDATE C000017 a SET a.GRUPOALT2 = CAST(a.GRUPO as varchar(30) CHARACTER SET WIN_1252);

After this, you can delete the old column and rename the new one with the old name

Edit: The utf8 in the ALTER TABLE can be set to WIN_1252 to not mess up possible other clients that expect it to be win1252 encoded and does not set the connection encoding.

AndreiTS commented 4 years ago

The problem is that the database is very big, and the program that is going to use it doesn't work with UTF8 (it's an delphi 7 application). I will try your suggestions

jairinhohw commented 4 years ago

The problem is that the database is very big, and the program that is going to use it doesn't work with UTF8 (it's an delphi 7 application). I will try your suggestions

You can cast it on select only for the rust client, so nothing will be changed. SELECT cast(grupo AS varchar(30) CHARACTER SET WIN_1252) FROM C000017;

jairinhohw commented 4 years ago

The conversion will work as long as you create a new column with the WIN_1252 charset, which will not change anything in practice, except that the firebird now knows the column is WIN_1252 encoded.

fernandobatels commented 4 years ago

So, i think i understood the problem. The problem is that the column you are trying to access has a charset set no 'NONE', so to the firebird database, the column is basically the same as a binary column, it does not know that the column is win1252 encoded, so it cannot convert to utf8 to send correctly to the client. This can be seen when you set the utf8 encoding in the dbeaver and got nonsensical data in the print, this would not happend if the firebird knew the charset of the column, as it would convert it to utf8and would display fine in dbeaver and rust.

Yes exactly. In python with the "fdb" package you can force a charset by passing .charset() in the connection property. I think this crate could have something similar

Yesterday I made some tests changing the charset used, but it also didn't work

fernandobatels commented 4 years ago

The conversion will work as long as you create a new column with the WIN_1252 charset, which will not change anything in practice, except that the firebird now knows the column is WIN_1252 encoded.

In fact, we always must use a default charset instead of none. But, I think that is a common problem(database with 'none' charset).

fernandobatels commented 4 years ago

Using the pure_rust feature didn't work either

fernandobatels commented 4 years ago

@AndreiTS, please. Test this commit

fernandobatels commented 4 years ago

I'm working in the charset support, but your case(the C000017 table) are working now