FirebirdSQL / NETProvider

Firebird ADO.NET Data Provider
https://www.firebirdsql.org/en/net-provider/
Other
152 stars 62 forks source link

Use the correct charset when writing values to the database #1138

Closed Jsil8 closed 5 months ago

Jsil8 commented 9 months ago

The FbParameter class provides a 'Charset' property to specify which charset should be used when encoding/decoding a value. This property is not used anywhere. So when a parameter is set to use the 'ISO8859_1' charset, a value set by FbParameter will be written to the database using the 'NONE' (default) encoding.

When the charset is specified in the connectionstring, this is also ignored when writing a value to the database.

This PR uses the following rule to optionally apply a custom charset:

  1. The FbParameter if it is not Default or None.
  2. The FbConnection/Database charset if not None.

The 'Charset' class is extended with an 'FbCharset' enum to make a match betwheen the FbParameter's Charset property and the 'Charset' class.

cincuranet commented 9 months ago

Can you show code, where the current approach does not work?

Jsil8 commented 9 months ago

I have tested with the following code:

string connStr = "data source=localhost;initial catalog=db.fdb;user id=user;password=pwd;Charset=ISO8859_1;";
FbConnection fbConn = new FbConnection(connStr);
await fbConn.OpenAsync(cancellationToken);

FbCommand fbCmd = new FbCommand("UPDATE MyTable SET MyColumn = @Value", fbConn);
FbParameter fbParam = fbCmd.CreateParameter();
fbParam.ParameterName = "@Value";
fbParam.FbDbType = FbDbType.VarChar;
fbParam.Charset = FbCharset.Iso8859_1;
fbParam.Value = "Knäckebröd";

fbCmd.Parameters.Add(fbParam);

await fbCmd.ExecuteNonQueryAsync(cancellationToken);

If I look at the database with IBExpert, I see the following in the database in the situation before: image

And when I get the value from the database with the FirebirdClient, I also get the value Knäckebröd This is also visible when debugging and see the bytes written and receiving to the database.

Here the differences in hex and bytes between UTF-8 AND ISO-8859-1 for Knäckebröd: UTF8 HEX: 4b 6e [c3 a4 ] 63 6b 65 62 72 [c3 b6 ] 64 DEC: 75 110 [195 164] 99 107 101 98 114 [195 182] 100

ISO-8859-1 HEX: 4b 6e e4 63 6b 65 62 72 f6 64 DEC: 75 110 228 99 107 101 98 114 246 100

When not applying the fix. The GdsStatement uses the wrong charset when writing to the database in the method WriteRawParameter: image This results in an array of 12 bytes (which would be correct when using UTF-8)

After applying the fix: image image

cincuranet commented 9 months ago

What is the charset of MyColumn and/or database?

Jsil8 commented 9 months ago

IBExpert shows 'NONE' at the column charset and database registration info

cincuranet commented 8 months ago

The NONE charset is always problem waiting to happen. That said, I'm not saying there isn't room for improvement.

AFAIR at the moment the FbParameter.Charset is completely unused in lower layers.

Your work is a good start, but needs some polishing and careful thinking regarding backwards compatibility. I'll do first pass as review. It's fine if we abandon this PR eventually and start from scratch. In fact I think this should be handled in multiple isolated steps (see my review comments).

Also at least few tests should be included.