FirebirdSQL / fdb

Firebird Driver for Python
https://www.firebirdsql.org/en/devel-python-driver/

fdb decodes messages with the system encoding while they are encoded using the server encoding #105

Open nikto-b opened 2 years ago

nikto-b commented 2 years ago

How to reproduce:

  1. Run Firebird 2.0 on Windows 10 (default charset is CP1251)
  2. Run python3 on Linux (default charset is UTF-8) with fdb==2.0.2
  3. Run a procedure that raises an exception containing Cyrillic characters
  4. Observe the error: 'utf-8' codec can't decode byte 0xf2 in position 0: invalid continuation byte

The stack trace points to fbcore.py:607.
A possible solution is to use the charset option passed to the connect method at that point, but I have no idea how to do this.
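
The decoding failure itself can be reproduced without a Firebird server. The following is a minimal sketch, assuming a sample Cyrillic string (the text "тест" is an illustration, not the actual server message): bytes encoded as CP1251 are decoded as UTF-8, which is roughly what happens when the client falls back to its system encoding.

```python
# Sample bytes as a CP1251 server might send them (assumed text, for illustration).
raw = "тест".encode("cp1251")  # first byte is 0xf2

try:
    raw.decode("utf-8")  # what a UTF-8 client system encoding would attempt
    error = None
except UnicodeDecodeError as exc:
    error = exc  # fails on byte 0xf2 at position 0

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = raw.decode("cp1251")
print(error)
print(decoded)
```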

kmateusz186 commented 2 years ago

I ran into the same problem with the WIN1250 charset when connecting to the database. I solved it by creating a global variable and overwriting it in the connect method, then using that variable in the exception_from_status method:

    def exception_from_status(error, status, preamble=None):
        .......
        if PYTHON_MAJOR_VER == 3:
            msglist.append('- ' + (msg.value).decode(GLOBAL_VAR_NAME))

I don't know if this is the best solution, but it works.

iwkse commented 1 year ago

I ran into a similar bug and solved it by adding the "replace" option to the decode call.
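
As a rough illustration of what the "replace" option does (a sketch with assumed sample bytes, not the actual change made to fdb): the decode call no longer raises, but every undecodable byte becomes U+FFFD, so the original Cyrillic text is lost.

```python
# Sample CP1251 bytes (assumed text, for illustration only).
raw = "тест".encode("cp1251")

# errors="replace" suppresses UnicodeDecodeError but is lossy:
# undecodable bytes are replaced with U+FFFD (the replacement character).
lossy = raw.decode("utf-8", errors="replace")

print(repr(lossy))
print("\ufffd" in lossy)
```

This avoids the crash at the cost of unreadable messages, which may or may not be acceptable depending on how the exception text is used.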

PracticallyNothing commented 6 days ago

Apologies for bumping an old issue.

We've also had to deal with this problem in 2024.

We have both a Python backend and a Firebird 4 DB running on Linux. The database is encoded using cp1251/WIN1251 for legacy reasons, while the backend speaks UTF-8. All queries with text in WIN1251 are converted to UTF-8 without problems, since we've set the encoding for the database when creating the connection. However, any exceptions containing cyrillic characters raise decoding errors in Python.

We've held off on changing over to the new Python driver due to an issue with how BLOBs are handled and how that relates to the SQLAlchemy driver for Firebird.

I admit that we haven't tested whether this is actually the case, but having a look at the new driver's source code, it seems to also suffer from this issue, since it uses locale.getpreferredencoding() to determine how exceptions should be decoded.

The proposed solutions have some problems:

The solution we've found works best for our case is to use the same encoding as the connection to the database, since it's more likely that the database will also use that encoding for its exceptions.

This means that, in fdb/fbcore.py, we have to:

  1. add a new parameter, encoding, to exception_from_status and use it to decode the exception message
    591c591
    < def exception_from_status(error, status, preamble=None):
    ---
    > def exception_from_status(error, status, preamble=None, encoding=None):
    607c607
    <                 msglist.append('- ' + (msg.value).decode(sys_encoding))
    ---
    >                 msglist.append("- " + (msg.value).decode(charset_map.get(encoding, encoding) or sys_encoding))
  2. find all the places where exception_from_status is called and provide a value for the new parameter

We do have a patch file that fixes this issue and can be applied to fdb/fbcore.py. However, I'm reluctant to turn it into a pull request, since we don't have any tests we can provide, and we aren't sure we found every place where this issue occurs.
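
The fallback expression in the diff above can be tried in isolation. This is a minimal sketch, not the patch itself: CHARSET_MAP is a small illustrative stand-in for fdb's charset_map, SYS_ENCODING is assumed to be the locale's preferred encoding, and decode_status_message is a hypothetical helper name.

```python
import locale

# Illustrative stand-in for fdb's charset_map
# (Firebird charset name -> Python codec name); only a few entries shown.
CHARSET_MAP = {"WIN1250": "cp1250", "WIN1251": "cp1251", "UTF8": "utf_8"}

# Assumption: the system encoding is the locale's preferred encoding.
SYS_ENCODING = locale.getpreferredencoding()


def decode_status_message(raw, encoding=None):
    """Hypothetical helper mirroring the decode expression in the patch."""
    # A known Firebird name maps to a Python codec, an unknown name is
    # passed through unchanged, and None falls back to the system encoding.
    codec = CHARSET_MAP.get(encoding, encoding) or SYS_ENCODING
    return raw.decode(codec)


raw = "Ошибка".encode("cp1251")
message = decode_status_message(raw, "WIN1251")
print(message)
```

Passing a Python codec name such as "cp1251" directly also works here, because names missing from the map are used as given.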

pcisar commented 5 days ago

Well, the core of this problem is that some error messages may be encoded in the OS encoding of the server (paths, filenames, etc.). In your case it happens to be the same as the database encoding, so your solution works fine for you, but it fails in other cases. Hence I'm reluctant to adopt this approach. I agree that this should be configurable, ideally at the connection level (for both database and server). I'll see what I can do about that, but I'll fix it in firebird-driver first, as that is easier with its separate configuration scheme. Then I'll see if something can be done for FDB.
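
To make the idea of connection-level configuration concrete, here is a purely hypothetical sketch. MessageDecoding and its fields are invented for illustration and are not part of fdb or firebird-driver. It keeps one setting for messages expected in the database charset and another for messages that may arrive in the server's OS encoding.

```python
from dataclasses import dataclass


@dataclass
class MessageDecoding:
    """Hypothetical per-connection setting; not an fdb or firebird-driver API."""
    encoding: str = "utf-8"
    errors: str = "strict"

    def decode(self, raw: bytes) -> str:
        # Decode an error message using the configured codec and error handler.
        return raw.decode(self.encoding, self.errors)


# Assumed setup: database messages in CP1251, server OS messages in UTF-8.
db_messages = MessageDecoding(encoding="cp1251")
server_messages = MessageDecoding(encoding="utf-8", errors="replace")

text = db_messages.decode("Ошибка".encode("cp1251"))
path = server_messages.decode("/var/lib/firebird/данные.fdb".encode("utf-8"))
print(text)
print(path)
```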