FirebirdSQL / fdb

Firebird Driver for Python
https://www.firebirdsql.org/en/devel-python-driver/
Other
60 stars 26 forks source link

Firebird database with malformed strings cannot be accessed [PYFB76] #91

Open firebird-automations opened 5 years ago

firebird-automations commented 5 years ago

Submitted by: Joern Ungermann (joernu76)

We have a firebird database containing a few malformed ASCII strings. These are byte streams, presumably ASCII coded, stored as is from a rather unreliable wireless transmission. Thus, there are bit flips, which causes the strings to contain some strange characters. Using python 2, we had no problem accessing such databases. Using python 3, everything needs to be decoded/encoded. Using the default access to the database using sqlalchemy and fdb causes a UnicodeDecodingError upon accessing an entry with a malformed string:

[...] File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 2659, in __xsqlda2tuple value = b2u(value, self.__python_charset) File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 480, in b2u return st.decode(charset) UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 1: invalid start byte

Playing around with the "charset" option of sqlalchemy could open some files, where the malformed strings contained only values available in the given charset, but introduced other problems. Looking into the code, there seems to be an "OCTETS" option, which supposedly parse the bytestream through, but fails to work (the python-translated-charset is None, which is not a valid option for encode/decode)

I'd prefer either of two options: a) implement (optionally?) an option to influence the error behavious of decoding (either replace or ignore would suit our usecase) b) implement a pass-through option for the byte-stream

If I oversaw some option to solve my use-case wth the given code-base, I'd be very happy to hear about it.

firebird-automations commented 5 years ago

Commented by: @pcisar

It's strange, because client charset OCTETS should work as pass-through for all strings as b2u() returns the string (bytes in this case) unchanged when passed charset is None (which should be according to ibase.charset_map). But P3 application or upper layer has to be prepared to handle bytes instead unicode string on output.

Anyway, this is tricky problem. I don't like the idea to hide decode errors (especially if it's easy to use this option on global scale) because then such errors can slip unnoticed to cause problems elsewhere, that are then much harder to detect. A pass-through solution that would return byte-stream instead unicode when error is encountered is the same problem, as it means that users must expect various result types.

But optional connection option (disabled by default) for decoding error resolution appears to be good enough solution, and I'll implement it for next release. However, usability depends on how this would be exposed by other layers like sqlalchemy.

As a hotfix, you can change the b2u() function in http://fbcore.py to replace or ignore bad characters.

firebird-automations commented 5 years ago

Commented by: Joern Ungermann (joernu76)

Thanks, solving this would be marvelous, but I do not see an easy solution for this including all the stacks, either. The quick fix works as intended, but doesn't scale very well, as it breaks our automatic deployment. We really should switch to docker or somesuch.

If I try to use the charset=OCTETS option in sqlalchemy to bypass the coding, I get the following error:

File "/home/icg173/bin/gloripy_V01781_R7633e89/gloripy/data/database.py", line 792, in __init__ self.metadata.reflect(self.engine) File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/sql/schema.py", line 3908, in reflect with bind.connect() as conn: File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2102, in connect return self._connection_cls(self, **kwargs) File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 90, in __init__ if connection is not None else engine.raw_connection() File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2188, in raw_connection self.pool.unique_connection, _connection) File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2158, in _wrap_pool_connect return fn() File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 342, in unique_connection return _ConnectionFairy._checkout(self) File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 788, in _checkout fairy = _ConnectionRecord.checkout(pool) File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 529, in checkout rec = pool._do_get() File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1193, in _do_get self._dec_overflow() File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__ compat.reraise(exc_type, exc_value, exc_tb) File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 249, in reraise raise value File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1190, in _do_get return self._create_connection() File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 347, in _create_connection return _ConnectionRecord(self) File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 474, in __init__ self.__connect(first_connect_check=True) File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/pool.py", line 671, in __connect connection = pool._invoke_creator(self) File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 106, in connect return dialect.connect(*cargs, **cparams) File "/home/icg173/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 412, in connect return self.dbapi.connect(*cargs, **cparams) File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 827, in connect no_reserve, db_key_scope, no_gc, no_db_triggers, no_linger) File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 760, in build_dpb dpb.add_string_parameter(isc_dpb_user_name, user) File "/home/icg173/anaconda3/lib/python3.6/site-packages/fdb/fbcore.py", line 625, in add_string_parameter value = value.encode(charset_map.get(self.charset, self.charset)) TypeError: encode() argument 1 must be str, not None

The "OCTETS" charset from "firebird+fdb://{}:{}@{}/{}?charset=OCTETS" is translated to "None" in fdb.ibase.charset_map. The None is then passed as argument to the encode method, which doesn't like it. In this case, the value should be directly returned as bytes object? I am not sure what fdbcore is doing there, but it would certainly break later as well, when None is passed to the decode function, which, probably, should be bypassed as well in the case of OCTETS?

firebird-automations commented 5 years ago

Commented by: @pcisar

I see, FDB clearly does not handle OCTETS properly. Using it as client charset is very wild idea that should nobody even try, but anyway. The problem with OCTESTS handling is that it adds another level of complexity to the code that is already trying to handle P2/P3 string difference in split-brain fashion (i,.e. P3 support in FDB is somewhat crude and unclean thanks to its history coming back to early P3 era). I worry that it would not get better anytime soon (it's planned as part of transition from P2 centered to P3 centered development of FDB over 2.x series).

Ok, I'll try to add optional support for replace/ignore option into bytes2unicode conversions in FDB as soon as possible. However, it happens at several places in various contexts, so it may take some time as I'd like do it properly.