apicrafter / metacrafter

Metadata and data identification tool and Python library. Identifies PII, common identifiers, language specific identifiers. Fully customizable and flexible rules
Apache License 2.0
44 stars 5 forks source link

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) Could not decode to UTF-8 column #17

Open ivbeg opened 2 years ago

ivbeg commented 2 years ago

Error processing SQLite database with non-unicode names for fields. Example 000012_world.zip

`Traceback (most recent call last): File "C:\Users\ibegt\AppData\Roaming\Python\Python310\site-packages\sqlalchemy\engine\result.py", line 1284, in fetchall l = self.process_rows(self._fetchall_impl()) File "C:\Users\ibegt\AppData\Roaming\Python\Python310\site-packages\sqlalchemy\engine\result.py", line 1230, in _fetchall_impl return self.cursor.fetchall() sqlite3.OperationalError: Could not decode to UTF-8 column 'name' with text '\ufffdland Islands'

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "C:\Program Files\Python310\Scripts\metacrafter-script.py", line 33, in sys.exit(load_entry_point('metacrafter==0.0.2', 'console_scripts', 'metacrafter')()) File "C:\Program Files\Python310\lib\site-packages\metacrafter-0.0.2-py3.10.egg\metacrafter__main__.py", line 12, in main exit_status = cli() File "C:\Program Files\Python310\lib\site-packages\click\core.py", line 1130, in call return self.main(args, kwargs) File "C:\Program Files\Python310\lib\site-packages\click\core.py", line 1055, in main rv = self.invoke(ctx) File "C:\Program Files\Python310\lib\site-packages\click\core.py", line 1657, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "C:\Program Files\Python310\lib\site-packages\click\core.py", line 1404, in invoke return ctx.invoke(self.callback, ctx.params) File "C:\Program Files\Python310\lib\site-packages\click\core.py", line 760, in invoke return __callback(args, **kwargs) File "C:\Program Files\Python310\lib\site-packages\metacrafter-0.0.2-py3.10.egg\metacrafter\core.py", line 464, in scan_db acmd.scan_db( File "C:\Program Files\Python310\lib\site-packages\metacrafter-0.0.2-py3.10.egg\metacrafter\core.py", line 359, in scan_db items = [dict(u) for u in queryres.fetchall()] File "C:\Users\ibegt\AppData\Roaming\Python\Python310\site-packages\sqlalchemy\engine\result.py", line 1288, in fetchall self.connection._handle_dbapi_exception( File "C:\Users\ibegt\AppData\Roaming\Python\Python310\site-packages\sqlalchemy\engine\base.py", line 1510, in _handle_dbapiexception util.raise( File "C:\Users\ibegt\AppData\Roaming\Python\Python310\site-packages\sqlalchemy\util\compat.py", line 182, in raise_ raise exception File "C:\Users\ibegt\AppData\Roaming\Python\Python310\site-packages\sqlalchemy\engine\result.py", line 1284, in fetchall l = self.process_rows(self._fetchall_impl()) File "C:\Users\ibegt\AppData\Roaming\Python\Python310\site-packages\sqlalchemy\engine\result.py", line 1230, in _fetchall_impl return self.cursor.fetchall() sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) Could not decode to UTF-8 column 'name' with text '\ufffdland Islands' (Background on this error at: http://sqlalche.me/e/13/e3q8) `

ivbeg commented 2 years ago

It's caused by column names with non utf-8 encoding, it's possible with some SQLite databases. Possible solution here https://stackoverflow.com/questions/22751363/sqlite3-operationalerror-could-not-decode-to-utf-8-column it could resolve errors by it will improve data types detection without knowledge of encoding used. Need further research.