Closed ibsusu closed 3 years ago
Just ran into this myself, it would be great if there was some option to fix this! Perhaps an option to print the error message in the cell (in red, so it's clear it's not that literal text?)
I'm working with the Firefox history database, so sadly removing the malformed data is not an option :(
Do you have an example value that I could use to reproduce this?
I uploaded an example database file here: https://hack.wesleyac.com/test.sqlite
It uses the invalid unicode value \xc3\x28. Let me know if that's sufficient for you :)
Thank you @WesleyAC. I was able to reproduce the issue. The fix is now in a PR (pending review from other core devs).
Long form description of what is going on:
Turns out the sqlite3 library for Python decodes text as UTF-8 by default, which works fine since SQLite stores text as UTF-8. But as you pointed out, invalid byte sequences can still sneak in. Thankfully, the Python library allows overriding the decoder it uses. So I've caught the exception and applied latin-1 decoding. Unfortunately this is a batch process, which means that if a single value has an invalid byte, the whole result set has to use the fallback encoding of latin-1.
It seems to work well for now, but I can't use it to highlight the invalid value in red.
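For readers following along, here is a rough sketch of the kind of batch fallback described above. It is not the code from the PR: the helper name `fetch_with_fallback` and the table `t` are made up for illustration, and it assumes a decode failure surfaces as either `sqlite3.OperationalError` or `UnicodeDecodeError`. It relies on `Connection.text_factory`, which the standard library calls with the raw bytes of each TEXT value.

```python
import sqlite3


def fetch_with_fallback(conn, sql):
    """Fetch all rows; if UTF-8 decoding fails, rerun the whole query as latin-1.

    Hypothetical helper for illustration only.
    """
    original_factory = conn.text_factory
    try:
        return conn.execute(sql).fetchall()
    except (sqlite3.OperationalError, UnicodeDecodeError):
        # Assumption: the decode failure is reported as one of these two
        # exception types. Other OperationalErrors (e.g. bad SQL) will simply
        # be raised again by the retry below.
        conn.text_factory = lambda raw: raw.decode("latin-1")
        try:
            return conn.execute(sql).fetchall()
        finally:
            conn.text_factory = original_factory


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v TEXT)")
conn.execute("INSERT INTO t VALUES (?)", ("é",))  # valid UTF-8
# CAST(blob AS TEXT) stores the raw bytes as a TEXT value without validation.
conn.execute("INSERT INTO t VALUES (CAST(? AS TEXT))", (b"\xc3\x28",))

rows = fetch_with_fallback(conn, "SELECT v FROM t ORDER BY rowid")
print(rows)  # the valid 'é' is also decoded as latin-1 and comes back as 'Ã©'
```

The printed result shows the downside mentioned above: one bad value forces every value in the result set through the latin-1 fallback.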
> Unfortunately this is a batch process which means, if a single value has an invalid byte value, the whole set has to use the fallback encoding of latin-1.
Seems we can use decode('utf-8', 'backslashreplace') to avoid this issue:
>>> b'\xf0\x9f\x98\x8a\x80abc'.decode('utf-8', 'backslashreplace')
'😊\\x80abc'
>>> b'\xf0\x9f\x98\x8a\x80abc'.decode('latin-1')
'ð\x9f\x98\x8a\x80abc'
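A minimal sketch of how that could be applied per value rather than per batch, assuming the decoding is done through `Connection.text_factory` (the table `t` and its contents are made up for illustration; this is not necessarily how the project wires it up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# text_factory receives the raw bytes of every TEXT value, so each value is
# decoded independently and only the invalid bytes get escaped.
conn.text_factory = lambda raw: raw.decode("utf-8", "backslashreplace")

conn.execute("CREATE TABLE t (v TEXT)")
conn.execute("INSERT INTO t VALUES (?)", ("é",))  # valid UTF-8
# CAST(blob AS TEXT) stores the raw bytes as a TEXT value without validation.
conn.execute("INSERT INTO t VALUES (CAST(? AS TEXT))", (b"\xc3\x28",))

decoded_rows = conn.execute("SELECT v FROM t ORDER BY rowid").fetchall()
print(decoded_rows)  # [('é',), ('\\xc3(',)]
```

Valid rows come through unchanged, and the malformed byte shows up as a visible escape instead of aborting the query.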
I just dug into this issue a little. The root cause is:
The column has the TEXT type, but SQLite does not check that a value is a valid UTF-8 string when it is inserted.
Reading such a value back then fails with a UnicodeDecodeError: 'utf-8' codec can't decode byte ... error.
@amjith's CR fixed this issue by catching the UnicodeDecodeError and then trying to decode the value as latin-1.
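The root cause can be reproduced with the standard library alone. This is only a sketch: the table `t` is made up, and it assumes the read failure is raised as either `sqlite3.OperationalError` or `UnicodeDecodeError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v TEXT)")
# CAST(blob AS TEXT) reinterprets the bytes as text; SQLite does not validate them.
conn.execute("INSERT INTO t VALUES (CAST(? AS TEXT))", (b"\xc3\x28",))

# SQLite itself considers the value to be TEXT and keeps the bytes untouched.
stored_type, stored_hex = conn.execute("SELECT typeof(v), hex(v) FROM t").fetchone()
print(stored_type, stored_hex)  # text C328

# Reading the value with the default text_factory (str) fails.
try:
    conn.execute("SELECT v FROM t").fetchall()
    decode_failed = False
except (sqlite3.OperationalError, UnicodeDecodeError) as exc:
    decode_failed = True
    print(type(exc).__name__, exc)
```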
I don't know how to solve this. If there is a single record with a non-convertible column it won't print anything.
sqlite3 prints: Could not decode to UTF-8 column 'verifier' with text '���U'
Seeing this was actually helpful because it notified me that I had garbage data, but I still would've thought the other rows would print.