coleifer / sqlite-web

Web-based SQLite database browser written in Python
MIT License
3.43k stars 339 forks source link

sqlite3.OperationalError: Could not decode to UTF-8 column #160

Closed bertsky closed 5 months ago

bertsky commented 5 months ago

I am trying to view sqlite3 db files generated by GNU parallel. Unfortunately, sqlite_web crashes like this:

  File "/usr/local/lib/python3.7/site-packages/sqlite_web/templates/table_content.html", line 39, in block "inner_content"
    {% for row in query %}
  File "/usr/local/lib/python3.7/site-packages/peewee.py", line 4329, in next
    self.cursor_wrapper.iterate()
  File "/usr/local/lib/python3.7/site-packages/peewee.py", line 4239, in iterate
    row = self.cursor.fetchone()
sqlite3.OperationalError: Could not decode to UTF-8 column 'Command' with text '. ~/.bash_profile� ;� make� ...

The special character you are seeing here is valid UTF-8, though: it's for 0xFFFD (REPLACEMENT CHARACTER).

I don't know why, but the developer of parallel decided to use that for splitting the string in one of the TEXT columns: https://git.savannah.gnu.org/cgit/parallel.git/tree/src/parallel#n15088

Obviously, I cannot change the code generating the database. Updating via SQL is also impossible due to locking.

Is there anything I can do to make it work on sqlite_web's or peewee's side?

coleifer commented 5 months ago

This error is being generated by the Sqlite driver, not peewee or sqlite-web. When I attempted to mimic the functionality (as I best understood it), I think the driver may be reporting correctly. Consider:

open my $fh, '>', $ENV{HOME} . "/out.txt"
    or die "Can't open $fname";
say $fh "Test\257 test";
close $fh;

This produces the following:

with open('out.txt', 'rb') as fh:
    data = fh.read()
print(repr(data))
# b'Test\xaf test\n'

data.decode('utf8')
# UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaf in position 4: invalid start byte

I'd suggest the issue is likely the data is being stored in a TEXT column when it should be stored as a BLOB.

You can try creating a VIEW in your parallel database that replaces the problematic characters.

bertsky commented 5 months ago

Oh, indeed! This is invalid UTF-8.

Unfortunately, I cannot even create a view from the table, because it is constantly locked while parallel runs jobs (so I have to open the database with sqlite3's URL notation, passing ?immutable=1&mode=ro).

I'll open an issue with parallel.

Thanks for your quick analysis!