cschiller / zhongwen

Official source code of the "Zhongwen" Chrome extension
https://chrome.google.com/webstore/detail/zhongwen-chinese-english/kkmlkkjojmombglmlpbpapmhcaljjkde
GNU General Public License v2.0

Read the database #87

Closed thomashirtz closed 2 years ago

thomashirtz commented 2 years ago

Hello, I am trying to read the Zhongwen database so I can write a small script that saves the word list directly to Anki. However, I run into a problem when I try to decode the entries:

import sqlite3
conn = sqlite3.connect('data.sqlite')

cur = conn.cursor()
sql_string = "SELECT value FROM data WHERE key = 'wordlist';"
cur.execute(sql_string)
data = cur.fetchone()[0]
conn.close()

print(data)

I get the error:

Traceback (most recent call last):
  File "D:/Thomas/Python/zhongwen-anki/new2.py", line 6, in <module>
    cur.execute(sql_string)
sqlite3.OperationalError: Could not decode to UTF-8 column 'value' with text '���C[{"timestamp":1619465676371,"simplified":"������������"

So I changed conn.text_factory to bytes:

import sqlite3
conn = sqlite3.connect('data.sqlite')
conn.text_factory = bytes  # return TEXT columns as raw bytes instead of str

cur = conn.cursor()
sql_string = "SELECT value FROM data WHERE key = 'wordlist';"
cur.execute(sql_string)
data = cur.fetchone()[0]
conn.close()

print(data)
string = data.decode('UTF-8')  # raises UnicodeDecodeError: the value is not plain UTF-8
print(string)

I got bytes that look like this:

b'\xb9\xf5\x05\xf0C[{"timestamp":1619465676371,"simplified":"\xe4\xb8\x80\xe5\x8f\xb6\xe9\x9a\x9c\xe7\x9b\xae","traditional\t\x1d\x0c\xe8\x91\x89\xe9\x11\x1d\x98pinyin":"yi\xcc\x84 ye\xcc\x80 zha\xcc\x80ng m ...

I tried the solutions from Stack Overflow, such as this one: https://stackoverflow.com/questions/22751363/sqlite3-operationalerror-could-not-decode-to-utf-8-column/43711347

I also tried every decoding method suggested in this post: https://stackoverflow.com/a/68944380/9548111

I tried decoding the raw bytes in many different ways, without success. Could you tell me how the value is "encoded"?
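
One thing that may be worth checking: the bytes shown above parse cleanly as a raw (unframed) Snappy block. The prefix \xb9\xf5\x05 reads as a varint length, \xf0C reads as a tag for a 68-byte literal (which is exactly the length of the text up to "traditional), and \t\x1d reads as a 6-byte back-reference. This is an inference from the dump, not something confirmed in this thread. Below is a minimal pure-Python sketch of a raw Snappy decoder under that assumption; snappy_decompress and the sample bytes are hypothetical names and data made up for illustration, and the decoder is not hardened against malformed input.

```python
def snappy_decompress(buf: bytes) -> bytes:
    """Decode a raw (unframed) Snappy block. Illustrative sketch only."""
    # Preamble: uncompressed length as a little-endian base-128 varint.
    pos = 0
    expected = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        expected |= (byte & 0x7F) << shift
        if byte < 0x80:
            break
        shift += 7

    out = bytearray()
    while pos < len(buf):
        tag = buf[pos]
        pos += 1
        kind = tag & 0x03  # low two bits select the element type
        if kind == 0:
            # Literal: lengths up to 60 live in the tag, longer ones in 1-4 extra bytes.
            length = tag >> 2
            if length >= 60:
                extra = length - 59
                length = int.from_bytes(buf[pos:pos + extra], "little")
                pos += extra
            length += 1
            out += buf[pos:pos + length]
            pos += length
            continue
        if kind == 1:
            # Copy with 1-byte offset: 3 length bits and 3 high offset bits in the tag.
            length = 4 + ((tag >> 2) & 0x07)
            offset = ((tag >> 5) << 8) | buf[pos]
            pos += 1
        elif kind == 2:
            # Copy with 2-byte little-endian offset.
            length = (tag >> 2) + 1
            offset = int.from_bytes(buf[pos:pos + 2], "little")
            pos += 2
        else:
            # Copy with 4-byte little-endian offset.
            length = (tag >> 2) + 1
            offset = int.from_bytes(buf[pos:pos + 4], "little")
            pos += 4
        if offset == 0 or offset > len(out):
            raise ValueError("invalid copy offset")
        # Copy byte by byte because source and destination may overlap.
        for _ in range(length):
            out.append(out[-offset])

    if len(out) != expected:
        raise ValueError("decoded length does not match the preamble")
    return bytes(out)


# Hand-built sample: length 10, literal "abc", copy 6 bytes from offset 3, literal "!".
sample = b"\x0a\x08abc\x09\x03\x00!"
print(snappy_decompress(sample))
```

If the assumption holds for the real file, the value fetched with conn.text_factory = bytes could be passed through snappy_decompress, decoded as UTF-8, and then handed to json.loads. If decoding raises ValueError or the result is not valid JSON, the value is stored some other way and this sketch does not apply.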