chrrel / whatsapp-exporter

A Python script for extracting WhatsApp conversations from the app's SQLite database and exporting them as HTML or txt files.
GNU General Public License v3.0
93 stars, 18 forks

Could not decode to UTF-8 column 'text' with text '�������' #18

Closed franzos closed 6 months ago

franzos commented 8 months ago

Hi there, I'm facing an issue with exports:

$ python3 main.py 
### WhatsApp Database Exporter ###
[+] Reading Database
[+] Using table 'message'
Traceback (most recent call last):
  File "/home/franz/playground/whatsapp-export/whatsapp-exporter/whatsapp-exporter/main.py", line 128, in <module>
    main()
  File "/home/franz/playground/whatsapp-export/whatsapp-exporter/whatsapp-exporter/main.py", line 116, in main
    chats = query_all_chats(config["input"].get("msgstore_path"), contacts)
  File "/home/franz/playground/whatsapp-export/whatsapp-exporter/whatsapp-exporter/main.py", line 83, in query_all_chats
    messages = query_messages_from_table_message(con, key_remote_jid, contacts)
  File "/home/franz/playground/whatsapp-export/whatsapp-exporter/whatsapp-exporter/main.py", line 61, in query_messages_from_table_message
    for timestamp, remote_jid, from_me, data, message_type, latitude, longitude, media_path in cur.execute(query, {"key_remote_jid": key_remote_jid}):
sqlite3.OperationalError: Could not decode to UTF-8 column 'text' with text '�������'

I already had a quick look through the data, but I have not found the offending field yet.

chrrel commented 8 months ago

Hi, I cannot reproduce your issue without your database, but you could try this solution from StackOverflow. To do so, add the following line at line 33 of main.py, i.e. just before cur = con.cursor():

 con.text_factory = lambda b: b.decode(errors="ignore")

A disadvantage is that you could lose some data in the export, because with errors="ignore" any bytes that cannot be decoded are dropped silently.
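To see which rows would be affected before exporting, one option is to read the column as raw bytes and try to decode each value yourself. The following is only a sketch: the helper name find_undecodable and the tiny in-memory table are hypothetical stand-ins, and the real msgstore schema may differ.

```python
import sqlite3


def find_undecodable(con, table, column):
    """Return (rowid, raw_bytes) for values in `column` that are not valid UTF-8.

    Hypothetical helper. `table` and `column` are interpolated into the
    query, so pass trusted names only. BLOB values also arrive as bytes
    and are reported if they are not valid UTF-8.
    """
    previous_factory = con.text_factory
    con.text_factory = bytes  # return TEXT values as raw bytes, undecoded
    bad = []
    try:
        for rowid, raw in con.execute(f"SELECT rowid, {column} FROM {table}"):
            if isinstance(raw, bytes):
                try:
                    raw.decode("utf-8")
                except UnicodeDecodeError:
                    bad.append((rowid, raw))
    finally:
        con.text_factory = previous_factory  # restore the caller's setting
    return bad


# Demo on an assumed, simplified table (not the real WhatsApp schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE message (text TEXT)")
con.execute("INSERT INTO message VALUES ('ok')")
# 0xff is never valid in UTF-8, so this TEXT value cannot be decoded.
con.execute("INSERT INTO message VALUES (CAST(x'68656cff6c6f' AS TEXT))")

bad_rows = find_undecodable(con, "message", "text")
for rowid, raw in bad_rows:
    print(rowid, raw)
```

The rowids reported this way can then be inspected directly in the database to decide whether the dropped bytes matter.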

Edit: It probably makes more sense to change the error handler from ignore to something like surrogateescape or replace; see the Python docs for details. The line would then be:

 con.text_factory = lambda b: b.decode(errors="surrogateescape")
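
For comparison, here is a small self-contained sketch of how the three error handlers behave on a value that is not valid UTF-8. The in-memory table and the read_text helper are assumptions made for illustration; they are not part of main.py.

```python
import sqlite3

# Hypothetical stand-in for the WhatsApp database: one TEXT column holding
# a value that is not valid UTF-8 (the byte 0xff never occurs in UTF-8).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE message (text TEXT)")
con.execute("INSERT INTO message VALUES (CAST(x'68656cff6c6f' AS TEXT))")


def read_text(handler=None):
    """Read the column, decoding with the given error handler (None = default)."""
    if handler is None:
        con.text_factory = str  # sqlite3's default: strict UTF-8 decoding
    else:
        con.text_factory = lambda b: b.decode(errors=handler)
    return con.execute("SELECT text FROM message").fetchone()[0]


try:
    read_text()
    strict_failed = False
except sqlite3.OperationalError:
    # Same error class as in the traceback reported in this issue.
    strict_failed = True

ignored = read_text("ignore")            # bad byte dropped silently
replaced = read_text("replace")          # bad byte becomes U+FFFD
escaped = read_text("surrogateescape")   # bad byte kept as a lone surrogate

print(strict_failed, repr(ignored), repr(replaced), repr(escaped))
```

Note that a string decoded with surrogateescape cannot be written out as strict UTF-8; it has to be encoded with errors="surrogateescape" again (which restores the original bytes) or with another handler. For an HTML or txt export, replace may therefore be the simpler choice.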

chrrel commented 6 months ago

Closed due to inactivity.