Closed da5nsy closed 1 year ago
I need a little bit more of the error trace, specifically the part that shows on which line of sigexport/main.py the error happened.
🤔
I don't think I have that. Above the paginate=100 line (top line in the screenshot) there are just hundreds of lines of data.
Any pointers?
OK, so here is how signal-export works, assuming you used the Docker method: it runs the extraction inside Docker, dumps the data out as a JSON string, and then processes that data outside Docker.
When it loads the text, it tries to do so with the cp1252 encoding (the Windows US default, as far as I can tell), whereas it should be using utf-8.
Are you able to edit the file sigexport/main.py on your computer? It's probably somewhere near this file: C:\Users\cege-user\miniconda3\envs\signal_backup\lib\encodings\cp1252.py. If so, edit this line: https://github.com/carderne/signal-export/blob/15b892833599043bf068166b372f0bcbbd4af396/sigexport/main.py#L519
as follows:
p = subprocess.run(cmd, capture_output=True, text=True, check=True)
- data = json.loads(p.stdout)
+ data = json.loads(p.stdout.encode("utf-8"))
convos, contacts = data["convos"], data["contacts"]
And see if that fixes the problem!
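A note on why the change above might not be enough: with text=True, the decoding happens inside subprocess.run, before p.stdout exists as a str, so re-encoding it afterwards comes too late. A minimal sketch of an alternative, assuming the Docker command writes UTF-8 JSON to stdout, is to pass encoding="utf-8" to subprocess.run directly. The child script and the name load_export below are hypothetical stand-ins for illustration, not code from sigexport.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the Docker command: a child process that
# writes UTF-8 encoded JSON (an umlaut, a sharp s and an emoji) to stdout.
CHILD = r'''
import json, sys
payload = {"convos": ["Gr\u00fc\u00dfe \U0001F44D"], "contacts": []}
sys.stdout.buffer.write(json.dumps(payload, ensure_ascii=False).encode("utf-8"))
'''


def load_export():
    """Run the child and decode its output as UTF-8, not the locale default."""
    cmd = [sys.executable, "-c", CHILD]
    # encoding="utf-8" switches on text mode and fixes the codec, so the
    # result no longer depends on the Windows code page (for example cp1252).
    p = subprocess.run(cmd, capture_output=True, encoding="utf-8", check=True)
    return json.loads(p.stdout)


if __name__ == "__main__":
    data = load_export()
    convos, contacts = data["convos"], data["contacts"]
    print(len(convos), len(contacts))
```

Whether this resolves the original error depends on what the Docker step really emits, so treat it as something to try rather than a confirmed fix.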
Thank you @carderne!
For our future reference, the file was at C:\Users\cege-user\miniconda3\envs\signal_backup\Lib\site-packages\sigexport\main.py
I changed the line, but unfortunately, the result was the same.
I have the same issue. I'm also on Windows (10), using Docker through WSL (in case it matters).
The failure first happens in
main.py:571 in
│ main
│
│ 568 │ secho("Creating markdown files")
│ 569 │ for md_path, md_text in create_markdown(dest, convos, contacts, quote):
│ 570 │ │ with md_path.open("a") as md_file:
│ ❱ 571 │ │ │ print(md_text, file=md_file)
I've changed the encoding for open() to "utf-8".
That fixed that, but it next breaks in
main.py:266 in
│ create_html
│
│ 263 │ │ │ # touch first
│ 264 │ │ │ open(path, "a")
│ 265 │ │ │ with path.open() as f:
│ ❱ 266 │ │ │ │ lines_raw = f.readlines()
After adding UTF-8 encoding to both of these lines as well, the next time it breaks in
main.py:582 in
│ main
│
│ 579 │ │ │ paginate = int(1e20)
│ 580 │ │ for ht_path, ht_text in create_html(dest, msgs_per_page=paginate):
│ 581 │ │ │ with ht_path.open("w") as ht_file:
│ ❱ 582 │ │ │ │ print(ht_text, file=ht_file)
After that, it works.
The output seems to be correct, too. I checked the place in my chats where it first breaks, and it seems it's either an ö (German umlaut) or an emoji that comes a couple of characters after that.
So it seems that every place in the code where Python tries to use the cp1252.py encoder breaks on the first "special" character.
Shouldn't the encoding set in the code be UTF-8 by default? Why use any other encoding?
Is there a reason to not just change it to that?
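To illustrate the kind of change described above, here is a small sketch, not the actual patch. It checks which characters cp1252 can encode and then appends to and reads back a file with an explicit encoding="utf-8". The helper names cp1252_can_encode and append_and_read are made up for this example. As far as the codec itself goes, ö is representable in cp1252, while an emoji is not, so the emoji is the more likely culprit when writing.

```python
import tempfile
from pathlib import Path

UMLAUT = "\u00f6"        # o with diaeresis
EMOJI = "\U0001F44D"     # thumbs-up emoji


def cp1252_can_encode(text):
    """Return True if every character in text exists in cp1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


def append_and_read(path, text):
    """Append text with an explicit UTF-8 encoding, then read the file back."""
    # Without encoding=..., open() falls back to the locale encoding,
    # which is often cp1252 on Windows.
    with path.open("a", encoding="utf-8") as f:
        print(text, file=f)
    with path.open(encoding="utf-8") as f:
        return f.readlines()


if __name__ == "__main__":
    print(cp1252_can_encode(UMLAUT), cp1252_can_encode(EMOJI))
    with tempfile.TemporaryDirectory() as tmp:
        lines = append_and_read(Path(tmp) / "chat.md", "Gr\u00fc\u00dfe " + EMOJI)
        print(len(lines))
```

The same encoding="utf-8" argument applies to the open(path, "a") call and to the "w" mode write mentioned in the traces.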
Hi @Blacklands thanks for looking into that. I haven't changed anything yet because I wasn't yet sure if this would fix it, and I'm generally unsure how encoding stuff works on Windows.
Could you please submit a PR with the changes you made?
Describe the bug: Undefined character causes an error.