Closed huyz closed 8 months ago
Thanks for raising this, it's a really good point. I'll probably put this behind a flag for now, as it's quite a big change to the output...
Have just pushed the needed change to main: https://github.com/carderne/signal-export/commit/4816f1b3480a5785b9fddfa7b44841333ee3f755
Haven't released it yet (i.e. it won't be available to pip install yet).
Will get to it in the next day or two.
I've just released a new untested version.
If you pip install signal-export==1.8.0 and use the flag --newlines, you can give it a whirl.
Great, thanks for responding so quickly!
But I have a weird issue now. As soon as I update signal-export as above, I get this error:
Using Docker to extract data, this may take a while the first time!
Docker process failed, see logs below:
Command '['docker', 'run', '--rm', '--volume=/Users/huyz/Library/Application Support/Signal:/Signal', 'carderne/sigexport:v1.8.0', '--no-use-docker', '--print-data', '--include-empty']' returned non-zero exit status 125.
It used to say:
Copying and renaming attachments
Creating markdown files
Merging old at snapshot.2024-02-03T151753Z into output directory
No existing files will be deleted or overwritten!
Done!
My installation shouldn't be using Docker, so I don't know why this happens as soon as I upgrade signal-export.
Nothing should have changed whether or how Docker is called, are you sure it wasn't using it before?
You can try downgrading to an earlier version with pip install signal-export==1.7.1, or I just pushed a version that will provide more details about why Docker failed: pip install signal-export==1.8.1 (possibly the Docker daemon isn't running).
Oh, I was upgrading from 1.6.0 and I see that you've changed the default to --use-docker. I just needed to add --no-use-docker to my invocation. I was confused for a while because I didn't remember ever setting Docker to be used.
I'm still checking it out. I'm a bit confused right now cause some messages are repeated out of order. I think this is Signal's fault, though as I've noticed similar weird things. Bear with me…
About the --no-use-docker: I changed the default because on Linux/Windows, it's not straightforward to get it to work properly without Docker.
Did you find it simple on macOS to just brew install openssl sqlcipher and then the script worked?
Maybe I should add a check like: if on macOS and the openssl/sqlcipher binaries are available, skip Docker...
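A rough sketch of what such a check could look like, using only the Python standard library. The function name `should_skip_docker` and its parameters are hypothetical and are not part of signal-export; the only assumption about the environment is that `platform.system()` reports macOS as "Darwin".

```python
import platform
import shutil


def should_skip_docker(system=None, which=shutil.which):
    """Return True on macOS when both openssl and sqlcipher are on PATH.

    Hypothetical helper, not signal-export's actual code. `system` and
    `which` are injectable only so the logic can be tried without
    touching the real machine.
    """
    system = system or platform.system()
    if system != "Darwin":  # platform.system() reports macOS as "Darwin"
        return False
    # shutil.which returns None when a binary cannot be found on PATH
    return all(which(tool) is not None for tool in ("openssl", "sqlcipher"))


if __name__ == "__main__":
    print("skip Docker:", should_skip_docker())
```

Finding the binaries on PATH does not prove that they are the Homebrew builds or that they work, so a real check might still need a fallback to Docker.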
Did you find it simple on macos to just brew install openssl sqlcipher and then the script worked?
Yep it worked just fine. I don't need Docker at all.
I wrote:
I'm still checking it out. I'm a bit confused right now cause some messages are repeated out of order. I think this is Signal's fault, though as I've noticed similar weird things.
Actually, it looks like, because I'm running signal-export with --old and with the --newlines flag, the old messages aren't recognized, so that's why I had everything in duplicate.
Not a big deal if this isn't fixed, as this is a one-time issue which wouldn't affect anyone running --newlines from the get-go.
The current changes here are already good enough for me, as I can clean up the merged output manually one time.
Thanks for your help. Feel free to close this issue.
I think this flag should be the default.
Just need to have a look at @huyz issue around merging with old, then will make it default.
For those who need to retroactively fix the markdown chat history, you can either:
--old flag
For the latter, I wrote a perl script to fix things: https://gist.github.com/huyz/f8af7941ffc1c961f325f204b8367017
Just edit it with the timestamp of the second message that uses the new format (a format I call v2) after applying https://github.com/carderne/signal-export/commit/4816f1b3480a5785b9fddfa7b44841333ee3f755 and run it on the index.md in the current file.
This does 2 things on all the events in format 1:
It will not touch events in format v2 (the ones that start with the date that you've configured above) yet. So those untouched events will have correct newlines in between events, but the blockquotes will still be wrong. Once the fix for https://github.com/carderne/signal-export/commit/4816f1b3480a5785b9fddfa7b44841333ee3f755#r139671078 is in, we'll have a format v3 defined. At that point, I'll update the above perl script to fix the blockquotes in the events in format v2 so that everything is in format v3.
@carderne
Just need to have a look at @huyz issue around merging with old, then will make it default.
This issue is pretty clear after I looked at this line of code, which handles the merging and deduplication: https://github.com/carderne/signal-export/blob/649f9f54e98ef1679a99d0fff2bfbb939b1cfeb3/sigexport/main.py#L437
This confirms that the de-duplication requires messages to match exactly. So whenever the output format of signal-export changes, this creates problems as duplicates of those specific messages whose format has changed won't be detected anymore.
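To illustrate the failure mode described above, here is a minimal sketch of exact-match de-duplication. It is not the code at the linked line of main.py; the function `merge_exact` and the message format in the sample strings are assumptions made purely for illustration.

```python
def merge_exact(old_msgs, new_msgs):
    """Append every new message that is not already present verbatim.

    Illustrative only: a stand-in for any merge that compares whole
    message strings for exact equality.
    """
    seen = set(old_msgs)
    merged = list(old_msgs)
    for msg in new_msgs:
        if msg not in seen:
            merged.append(msg)
            seen.add(msg)
    return merged


# Assumed message format, for illustration only.
old = ["[2024-02-03 15:17] Alice: hi"]
same_format = ["[2024-02-03 15:17] Alice: hi"]
new_format = ["[2024-02-03 15:17] Alice: hi\n"]  # same message, extra newline

print(len(merge_exact(old, same_format)))  # duplicate detected
print(len(merge_exact(old, new_format)))   # duplicate missed
```

A single extra newline is enough to make the second merge keep both copies, which is the duplication seen when combining --old with --newlines.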
Now, if one has the entire chat history still in the Signal DB (e.g., you don't make use of Disappearing messages for any conversation), then it's simple: just don't run with --old and run with --newlines (after https://github.com/carderne/signal-export/pull/112 is released, which I call format v3). And you're done.
If, however, one or more of your chats uses the Disappearing messages setting and Signal has already auto-disappeared some messages from the local DB, then you have no choice but to try to reuse the existing exports.
To handle the fact that we now have 3 export formats (original v1, v2 with newlines but incorrect blockquoting, and v3 with both newlines and corrected blockquoting), I updated my perl script https://gist.github.com/huyz/f8af7941ffc1c961f325f204b8367017 to handle all cases (as much as possible). This should actually be run as early as possible, before the next signal export is invoked with --old. This script will reformat old messages in index.md (whether in v1 or v2) so that they're compatible with v3. That way signal-export will properly detect duplicates on the next run.
Note that there is a bit of ambiguity with the old messages, as we can't always be 100% sure when a blockquote actually ends in formats v1 and v2. But the perl script should work in most, if not all, cases. You'd know there are duplicates if you see some messages that are seemingly out of date order. (I actually wrote a script to extract timestamps and plotted them on observablehq.com to visually check that the dates were monotonically increasing. That gave me a reasonable idea that there aren't any more spurious duplicates, i.e. that the retroactive markdown fixes match the new format v3.)
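For anyone who prefers a non-visual check, the same idea can be sketched in a few lines of Python. This is not the script mentioned above. It assumes each entry in index.md starts with a timestamp of the form `[YYYY-MM-DD HH:MM]`, which is a guess at the layout; adjust the `STAMP` pattern and the `strptime` format to match your files.

```python
import re
from datetime import datetime

# Assumed entry prefix, e.g. "[2024-02-03 15:17] Alice: hi".
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2})")


def out_of_order_lines(lines):
    """Return (line_number, timestamp) for entries dated before the previous entry."""
    result = []
    previous = None
    for number, line in enumerate(lines, start=1):
        match = STAMP.match(line)
        if not match:
            continue  # continuation lines and blockquotes carry no timestamp
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M")
        if previous is not None and stamp < previous:
            result.append((number, stamp))
        previous = stamp
    return result


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as handle:
            for number, stamp in out_of_order_lines(handle):
                print(f"line {number}: {stamp} is earlier than the previous entry")
```

Entries sharing the same minute are treated as in order, so the check is for non-decreasing timestamps rather than strictly increasing ones.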
Hope this helps for anyone looking to transition.
Perhaps if someone doesn't use the --old flag, then --newlines can become the default. Backward compatibility of the format only makes sense for --old.
Oh btw, since the format of blockquotes changed back in v1.8.0, regardless of the --newlines flag, people who have been using --old will most likely already have duplicate issues.
Hmm I wonder what a long term solution for handling new format changes is. Not an easy problem to solve.
Hey @huyz thanks for all the input.
I've updated the merge function so it should handle differences in formats (for now basically just ignore \n and > symbols) and I've made the extra newlines default.
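As a rough illustration of that idea (not the actual signal-export merge function), messages can be compared on a normalized key with newline and `>` characters stripped. The names `normalize` and `merge_normalized` and the sample message format are hypothetical.

```python
def normalize(msg):
    """Drop newline and '>' characters so format-only differences vanish."""
    return msg.replace("\n", "").replace(">", "")


def merge_normalized(old_msgs, new_msgs):
    """Append new messages whose normalized form is not already present."""
    seen = {normalize(m) for m in old_msgs}
    merged = list(old_msgs)
    for msg in new_msgs:
        key = normalize(msg)
        if key not in seen:
            merged.append(msg)
            seen.add(key)
    return merged


# Assumed message format, for illustration only.
old = ["[2024-02-03 15:17] A: hi\n> quoted"]
new = [
    "[2024-02-03 15:17] A: hi\n\n> quoted\n\n",  # same message, newer layout
    "[2024-02-03 15:20] B: new",
]
print(len(merge_normalized(old, new)))
```

The trade-off in this sketch is that two messages differing only by a literal `>` or by line breaks in their body would be treated as the same message.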
This is all currently in a pre-release version, would be great if you can give it a whirl!
pip install "signal-export>=2.0.0"
If I don't hear from you I'll assume this is all sorted and close the issue, and then I'll release 2.0.0 so everyone gets the change.
Desktop (please complete the following information):
Describe the bug
Because there's no newline after each entry or after a blockquote, when viewing the index.md in common Markdown viewers, there's too much ambiguity, and different renderers will assume that all the lines are part of the same block.
could be rendered as (standard Markdown rules):
or as (if the renderer is configured to render all new lines as <br> inside paragraphs)
when the markdown could be made less ambiguous by adding newlines:
so that they are rendered either as (standard Markdown rules):
or (if the renderer is configured to render all new lines as <br> inside paragraphs)
Here is the relevant style guide rule: https://www.markdownguide.org/basic-syntax/#blockquotes-best-practices
To reproduce
Steps to reproduce the behavior. Please include the exact commands tried.