carderne / signal-export

Export your Signal chats to markdown files with attachments

using the --old parameter seems to omit smaller messages. #24

Closed franklin-be closed 2 years ago

franklin-be commented 2 years ago

Hey there,

I noticed something and I think it's related to the --old parameter. It seems to omit smaller messages when trying to merge the changes.

I only noticed it because I use the desktop client on multiple systems and was once hit with the dreaded "chat session refresh", which left that particular desktop client without all the messages that were sent/received on other devices while it was offline. So now I have some messages only on desktop client X while they are missing on desktop client Y.

It's no biggie, just something I wanted to bring up.

Here's a screenshot I've taken with Notepad++ using the Compare plugin. It's exclusively small messages that are missing, like just a smiley ";)" or short words like "lol", "haha" or "yeah", so it's probably somehow related to length.

On the right side you can see the diagram of the differences, and it's exclusively small messages being omitted (aside from that larger chunk, but that was the reason for using --old).

Left side: original export. Right side: export using --old. (screenshot: ommit)

regards

franklin

carderne commented 2 years ago

Just so I understand, these are small messages from the --old file that are not making it across to the output?

I can't reproduce this...

These are probably the culprit lines: https://github.com/carderne/signal-export/blob/8e119b1dbaa5e59312fd830f2252763092e61de9/sigexport.py#L427-L436

But the fallback when there's no match is to append to the previous message, so nothing should be completely lost... Maybe try dropping some ifs and prints in there to check for the strings that aren't showing up?
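A quick way to do that kind of check from outside the script, as a rough sketch: compare the two exports line by line and print whatever didn't make it across. The helper name `missing_lines` and the sample lines are made up for illustration; they are not part of sigexport.py, and the real export format may look different.

```python
from collections import Counter


def missing_lines(old_lines, merged_lines):
    """Return the non-empty lines of old_lines that merged_lines does not account for.

    Duplicates are counted, so two identical "haha" lines in the old export
    need two matching lines in the merged output.
    """
    remaining = Counter(line.strip() for line in merged_lines if line.strip())
    missing = []
    for line in old_lines:
        key = line.strip()
        if not key:
            continue  # ignore blank lines
        if remaining[key] > 0:
            remaining[key] -= 1  # consume one matching line
        else:
            missing.append(key)
    return missing


# Made-up sample data; in practice read both files with open(...).readlines()
old = ["[2021-03-01 20:15] Me: haha", "[2021-03-01 20:16] Me: see you tomorrow"]
merged = ["[2021-03-01 20:16] Me: see you tomorrow"]

for line in missing_lines(old, merged):
    print("missing:", line)
```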

franklin-be commented 2 years ago

Dang.. you're right, that isn't the problem.

The problem is.. the desktop client I take my backup from is the one I'm usually not actively using most of the time. So when I turn it on, it receives all the previously missed messages in one go, like from an entire evening, and they all end up with more or less the same timestamp.

So a chat over 4 hours, for example, ends up with timestamps that differ by only about 2 minutes, because 2 minutes is all it took for that client to receive all the messages it did not have yet.

So messages with the same text body like "haha", which originally had different timestamps for your script to tell them apart, end up looking the same because they all get the same timestamp: the time the message was received on the desktop client.
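To illustrate the idea, here is a toy sketch. It is not the actual merge code in sigexport.py; the `Message` tuple and `merge_by_key` are hypothetical and only show why two messages with the same body and the same minute-level timestamp can't be told apart by a merge that keys on those fields.

```python
from collections import namedtuple

# Hypothetical message record, for illustration only
Message = namedtuple("Message", ["timestamp", "sender", "body"])


def merge_by_key(old, new):
    """Merge two message lists, treating equal (timestamp, sender, body) as one message."""
    seen = set()
    merged = []
    for msg in old + new:
        key = (msg.timestamp, msg.sender, msg.body)
        if key in seen:
            continue  # looks like a duplicate, so it is skipped
        seen.add(key)
        merged.append(msg)
    return merged


# Two "haha" messages sent hours apart, but received in one go a few hours later
received_in_bulk = [
    Message("2021-03-02 08:37", "Me", "haha"),
    Message("2021-03-02 08:37", "Me", "haha"),
    Message("2021-03-02 08:37", "Me", "a longer, unique message"),
]

print(len(merge_by_key([], received_in_bulk)))  # prints 2: one "haha" is gone
```

Longer messages rarely repeat word for word, which would explain why only short ones like "haha" or ";)" seem to go missing.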

To visualize this, check this export from 2 different Signal desktop clients (pay attention to the timestamps).

Left side is the desktop client that wasn't online while this conversation occurred (it was turned on a couple of hours later at around 8:3x). Right side is the desktop client the chat took place in. (I left some bits of text for you to see it's the same.)

(screenshot: timestamp)

Sorry for sending you on a goose chase by not being more attentive ;(

So.. should I close this? Obviously your script works as intended, but the way Signal handles multiple clients is a bit iffy..

UPDATE: Sorry.. that behavior isn't even consistent in my logs. Some timestamps are iffy while others, where the same should have happened, are not. The problem must be on my end. Sorry again.

carderne commented 2 years ago

@franklin-be no worries. Is this fully Signal’s fault, i.e. do the messages appear out of order in the signal desktop client?

I wonder if we could use a more precise timestamp (or a different one; I forget exactly what the db schema is and I think it’s changed a few times) to correct the ordering issue (and potential overwriting issue but I’m not sure why that would happen exactly).
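Roughly what I have in mind, as a toy sketch. The field names `sent_at` and `received_at` and the millisecond values are assumptions for illustration, not checked against the actual db schema.

```python
def order_messages(messages, field):
    """Return messages sorted by the given timestamp field (stable sort)."""
    return sorted(messages, key=lambda m: m[field])


# Made-up data: sent over a long stretch, received almost at once after coming online
messages = [
    {"body": "first", "sent_at": 1000, "received_at": 5002},
    {"body": "second", "sent_at": 2000, "received_at": 5001},
    {"body": "third", "sent_at": 3000, "received_at": 5001},
]

by_received = [m["body"] for m in order_messages(messages, "received_at")]
by_sent = [m["body"] for m in order_messages(messages, "sent_at")]

print(by_received)  # order depends on when the client happened to sync
print(by_sent)      # order follows when the messages were written
```

If a sent timestamp is available, it would also stay distinct for messages that were received in bulk, which should help with the matching when merging.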

franklin-be commented 2 years ago

Hey @carderne (sorry if that was an unnecessary ping, I don't know how notifications work here). Just wanted to make sure you'd get a notification of this response, since you asked me and I didn't get around to responding right away.

So they appear fine in the desktop client, but the received timestamp is the one that is reflected in the index.md.

The one marked with the red X is the one from the screenshot below. (screenshot: stamp1)

In my last response before closing the issue you can also see one message that ends with "life". It's this one, and it seems the index.md is built based on the received timestamp, not the sent timestamp (because it says 08:37).

(screenshot: stamp1_message_detail)