Closed bentolor closed 3 months ago
Thank you so much for your detailed report. I will look into this shortly.
Change the corresponding code to not loop indefinitely on an failure in a single row, but instead fail or skip gracefully
That was certainly the intention, but I screwed up the logic. This will be quite easy to fix I believe.
If possible identify the root cause for this duplicate. I'm happy to provide additional debugging SQL statement outputs if that helps.
Yes, that is the real problem. I might indeed need some more output from you, however looking into the code and the information you already provided I have a theory. (I'm thinking the special circumstance is the message is an extremely long one (one that is not displayed fully, but says "Read more..." at the bottom) and has another such extremely long message as a quote)
I will be working on this now, it shouldn't take too long, I'll keep you updated. Thanks!
Hey @bepaald ! Thank you for your quick and supportive response!
(I'm thinking the special circumstance is the message is an extremely long one (one that is not displayed fully, but says "Read more..." at the bottom) and has another such extremely long message as a quote)
I can confirm your guess: Indeed this is a very long message and the message text is cut-off. I couldn't find a quote of that message in that thread. But I think I forwarded the message to multiple other chats. Could this be the culprit?. Interesting edge-case.
Meanwhile I'm already more than happy, as the output I received w/o the --searchpage
argument already provides me more value than ever. First time a fully accessible message archive on desktop.
I've attempted a fix.
First of all, I fixed the logic, so if the query still fails, you should only see the warning (search_idx query failed 2 results
) once, but the process should continue properly.
Second, I think I fixed the root cause (simply by adding another constraint to the query (AND attachment.quote = 0
).
If you have some time, please test and let me know.
Thanks again for reporting!
EDIT Haha, we were typing simultaneously it seems.
I can confirm your guess: Indeed this is a very long message and the message text is cut-off. I couldn't find a quote of that message in that thread. But I think I forwarded the message to multiple other chats. Could this be the culprit?. Interesting edge-case.
Good, that makes it more likely it is now fixed. Though I hadn't considered the quoted message might not exist anymore. I don't think it makes a difference in this case though.
Meanwhile I'm already more than happy, as the output I received w/o the --searchpage argument already provides me more value than ever. First time a fully accessible message archive on desktop.
Thank you for your kind words. And thanks to bug reports like these, hopefully the program slowly gets better over time!
I've attempted a fix.
First of all, I fixed the logic, so if the query still fails, you should only see the warning (
search_idx query failed 2 results
) once, but the process should continue properly.
It seems that part is now working as expected!
*** Starting log: 2024-06-08 14:00:32 ***
signalbackup-tools source version 20240608.134549
BACKUPFILE VERSION: 1
BACKUPFILE SIZE: 5697670524
COUNTER: 4267989572
Reading backup file: 100.0%... done!
Database version: 231
Clearing contents of directory 'html'...
Dealing with thread 1
Dealing with thread 2
[Warning]: search_idx query failed 2 results
Dealing with thread 3
Second, I think I fixed the root cause (simply by adding another constraint to the query (
AND attachment.quote = 0
).
Obviously it seems to have a different root cause. While I'm typing this, the export is still running and I an additional warning scrolling by:
Dealing with thread 25
Dealing with thread 26
[Warning]: search_idx query failed 3 results
Dealing with thread 27
Hmmmm…
Obviously it seems to have a different root cause. While I'm typing this, the export is still running and I an additional warning scrolling by:
Ok, that's a shame. In that case, I'm not sure what is causing the extra results in this case (the query should only have 1 result, hence the warning). Maybe you could run a query for me?
SELECT _id,message_id,content_type,remote_incremental_digest,remote_incremental_digest_chunk_size,cdn_number,transfer_state,transfer_file,data_size,file_name,fast_preflight_id,voice_note,borderless,video_gif,quote,width,height,caption,sticker_pack_id,sticker_pack_key,sticker_id,sticker_emoji,blur_hash,transform_properties,display_order,upload_timestamp,archive_cdn,archive_media_name,archive_media_id,archive_transfer_file,archive_transfer_state,thumbnail_file,thumbnail_random,archive_thumbnail_cdn,archive_thumbnail_media_id FROM attachment WHERE message_id = 7280;
It's probably a but overkill, but it might save me from having to ask again if I miss something. Also, if you're able to identify the other message (the one in thread 26), you could run the same query with 7280
replaced with that message's _id
. It might not be necessary (it might simply be the same issue), but since it's returning 3 results instead of 2, it might be something else.
Thanks!
Here we go. The other hit was the subsequent message. Seems to seconds my assumption of a cross-thread forwarded message?
sqlite3 plain/database.sqlite Sa 08 Jun 2024 14:43:01 CEST
SQLite version 3.37.2 2022-01-06 13:25:41
Enter ".help" for usage hints.
sqlite> SELECT _id,message_id,content_type,remote_incremental_digest,remote_incremental_digest_chunk_size,cdn_number,transfer_state,transfer_file,data_size,file_name,fast_preflight_id,voice_note,borderless,video_gif,quote,width,height,caption,sticker_pack_id,sticker_pack_key,sticker_id,sticker_emoji,blur_hash,transform_properties,display_order,upload_timestamp,archive_cdn,archive_media_name,archive_media_id,archive_transfer_file,archive_transfer_state,thumbnail_file,thumbnail_random,archive_thumbnail_cdn,archive_thumbnail_media_id FROM attachment WHERE message_id = 7280;
4769|7280|text/x-signal-plain||0|2|0||2241|signal-2021-04-10-190557.txt|-2550449821634623864|0|0|0|0|0|0||||-1|||{"skipTransform":true,"videoTrim":false,"videoTrimStartTimeUs":0,"videoTrimEndTimeUs":0,"videoEdited":false}|0|1618074359201|0||||0|||0|
4770|7280|text/x-signal-plain||0|2|0||2241|signal-2021-04-10-190557.txt|-2550449821634623864|0|0|0|0|0|0||||-1|||{"skipTransform":true,"videoTrim":false,"videoTrimStartTimeUs":0,"videoTrimEndTimeUs":0,"videoEdited":false}|0|1618074359201|0||||0|||0|
sqlite> SELECT _id,message_id,content_type,remote_incremental_digest,remote_incremental_digest_chunk_size,cdn_number,transfer_state,transfer_file,data_size,file_name,fast_preflight_id,voice_note,borderless,video_gif,quote,width,height,caption,sticker_pack_id,sticker_pack_key,sticker_id,sticker_emoji,blur_hash,transform_properties,display_order,upload_timestamp,archive_cdn,archive_media_name,archive_media_id,archive_transfer_file,archive_transfer_state,thumbnail_file,thumbnail_random,archive_thumbnail_cdn,archive_thumbnail_media_id FROM attachment WHERE message_id = 7281;
4771|7281|text/x-signal-plain||0|2|0||2241|signal-2021-04-10-190557.txt|-2550449821634623864|0|0|0|0|0|0||||-1|||{"skipTransform":true,"videoTrim":false,"videoTrimStartTimeUs":0,"videoTrimEndTimeUs":0,"videoEdited":false}|0|1618074359201|0||||0|||0|
4772|7281|text/x-signal-plain||0|2|0||2241|signal-2021-04-10-190557.txt|-2550449821634623864|0|0|0|0|0|0||||-1|||{"skipTransform":true,"videoTrim":false,"videoTrimStartTimeUs":0,"videoTrimEndTimeUs":0,"videoEdited":false}|0|1618074359201|0||||0|||0|
4773|7281|text/x-signal-plain||0|2|0||2241|signal-2021-04-10-190558.txt|-2550449821634623864|0|0|0|0|0|0||||-1|||{"skipTransform":true,"videoTrim":false,"videoTrimStartTimeUs":0,"videoTrimEndTimeUs":0,"videoEdited":false}|0|1618074359201|0||||0|||0|
sqlite>
BTW: In the signalbackup Output the message is rendered with an attachment. But not in the Signal app itself. Compare the two screenshots below:
Hej - wow? Do you even adopt the configured chat colors? Nice!
Thanks!
I was able to reproduce the issue by forwarding a long text to several contacts.
I strongly feel you have discovered a bug in Signal. With each additional contact a long message is forwarded to, an additional copy of the text is added to the message (note: these long messages are actually treated as attachments by Signal). I forwarded a long message to three contacts, and upon export I see this:
[...]
Dealing with thread 6
[Warning]: search_idx query failed 4 results
[...]
Dealing with thread 9
[Warning]: search_idx query failed 3 results
[...]
Dealing with thread 31
[Warning]: search_idx query failed 2 results
I will need to think about how to deal with it, I could very simply ignore any extra attachments of this type (text/x-signal-plain
) as any message should only ever have 1 such attachment at most. But I need to think about whether it has any unwanted side effects in other circumstances. Not sure how quickly I get around to it, but I'll keep you informed.
I will at least also file a bug report at Signal, but in my recent experience they don't often respond very quickly.
Hej - wow? Do you even adopt the configured chat colors? Nice!
I try, but I think this is mostly coincidence: it doesn't often work. Maybe I'll dive into that when I have some time to spare.
Thanks again. I'll keep you updated.
EDIT bug report here: https://github.com/signalapp/Signal-Android/issues/13593
Thanks for the research and very interesting analysis results!
I'm puzzled that this bug still seems to be present in a current Signal app installation and wonder why it does not render as attachment there. This is probably a rare border case.
I'm not sure if it really should be tackled as signalbackup-tool. Unfortunately I share the experience that Signal bug reports rarely get any feedback.
I'm good for now. Again: Thank you very much for your support and I'll also subscribe to your new upstream bug out of pure curiosity.
I'm puzzled that this bug still seems to be present in a current Signal app installation and wonder why it does not render as attachment there. This is probably a rare border case.
Yes, I think so, it only happens with long text messages that are forwarded to more than 1 contact. That is probably very rare. But even more important, it seems to have no user facing visual effect in Signal. The only way to see this bug is to inspect the database. So that is probably why nobody has found it before.
I'm not sure if it really should be tackled as signalbackup-tool. Unfortunately I share the experience that Signal bug reports rarely get any feedback.
I'm good for now. Again: Thank you very much for your support and I'll also subscribe to your new upstream bug out of pure curiosity.
I think I worked around this bug fairly well now. Building the search_idx will now only fail (gracefully) if the unexpected number of results is zero. In other cases it will simply use the first result, though it will still print a warning if there is more than 1. In the HTML output (and TXT output) I'm now simply skipping all text/x-signal-plain
type files during attachment handling.
If you could test again (hopefully for the last time), that would be very much appreciated. What I expect:
[Warning]: Unexpected number of results from search_idx query (X results, using first)
)searchpage.html
with the proper search terms, only the original and possibly the first forwarded messages would show up. Now, I expect all forwarded long messages to appear in the results.If you could confirm this now works as expected, I'd be happy to close this and just let the Signal developers deal with the bug if/when they can.
Thanks!
Hey @bepaald : I can confirm all of the expected outcomes. So many thanks for your love and support!
Thanks again for all your quick and useful feedback!
Hej cool! They fixed it quite promptly! Kudos for your 1st class upstream bug report!
Yeah, I'm pleasantly surprised. I guess they probably read all the issues even if it takes them too much time to respond.
When this hits a beta I'll test it, but I'm sure it's probably fixed now. Thanks!
First of all many thanks for this invaluable tool! I am very attached to my old conversations out of personal nostalgia and after a 4-year break I would like to export again my Signal collection spanning over many years for long-term archiving purposes. After revisiting all the other, mostly unmaintained alternatives like
signal-back
,sigbak
and others, I stumbled over this gem.Issue
I initially started my very first
--exporthtml
with all available bells'n'whistles including--searchpage
running into this issue already on a very first thread. The export never stops. The affected HTML file grows and grows with the same message repeated over and over again.Trying to identify the issue
I'm a total C++ noob, but eventually I was able to spot the location and replicate the relevant SQL query executed into an additional
Logger::warning
issue.This is the output on the relevant query
My understanding
I'm not sure what actually causes this issue leading to a duplicate query result. But I'd like to suggest to
Again: Thank you very much for your great work and this awesome tool!