eyalroz / removedupes

Remove Duplicate Messages
https://addons.thunderbird.net/en-US/thunderbird/addon/removedupes/

Re-downloaded copies of same (IMAP) messages are found by removedupes to have different bodies #191

Open 5TjpWBU2wkwHpFDb opened 1 year ago

5TjpWBU2wkwHpFDb commented 1 year ago

I double-downloaded from an IMAP account to a local folder, so there are many guaranteed dups. I even verified a few messages/files manually. No joy here: "No duplicates found."

Attachment: Troubleshooting_Information_2023-09-21.txt

Additional info (from Error Console): "Error: utils-message.js:2:1"

Thanks.

eyalroz commented 1 year ago

Please check what happens if you remove "Number of lines" from the comparison criteria. Thunderbird has this problem where it sometimes appends 1-2 empty lines to the end of the message, when storing it.

Once you've removed this (and maybe other) comparison criteria and start seeing dupes, you can add criteria back until you get exactly the kinds of dupes you're interested in.
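As a minimal sketch of why a line-count criterion is sensitive to this, the snippet below counts lines in two hypothetical copies of one body, the second with two extra empty lines at the end. The function names and sample text are assumptions for illustration and are not taken from the removedupes source.

```javascript
// Sketch only, not removedupes code. A naive line count changes as soon as
// empty lines are appended to the end of the text.
function countLines(text) {
  return text === "" ? 0 : text.split(/\r\n|\r|\n/).length;
}

// The same count after dropping any run of trailing line breaks.
function countLinesIgnoringTrailingBlanks(text) {
  return countLines(text.replace(/(?:\r\n|\r|\n)+$/, ""));
}

// Hypothetical copies of one message body; the second gained two empty lines.
const storedCopy = "Hello,\nsee you tomorrow.\n";
const redownloadedCopy = storedCopy + "\n\n";

console.log("naive:", countLines(storedCopy), countLines(redownloadedCopy));
console.log(
  "ignoring trailing blanks:",
  countLinesIgnoringTrailingBlanks(storedCopy),
  countLinesIgnoringTrailingBlanks(redownloadedCopy)
);
```

With the naive count the two copies disagree, while the trailing-blank-tolerant count treats them as equal.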

5TjpWBU2wkwHpFDb commented 1 year ago

That's not it. Here are my settings. Thanks.

[Screenshot: Capture_2023-09-22]

eyalroz commented 1 year ago

Well, the first point of note is that the "extra empty line" issue can sneak in through the body comparison.

But regardless - please gradually remove comparison criteria, one by one, and rerun the dupe check - until either you get dupes or you've removed all criteria.

Also, please check whether the Error Console has any warning or error messages mentioning "removedupes". The console is in the menus, under Tools | Developer Tools | Error Console.
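The criterion-by-criterion elimination described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how removedupes is implemented: messages are plain objects, criteria are hypothetical property names, and two messages count as dupes when all selected properties are equal.

```javascript
// Sketch only. Groups messages that agree on every selected criterion.
function findDupeSets(messages, criteria) {
  const groups = new Map();
  for (const msg of messages) {
    // Build one composite key from the selected (hypothetical) fields.
    const key = JSON.stringify(criteria.map((name) => msg[name]));
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(msg);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}

// Drops criteria one by one (last first) until dupes appear or none remain.
function relaxUntilDupes(messages, criteria) {
  const remaining = [...criteria];
  while (remaining.length > 0) {
    const sets = findDupeSets(messages, remaining);
    if (sets.length > 0) return { criteria: [...remaining], sets };
    remaining.pop();
  }
  return null; // every criterion was removed without finding dupes
}

// Hypothetical data: same ID and subject, but the bodies compare unequal.
const sampleMessages = [
  { messageId: "<a@example.org>", subject: "Hi", body: "text\n" },
  { messageId: "<a@example.org>", subject: "Hi", body: "text\n\n" },
];
const relaxed = relaxUntilDupes(sampleMessages, ["messageId", "subject", "body"]);
console.log(relaxed ? relaxed.criteria : "no dupes with any criteria");
```

The criterion dropped just before dupes first appear is the one worth investigating.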

5TjpWBU2wkwHpFDb commented 1 year ago

Got it. I ran a few tests. It seems that the only issue is Body. I kept Status-Flags and Number-of-lines-in-message turned OFF.

If I specified MessageID, Folder, From, To, CC, Subject, SendTime, and Size (w/o Body) then I got 7334 sets of dups. If I specified MessageID, Folder, From, To, CC, Subject, SendTime, and Body (w/o Size) then I got "No duplicates found."

The Error Console seems innocuous. Sorry to repeat, but I double-downloaded from an IMAP account to a local folder, so there are many guaranteed dups. Thanks.

PS. Sorry if the screen grabs are big.

Error Console:

10:43:17.594 No chrome package registered for chrome://communicator/skin/communicator.css
10:45:31.154 Error in parsing value for ‘width’. Declaration dropped. messenger.xhtml
10:47:39.391 No chrome package registered for chrome://communicator/skin/communicator.css
10:48:37.691 Error in parsing value for ‘width’. Declaration dropped. 2 messenger.xhtml
10:52:08.109 No chrome package registered for chrome://communicator/skin/communicator.css
10:53:30.416 Error in parsing value for ‘width’. Declaration dropped. 2 messenger.xhtml

Screenshots:

Capture_2023-09-22
Capture_2023-09-22_Review_MessageID_and_Folder1
Capture_2023-09-22_Review_MessageID_Folder_From_To_CC_Subject_Body1
Capture_2023-09-22_Review_MessageID_Folder_From_To_CC_Subject_SendTime_Size1
Capture_2023-09-22_Review_MessageID_Folder_From_To_CC_Subject1
Capture_2023-09-22_Review_MessageID_Folder_From_To_CC1

5TjpWBU2wkwHpFDb commented 1 year ago

Is there any other info./testing that I can provide? BTW, could this be related to issue 179 (Comparison of subjects with Unicode symbols fails)?

JDrewes commented 12 months ago

Hi, I have the same issue, starting with the upgrade of both Thunderbird and removedupes to 115.2. Does this mean that removedupes is now doing something different, or has Thunderbird started to add more of those spurious newlines?

I have 2 IMAP accounts, which receive both separate and identical emails. Because the pathways differ, the headers of duplicate emails can be quite different, and apparently the line count can also differ by 1 or 2, as @eyalroz indicated above.

Could this be solved by adding an "ignore whitespace" option to the body comparison?
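To make the suggestion concrete, here is a minimal sketch of what an "ignore whitespace" body comparison might look like. The function names and the `ignoreWhitespace` option are hypothetical; removedupes does not necessarily compare bodies this way.

```javascript
// Sketch only. Collapses every run of whitespace (spaces, tabs, line breaks)
// to a single space and trims both ends.
function normalizeWhitespace(body) {
  return body.replace(/\s+/g, " ").trim();
}

// Compares two bodies, optionally ignoring whitespace differences.
// `ignoreWhitespace` is a hypothetical option name.
function bodiesMatch(bodyA, bodyB, { ignoreWhitespace = false } = {}) {
  if (!ignoreWhitespace) return bodyA === bodyB;
  return normalizeWhitespace(bodyA) === normalizeWhitespace(bodyB);
}

// Hypothetical bodies that differ only in line endings and trailing blanks.
const bodyFromAccountA = "Dear all,\r\nthe meeting is at 10.\r\n";
const bodyFromAccountB = "Dear all,\nthe meeting is at 10.\n\n";

console.log(bodiesMatch(bodyFromAccountA, bodyFromAccountB));
console.log(bodiesMatch(bodyFromAccountA, bodyFromAccountB, { ignoreWhitespace: true }));
```

Collapsing all whitespace is deliberately aggressive. A stricter variant could strip only trailing line breaks, which would still treat messages that differ in internal spacing as distinct.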

Also, as a feature idea, it would be nice to be able to select two messages for comparison to see exactly where they are considered to be the same and where they differ. This would help greatly with criteria adjustment...
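A rough sketch of such a two-message comparison is shown below. It assumes messages are plain objects and criteria are property names; both are illustrative assumptions, not the removedupes data model.

```javascript
// Sketch only. Reports, per criterion, whether two messages agree.
function compareMessages(msgA, msgB, criteria) {
  return criteria.map((name) => ({
    criterion: name,
    same: msgA[name] === msgB[name],
    valueA: msgA[name],
    valueB: msgB[name],
  }));
}

// Hypothetical pair: identical headers, one extra empty line in the body.
const leftMessage = { messageId: "<1@example.org>", subject: "Report", body: "Done.\n" };
const rightMessage = { messageId: "<1@example.org>", subject: "Report", body: "Done.\n\n" };

const comparison = compareMessages(leftMessage, rightMessage, ["messageId", "subject", "body"]);
const differing = comparison.filter((row) => !row.same).map((row) => row.criterion);
console.log("criteria that differ:", differing);
```

Listing only the differing criteria points directly at the setting that needs adjusting.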

Thank you for providing this very essential functionality (I mean duperemove)! @eyalroz

5TjpWBU2wkwHpFDb commented 12 months ago

Hi JDrewes, Could you do me a favor? Double-download some messages from 1 account/pathway; run remove duplicates w/o Body, don't delete, (how many dupes?); run remove duplicates with Body, (how many dupes?). Do the counts match? Thank you!

JDrewes commented 12 months ago

I downloaded 5 messages twice from the same account. When comparing without Body, I get 5 pairs of 2 duplicates. When comparing with Body, I get "No duplicates found".

5TjpWBU2wkwHpFDb commented 12 months ago

Thanks. That confirms it for me. There is either a (big?) change in TB 115 or an error in the Body comparison code. eyalroz, please, give us an update.

5TjpWBU2wkwHpFDb commented 12 months ago

I found the culprit: getting MsgService in RemoveDupes.MessengerOverlay.messageBodyFromURI() fails. I added some diagnostic/status messages. I get the "Get MsgService . . ." messages in the Error Console, but no "Got MsgService?" messages.

    RemoveDupes.MessengerOverlay.messageBodyFromURI = function (msgURI) {
      console.log("RemoveDupes.MessengerOverlay.messageBodyFromURI(): main entry . . .");
      console.log(`RemoveDupes.MessengerOverlay.messageBodyFromURI(): msgURI = ${msgURI}`);
      // The following lines don't work because of asynchronicity
      // let msgHdr = RemoveDupes.GetMsgFolderFromUri(msgURI);
      // let msgContent = await getRawMessage(msgHdr);
      let msgContent = "";
      let MsgService;
      console.log("Get MsgService . . .");
      try {
        MsgService = messenger.messageServiceFromURI(msgURI);
      } catch (ex) {
        return null;
      }
      console.log("Got MsgService?");
      let MsgStream = Cc["@mozilla.org/network/sync-stream-listener;1"].createInstance();
      let consumer = MsgStream.QueryInterface(Ci.nsIInputStream);
      let ScriptInput = Cc["@mozilla.org/scriptableinputstream;1"].createInstance();
      let ScriptInputStream = ScriptInput.QueryInterface(Ci.nsIScriptableInputStream);
      ScriptInputStream.init(consumer);
      console.log("Try MsgService.streamMessage . . .");
      try {
        MsgService.streamMessage(msgURI, MsgStream, msgWindow, null, false, null);
      } catch (ex) {
        return null;
      }
      console.log("Get msgContent . . .");
      ScriptInputStream.available();
      while (ScriptInputStream.available()) {
        msgContent += ScriptInputStream.read(512);
      }

      console.log("Got msgContent");

5TjpWBU2wkwHpFDb commented 12 months ago

I have a fix: messenger.messageServiceFromURI is not a function. Try using the MailServices.messageServiceFromURI function instead.

https://forums.mozillazine.org/viewtopic.php?p=14960035&sid=b9c05d0ec6b2e4640955fa7c7429df84#p14960035
https://forums.mozillazine.org/viewtopic.php?p=14960023&sid=b9c05d0ec6b2e4640955fa7c7429df84#p14960023

I have tested it and it works now. I will try to upload a patch ASAP.

Attachment: removedupes_0.5.4b5_tbird.xpi.zip
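The change described above might look roughly like the sketch below. It is not the actual patch: `getMessageService` is a hypothetical helper, and the `mailServices` parameter stands in for Thunderbird's MailServices object so that the snippet can run outside Thunderbird with stubs. It also logs the exception rather than silently returning null, since the silent catch in the diagnostic snippet above is what hid the "is not a function" error.

```javascript
// Sketch only, not the actual patch. `mailServices` is assumed to expose
// messageServiceFromURI(uri), as MailServices does per the fix above.
function getMessageService(mailServices, msgURI, log = console) {
  try {
    return mailServices.messageServiceFromURI(msgURI);
  } catch (ex) {
    // Surface the failure instead of swallowing it.
    log.error(`messageServiceFromURI failed for ${msgURI}:`, ex);
    return null;
  }
}

// Stubs for running outside Thunderbird (all hypothetical).
const workingServices = { messageServiceFromURI: (uri) => ({ uri }) };
const brokenServices = {}; // mimics an object lacking the function
const loggedErrors = [];
const captureLog = { error: (...args) => loggedErrors.push(args) };

const okService = getMessageService(workingServices, "mailbox-message://example#1", captureLog);
const failedService = getMessageService(brokenServices, "mailbox-message://example#1", captureLog);
console.log(okService !== null, failedService === null, loggedErrors.length);
```

Inside the add-on, the real MailServices object would be passed in place of the stubs.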

dbirchbauer commented 11 months ago

I can also confirm something isn't working correctly. I use basic Gmail forwarding. I am trying to clean up the email between the accounts. I used to look for matching Message-ID values, but now nothing is found (I have manually compared several messages and their IDs do match). Even when searching for the same Author/Subject/Send Time (using seconds), I get nothing.

eyalroz commented 10 months ago

Hello everyone,

as you may be aware - there is a war going on here; I don't live in Gaza, I live in the Israeli-controlled part of Palestine - but here too there is a bunch of government repression, and logistical trouble for a lot of people who have been either evacuated or told not to go to work etc., which volunteers in charitable organizations try to deal with, each in their own context. Plus I always have other obligations before removedupes maintenance, so - my apologies for not replying earlier.


@5TjpWBU2wkwHpFDb wrote:

It seems that the only issue is Body. ... If I specified MessageID, Folder, From, To, CC, Subject, SendTime, and Size (w/o Body) then I got 7334 sets of dups. If I specified MessageID, Folder, From, To, CC, Subject, SendTime, and Body (w/o Size) then I got "No duplicates found."

Ok, so - let's make this bug page about just this specific issue, and nothing else. All commenters - if you have a similar/related problem, but not identical to this one - please open a separate issue.

@5TjpWBU2wkwHpFDb - if you move two of the duplicate-except-for-body messages into a local folder, does the problem persist? If it does, can you zip that folder and send it to me or attach it here? I would prefer messages which are as small and simple as possible.

5TjpWBU2wkwHpFDb commented 10 months ago

Eyal, I understand.