eyalroz / removedupes

Remove Duplicate Messages
https://addons.thunderbird.net/en-US/thunderbird/addon/removedupes/

Missed duplicates #22

Closed. RogerRiggs closed this issue 3 years ago.

RogerRiggs commented 4 years ago

This extension is very useful, but it occasionally misses duplicates when filtering on the body. When filtering on the message ID, the Duplicate Message Deletion window shows several messages with slightly different numbers of lines, usually off by only one or two. Diffs of the message bodies do not show any differences in the body lines.

As a sample of the messages: https://mail.openjdk.java.net/pipermail/kulla-dev/2020-September/002590.html https://mail.openjdk.java.net/pipermail/compiler-dev/2020-September/015078.html https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-September/069405.html

eyalroz commented 4 years ago

If the number of lines is reported by Thunderbird as different, then I treat the messages as different. Obviously, I have to err on the side of caution.

Or are you saying that in the case of body+number-of-lines, I should "doubt" the number of lines Thunderbird reports, and compare using the body anyway?
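To make the two options concrete, here is a minimal sketch of the comparison being discussed. It is not the extension's actual code: the `Message` class, the `reported_lines` field, and the `trust_line_count` flag are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical stand-in for a message header record; not a Thunderbird type.
    message_id: str
    reported_lines: int  # line count as reported by the mail client
    body: str


def is_duplicate(a: Message, b: Message, trust_line_count: bool = True) -> bool:
    """Compare two messages by body, optionally short-circuiting on line count."""
    if trust_line_count and a.reported_lines != b.reported_lines:
        # Cautious behaviour: differing reported counts mean "not a duplicate".
        return False
    return a.body == b.body


# Two copies with identical bodies but different reported line counts.
first = Message("<1@example.org>", 119, "Hello\nWorld\n")
second = Message("<1@example.org>", 121, "Hello\nWorld\n")

strict = is_duplicate(first, second)
lenient = is_duplicate(first, second, trust_line_count=False)
print(strict, lenient)
```

With the line count trusted, the pair is treated as different; with the count ignored, the identical bodies make it a duplicate.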

RogerRiggs commented 4 years ago

I'm not familiar with Thunderbird's APIs, so I don't have any opinion about its functions or reliability. I saved the whole messages, including headers, and did a diff. There were no differences in the bodies of the emails, only in the headers. Short-circuiting on the number of lines seems like a reasonable optimization to make the comparison quicker, if the number of lines is reliable. I could forward the emails, if that would make it easier to see the case in question.

eyalroz commented 4 years ago

What I could really use is if you saved your messages in the same Thunderbird folder, then zipped that raw folder and mailed it to me. That way there will be no artifacts due to format changes.

RogerRiggs commented 4 years ago

There may be a difference in the line count between local folders and IMAP folders. For a local folder, the messages' line counts are the same (20); for an IMAP folder, the line counts vary and are much larger (119-121). They may include the headers.
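As a rough illustration of that guess (and only a guess; how Thunderbird or an IMAP server actually computes the count is not established here), the sketch below uses Python's standard `email` module to count lines with and without headers. The sample messages and the `line_counts` helper are made up for the example.

```python
from email import message_from_string
from typing import Tuple


def line_counts(raw: str) -> Tuple[int, int]:
    """Return (lines in the whole message, lines in the body only)."""
    msg = message_from_string(raw)
    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart messages are out of scope for this sketch.
        raise ValueError("expected a single-part message")
    return len(raw.splitlines()), len(body.splitlines())


shared_body = "Line one\nLine two\n"
# Same body, but the second copy picked up an extra Received header in transit.
copy_a = "Received: from a.example.org\nSubject: Hi\n\n" + shared_body
copy_b = (
    "Received: from b.example.org\n"
    "Received: from relay.example.org\n"
    "Subject: Hi\n\n" + shared_body
)

counts_a = line_counts(copy_a)
counts_b = line_counts(copy_b)
print(counts_a, counts_b)
```

If the count includes headers, copies that travelled through different mailing lists or relays would differ by a line or two even though the body-only count is identical.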

Let me know where to mail the folder to. I'm Roger.Riggs@oracle.com.

eyalroz commented 3 years ago

Did I leave this discussion hanging? Anyway, if you haven't already done so, look at my GitHub profile and use the mail address there.

RogerRiggs commented 3 years ago

The email folder I sent on 10/9 did not exhibit the problem, so it was not reproducible. I speculate that the lengths are consistent in local folders, but vary when examining a message in an IMAP folder. I am able to remove duplicates using the message ID only, so I have a workaround.
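The idea behind that workaround can be sketched outside Thunderbird as follows. This is an illustration with Python's standard `email` module, not how the extension does it; `duplicates_by_message_id` and the sample messages are hypothetical.

```python
from collections import defaultdict
from email import message_from_string
from typing import Dict, List


def duplicates_by_message_id(raw_messages: List[str]) -> Dict[str, List[int]]:
    """Group message indexes by Message-ID and keep only groups with repeats."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, raw in enumerate(raw_messages):
        message_id = message_from_string(raw)["Message-ID"]
        if message_id is not None:  # skip messages without a Message-ID header
            groups[message_id.strip()].append(index)
    return {mid: idxs for mid, idxs in groups.items() if len(idxs) > 1}


messages = [
    "Message-ID: <a@example.org>\nSubject: One\n\nBody\n",
    # Same message received via another route, with an extra header line.
    "Received: from relay.example.org\nMessage-ID: <a@example.org>\nSubject: One\n\nBody\n",
    "Message-ID: <b@example.org>\nSubject: Two\n\nOther\n",
]

dupes = duplicates_by_message_id(messages)
print(dupes)
```

Because only the Message-ID header is compared, differences in reported line counts or in other headers do not affect the result.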

eyalroz commented 3 years ago

> I speculate that the lengths are consistent in local folders, but vary when examining a message in an IMAP folder.

Oh, yes, this has happened for some users of mine. It's probably due to IMAP server bugs (although theoretically it could be a TB bug). If you can spare the time to reproduce this consistently, consider filing a bug on bugzilla.mozilla.org, even if you're not sure whether it's TB's fault or the server's.