Closed (RogerRiggs closed 3 years ago)
If Thunderbird reports a different number of lines for two messages, I treat the messages as different. Obviously, I have to err on the side of caution.
Or are you saying that in the case of body+number-of-lines, I should "doubt" the number of lines Thunderbird reports, and compare using the body anyway?
I'm not familiar with Thunderbird's APIs, so I don't have any opinion about its functions or reliability. I saved the whole messages, including headers, and ran a diff. There were no differences in the bodies of the emails, only in the headers. Short-circuiting on the number of lines seems like a reasonable optimization to make the comparison quicker, provided the number of lines is reliable. I could forward the emails if that would make it easier to see the case in question.
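For reference, the body-only comparison I did by hand can be sketched roughly as below. This is only an illustration in Python, not anything the extension does: it assumes the messages were saved as raw RFC 5322 files, and the helper names (`split_message`, `body_digest`, `same_body`) are made up for the example.

```python
import hashlib


def split_message(raw: bytes) -> tuple[bytes, bytes]:
    """Split a raw message into (headers, body) at the first blank line."""
    # Normalize CRLF to LF so the storage format does not affect the result.
    normalized = raw.replace(b"\r\n", b"\n")
    headers, _, body = normalized.partition(b"\n\n")
    return headers, body


def body_digest(raw: bytes) -> str:
    """Hash only the body, ignoring headers and trailing blank lines."""
    _, body = split_message(raw)
    # Assumption: trailing newlines are not significant for duplicate detection.
    return hashlib.sha256(body.rstrip(b"\n")).hexdigest()


def same_body(first: bytes, second: bytes) -> bool:
    """Return True when two raw messages have identical bodies."""
    return body_digest(first) == body_digest(second)
```

Comparing this way deliberately ignores any reported line count, so an off-by-one or off-by-two count would not cause a duplicate to be missed.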
What I could really use is if you saved your messages in the same Thunderbird folder, then zipped that raw folder and mailed it to me. That way there will be no artifacts due to format changes.
There may be a difference in the line count between local folders and IMAP folders. For a local folder, the line counts of the messages are the same (20); for an IMAP folder, the line counts vary and are much larger (119-121). They may include the headers.
Let me know where to mail the folder to. I'm Roger.Riggs@oracle.com.
Did I leave this discussion hanging? Anyway, if you haven't already done so, look at my GitHub profile and use the mail address there.
The email folder I sent on 10/9 did not exhibit the problem, so it was not reproducible. I speculate that the lengths are consistent in local folders, but vary when examining a message in an IMAP folder. I am able to remove duplicates using the message ID only, so I have a workaround.
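In case it helps anyone else, the idea behind the message-ID workaround looks roughly like this when done outside Thunderbird. It is a minimal sketch using Python's standard `email` module on raw messages, not the extension's own logic; the function name `dedupe_by_message_id` is hypothetical.

```python
from email import message_from_bytes


def dedupe_by_message_id(raw_messages: list[bytes]) -> list[bytes]:
    """Keep the first message seen for each Message-ID header value."""
    seen: set[str] = set()
    unique: list[bytes] = []
    for raw in raw_messages:
        message_id = message_from_bytes(raw)["Message-ID"]
        if message_id is None:
            # Assumption: messages without a Message-ID are never treated
            # as duplicates, to err on the side of caution.
            unique.append(raw)
            continue
        key = str(message_id).strip()
        if key in seen:
            continue
        seen.add(key)
        unique.append(raw)
    return unique
```

Because only the Message-ID header is consulted, differences in other headers or in the reported line count do not affect the result.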
> I speculate that the lengths are consistent in local folders, but vary when examining a message in an imap folder.
Oh, yes, this has happened for some users of mine. It's probably due to IMAP server bugs (although theoretically it could be a TB bug). If you can spare the time to reproduce this consistently, consider filing a bug on bugzilla.mozilla.org - even if you're not sure whether it's TB's fault or the server.
This extension is very useful but occasionally misses duplicates when filtering on the body. When filtering on the message ID, the Duplicate Message Deletion window shows several messages with slightly different numbers of lines, usually off by only one or two. Diffs of the message bodies do not show any differences in the body lines.
As a sample of the messages:
https://mail.openjdk.java.net/pipermail/kulla-dev/2020-September/002590.html
https://mail.openjdk.java.net/pipermail/compiler-dev/2020-September/015078.html
https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-September/069405.html