Identical messages with a one line difference

jonas1015119 commented 1 month ago

For some reason, identical messages copied from various folders, both online and offline, are sometimes displayed with line counts that differ by one, which prevents them from being recognized as duplicates. No idea why this happens, but that's what I found after seeing a number of duplicates in my archives. Would it be possible to add a suboption to the line count option to ignore a line count difference of one? It always seems to be exactly one; I even found a triplet of duplicates that were off by one from each other. (Keeping the largest one should probably be the default here.) I think line count + date in seconds makes for a very simple but accurate filter setup, but these random variances mess with that. (Not sure whether the compare message content option is affected by this.)
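
The requested suboption could look roughly like the sketch below. It is only an illustration in Python, not the extension's code: the `Message` class, its fields, and the `line_tolerance` parameter are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical fields for illustration; not the extension's data model.
    date_seconds: int  # message date, truncated to whole seconds
    line_count: int    # line count as reported for the message


def looks_like_duplicate(a: Message, b: Message, line_tolerance: int = 1) -> bool:
    """Return True if the dates match exactly and the line counts
    differ by at most line_tolerance."""
    return (
        a.date_seconds == b.date_seconds
        and abs(a.line_count - b.line_count) <= line_tolerance
    )
```

With `line_tolerance=0` this reduces to an exact comparison on date and line count, so the tolerance would be strictly opt-in.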

eyalroz commented 1 month ago

What would you do if you found a message which was off by one line, and added it to the dupe set - and then found another message, off by one line from the second message? Would you add it to the set?
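
To make the question concrete: "off by one" is not transitive, so the answer changes what ends up in a dupe set. The Python sketch below is hypothetical (the function names and the grouping rules are assumptions for illustration, not how the extension works) and compares two possible rules on bare line counts.

```python
def chain_groups(line_counts, tolerance=1):
    """Chaining rule: a message joins the current set if it is within
    `tolerance` lines of the most recently added member."""
    groups = []
    for count in sorted(line_counts):
        if groups and count - groups[-1][-1] <= tolerance:
            groups[-1].append(count)
        else:
            groups.append([count])
    return groups


def strict_groups(line_counts, tolerance=1):
    """Strict rule: a message joins the current set only if it is within
    `tolerance` lines of the smallest member, so every pair in a set
    differs by at most `tolerance`."""
    groups = []
    for count in sorted(line_counts):
        if groups and count - groups[-1][0] <= tolerance:
            groups[-1].append(count)
        else:
            groups.append([count])
    return groups


counts = [100, 101, 102]
print(chain_groups(counts))   # [[100, 101, 102]]
print(strict_groups(counts))  # [[100, 101], [102]]
```

Under the chaining rule, messages with 100 and 102 lines land in the same set even though they differ by two lines. The strict rule avoids that, but its split is arbitrary: `[100], [101, 102]` would be just as valid.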

jonas1015119 commented 1 month ago

I forgot to reply to this, sorry. Going by the second picture, there seems to be some form of recursion for copies of copies. I guess if there's a way to implement a check across multiple messages for a reasonable number of iterations, it wouldn't hurt, though that's probably not a common scenario.

eyalroz commented 1 month ago

It is indeed not a common scenario, but - we are dealing with deleting messages. I am obliged to always err on the side of caution - I cannot let even a single message be deleted accidentally due to me saying "oh, it's fine, this won't usually happen". I have to have a sound approach to implementing this, or otherwise not do it at all.

eyalroz / removedupes

Identical messages with a one line difference #235