astroidmail / astroid

A graphical threads-with-tags style, lightweight and fast, e-mail client for Notmuch
http://astroidmail.github.io

Improve message duplicates handling #425

Open mrvdb opened 6 years ago

mrvdb commented 6 years ago

It sometimes happens that there are duplicate messages that differ only in, for example, the X-TUID header (which mbsync adds).

Astroid shows only one message when in fact there are multiple files, which can get confusing. Other MUAs seem to display all duplicates, which may not be the smartest approach either.

Some ideas :

c-alpha commented 6 years ago

There are different causes for duplicates, some legitimate and some perhaps not. They likely need different solutions (make a list of use cases?).

To get this started:

Off the top of my head, my first impulse would be to ask "does it have a message id header?"

If it does not, we could simply compute a hash over the subject and body (incl. attachments). If the hash is the same, it's a duplicate. In practice this would mean that the exact same content has been sent, and maybe re-sent, by one or more persons. Re. the subject line: how should "Re:", "Fwd:", and the like be handled (if at all)?

If it has a message id header, it's a duplicate if it has the same message id. If the message id is different, or one has the header while the other doesn't, we could still use the hash method to find duplicates with the exact same content.
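
A minimal sketch of the rule described above, using only the Python standard library. It is an illustration of the idea, not how notmuch or astroid behave; the function names `content_hash` and `is_duplicate` are made up for this example, and the "Re:"/"Fwd:" question is deliberately left open (the subject is only whitespace-normalized).

```python
import hashlib
from email import message_from_bytes


def content_hash(msg):
    """Hash the subject plus the decoded payload of every leaf part
    (attachments included). Hypothetical helper, not a notmuch API."""
    h = hashlib.sha256()
    # Collapse whitespace so header folding does not change the hash.
    subject = " ".join(str(msg.get("Subject", "")).split())
    h.update(subject.encode("utf-8"))
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        h.update(b"\0")  # separator between parts
        h.update(part.get_payload(decode=True) or b"")
    return h.hexdigest()


def is_duplicate(a, b):
    """Same Message-ID means duplicate; otherwise fall back to the hash."""
    id_a, id_b = a.get("Message-ID"), b.get("Message-ID")
    if id_a and id_b and str(id_a).strip() == str(id_b).strip():
        return True
    return content_hash(a) == content_hash(b)


# Two copies that differ only in X-TUID, one copy without a Message-ID,
# and one unrelated message.
m1 = message_from_bytes(b"Message-ID: <a@x>\r\nX-TUID: 111\r\nSubject: Hi\r\n\r\nbody\r\n")
m2 = message_from_bytes(b"Message-ID: <a@x>\r\nX-TUID: 222\r\nSubject: Hi\r\n\r\nbody\r\n")
m3 = message_from_bytes(b"Subject: Hi\r\n\r\nbody\r\n")
m4 = message_from_bytes(b"Message-ID: <b@x>\r\nSubject: Other\r\n\r\nbody\r\n")
```

Note that treating "same Message-ID" as sufficient is exactly the assumption an attacker could exploit, so a real implementation would probably want to compare content as well.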

The more I think about it, the more I get the impression that this could end up as a feature request for notmuch? Or maybe it does that already (somehow)?

Once that is settled, we could think about how to present it in the UI.

c-alpha commented 6 years ago

@gauteh, not sure why you tagged this "security"?

gauteh commented 6 years ago

It is only a duplicate when two files have the same message-id header. It is a security issue because an attacker could send a second email with a duplicate message-id, and that email may be the one shown, masking the original: e.g. a patch containing code sent to a mailing list, where the message-id is publicly known.

If a message doesn't have a message-id notmuch assigns one.

It is a UI issue how this is displayed. Notmuch doesn't hide the fact that there are several files associated with one ID. I think that it also (recently) indexes the content from all associated files.
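
To make the masking concern concrete, here is a rough sketch that groups raw message files by Message-ID and reports the IDs whose files differ in content. It uses only the Python standard library and does not call notmuch; `body_digest` and `conflicting_ids` are hypothetical names, and the input is assumed to be a mapping of filename to raw bytes.

```python
import hashlib
from collections import defaultdict
from email import message_from_bytes


def body_digest(msg):
    """Digest of all leaf-part payloads; headers such as X-TUID are ignored."""
    h = hashlib.sha256()
    for part in msg.walk():
        if not part.is_multipart():
            h.update(part.get_payload(decode=True) or b"")
            h.update(b"\0")  # separator between parts
    return h.hexdigest()


def conflicting_ids(files):
    """Return {message_id: [filenames]} for IDs whose files differ in content.

    `files` maps a filename to the raw bytes of that message (an assumption
    made for this sketch; a real tool would read them from the maildir).
    """
    by_id = defaultdict(list)
    for name, raw in files.items():
        msg = message_from_bytes(raw)
        mid = msg.get("Message-ID")
        if mid:
            by_id[str(mid).strip()].append((name, body_digest(msg)))
    return {
        mid: sorted(name for name, _ in entries)
        for mid, entries in by_id.items()
        if len({digest for _, digest in entries}) > 1
    }


files = {
    "cur/1": b"Message-ID: <patch@list>\r\nX-TUID: 1\r\nSubject: [PATCH]\r\n\r\nsafe code\r\n",
    "cur/2": b"Message-ID: <patch@list>\r\nSubject: [PATCH]\r\n\r\nevil code\r\n",
    "cur/3": b"Message-ID: <ok@list>\r\nX-TUID: 2\r\nSubject: Hi\r\n\r\nhello\r\n",
    "cur/4": b"Message-ID: <ok@list>\r\nX-TUID: 3\r\nSubject: Hi\r\n\r\nhello\r\n",
}
suspicious = conflicting_ids(files)
```

Here the two `<ok@list>` files are harmless copies, while `<patch@list>` is flagged because its two files carry different bodies; a UI could use a check along these lines to decide when to warn instead of silently showing one file.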