alphapapa / ement.el

A Matrix client for GNU Emacs
GNU General Public License v3.0
488 stars 44 forks source link

Constrain the ement-room--event-mentions-user-p regexp #240

Open phil-s opened 10 months ago

phil-s commented 10 months ago

At present the word uphill in a message is treated as mentioning the username phil :)

Let's at minimum wrap the pattern with word boundaries?

(Or do we then worry about usernames which do not begin/end with word characters? Either way, I'm sure we can improve this...)

alphapapa commented 10 months ago

Makes sense. Thanks.

phil-s commented 10 months ago

I looked to see if there were restrictions on characters used for a user's display name, and I'm not seeing anything. Searching for displayname at https://spec.matrix.org/v1.8/client-server-api/ gives lots of info, but it seems like the value may be "any string".

So we can't simply wrap the quoted display name with word boundaries, but I suspect that even if the user had non-word characters at the start or end of their display name, we could assume that any mention of their name will have non-word characters (or bol/eol) surrounding the display name. So maybe something like this:

"\\(?:^\\|[^[:word:]]\\)\\(DISPLAYNAME\\)\\(?:$\\|[^[:word:]]\\)"

Possibly in the form of a rx-based template in a new variable or user option.

alphapapa commented 10 months ago

It's kind of hidden here: https://spec.matrix.org/v1.8/appendices/#user-identifiers

phil-s commented 10 months ago

Ah, yes, we have different rules for username and displayname (I was only thinking about displayname).

At present ement-room--event-mentions-user-p uses the same code for both. In fact it matches on three names...

    (or (matches-body-p (ement-user-username user))
        (matches-body-p (ement--user-displayname-in room user))
        (matches-body-p (ement-user-id user)))))))

Where username is the "Username part of user's Matrix ID." in the user struct; so I guess we apply the character constraints from your spec URL to both ement-user-username and ement-user-id, and only ement--user-displayname-in should be the more permissive one.

Konubinix commented 10 months ago

IIUC, doing this would fix the issue for both username and useid, right?

    (or (matches-body-p (format "\\b%s\\b" (ement-user-username user)))
        (matches-body-p (ement--user-displayname-in room user))
        (matches-body-p (format "\\b%s\\b" (ement-user-id user))))))))

Can we do this first and deal with the user displayname after?

Konubinix commented 10 months ago

About the displayname, it sounds like there is no silver bullet.

For example, I think the displayname ":-) me :-)" can be considered a mention in "Hello :-) me :-)" but also in "Hello:-) me :-)".

But the displayname ". Therefore" would be matched against "Bla bla. Therefore bla bla", which seems counterintuitive.

I guess that to deal with all cases, we should provide a custom fonction that the user may define to match per own displayname, and provide one by default. My opinion would be to provide one that keep the current behavior.