Closed hannahwhy closed 6 years ago
this would break matching on link urls, which is pretty important for some use cases
On Sun, Dec 3, 2017 at 3:30 PM David Yip notifications@github.com wrote:
At the moment, keyword mutes run on the status content as it is stored in the database, which means it's matching on both the status text and any HTML that's injected to control paragraph breaks, hashtag links, etc.
We should strip out this HTML and match only on the text.
Is that a use case we could satisfy by matching on both the text content and the HTML? Or perhaps we don't need to operate on tag-stripped text at all?
The initial motivation for doing text-only match was to support the "keyword mute for all-caps posts", but given the questions in #235 I'm not sure we can support that (at least not without making it a special case).
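To make the "match on both" question concrete, here is a minimal sketch of matching a keyword against both the tag-stripped text and the raw stored HTML. `stripTags` and `matchKeyword` are hypothetical helpers invented for this example, not functions from the Mastodon codebase, and the regex-based stripping is a simplification: a real implementation would more likely use an HTML parser or sanitizer.

```javascript
// Hypothetical helper: crude tag stripper, for illustration only.
function stripTags(html) {
  return html
    .replace(/<br\s*\/?>/gi, '\n')     // line breaks become newlines
    .replace(/<\/p>\s*<p>/gi, '\n\n')  // paragraph boundaries become blank lines
    .replace(/<[^>]*>/g, '')           // drop every remaining tag
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&');           // decode &amp; last to avoid double-decoding
}

// Hypothetical helper: reports which representation of the status the
// keyword matched (case-insensitive substring match).
function matchKeyword(statusHtml, keyword) {
  const needle = keyword.toLowerCase();
  return {
    text: stripTags(statusHtml).toLowerCase().includes(needle),
    html: statusHtml.toLowerCase().includes(needle),
  };
}

// Assumed example status; the markup is made up for this sketch.
const status = '<p>hello <span class="h-card">world</span></p><p>second para</p>';

console.log(matchKeyword(status, 'world')); // { text: true, html: true }
console.log(matchKeyword(status, 'span'));  // { text: false, html: true }
```

The second call shows the problem from the original report: a keyword like `span` matches only because of injected markup, so an HTML match alone is not a reliable signal.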
the html content of statuses includes the full URL for masto instances; it's only shortened via CSS. this does mean you may not be able to match link urls from other (non-Masto) instances though, depending on how they format things.
(notably, if this wasn't the case then link url matching wouldn't work on the regex filtering currently implemented in the frontend either)
see https://github.com/glitch-soc/mastodon/blob/master/app/javascript/mastodon/reducers/statuses.js#L57 for how tags are currently stripped for searching and regex in the frontend
(whoops, the relevant line is the domParser where `search_index` is calculated a few lines down)
Oh, right -- I think you pointed this out to me before. Maybe we'll have to replicate this in the backend for consistency's sake.
Double-checking: if client-side regex does filter on links, is it really looking at `href`, `src`, etc.? The method used to build `search_index` doesn't seem to include that information.
(I think it would be a good idea to include `href` and `src`, but based on this I don't think it's needed for this issue -- we can break it out to an enhancement)
it is not, because `a.href === a.textContent` for mastodon statuses~
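A small sketch of why text-only matching still sees full link URLs on Mastodon-formatted statuses. The link markup below is an assumption modeled on the "shortened via CSS" description earlier in the thread (the `invisible` and `ellipsis` class names are assumed, not copied from a real status), and `textContentOf` and `hrefOf` are hypothetical regex-based stand-ins for what a DOM parser would give you.

```javascript
// Assumed markup: the full URL is present as link text, and CSS hides the
// "invisible" parts so the reader only sees a shortened form.
const linkHtml =
  '<a href="https://example.com/some/long/path" rel="nofollow noopener" target="_blank">' +
  '<span class="invisible">https://</span>' +
  '<span class="ellipsis">example.com/some/long</span>' +
  '<span class="invisible">/path</span>' +
  '</a>';

// Hypothetical stand-in for element.textContent: drop all tags.
function textContentOf(html) {
  return html.replace(/<[^>]*>/g, '');
}

// Hypothetical stand-in for a.href: read the first anchor's href attribute.
function hrefOf(html) {
  const m = /<a\s[^>]*href="([^"]*)"/i.exec(html);
  return m ? m[1] : null;
}

console.log(textContentOf(linkHtml));                     // https://example.com/some/long/path
console.log(hrefOf(linkHtml) === textContentOf(linkHtml)); // true
```

Under that assumption, a filter that only looks at stripped text can still match on link URLs. Statuses from non-Mastodon instances may format links differently, in which case the link text and `href` can diverge.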