glitch-soc / mastodon

A glitchy but lovable microblogging server
https://glitch-soc.github.io/docs/
GNU Affero General Public License v3.0

Make keyword mutes operate on the text content of statuses #234

Closed: hannahwhy closed this issue 6 years ago.

hannahwhy commented 6 years ago

At the moment, keyword mutes run on the status content as it is stored in the database, which means it's matching on both the status text and any HTML that's injected to control paragraph breaks, hashtag links, etc.

We should strip out this HTML and match only on the text.
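The difference between matching on stored HTML and matching on text can be shown with a small sketch. It assumes hashtag links are rendered roughly the way Mastodon renders them (`#<span>cats</span>` inside an `<a>`). It also uses a plain case-insensitive substring test as a stand-in for the real keyword matcher. `html_to_text` and `keyword_muted` are hypothetical helpers, not glitch-soc code.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping tags and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


def keyword_muted(content, keyword):
    # Hypothetical stand-in for the real matcher: case-insensitive substring test.
    return keyword.lower() in content.lower()


# Hypothetical stored content, modeled on how Mastodon renders a hashtag link.
status_html = (
    '<p>Look at my <a href="https://example.social/tags/cats" '
    'class="mention hashtag" rel="tag">#<span>cats</span></a></p>'
)
text = html_to_text(status_html)

print(text)                                   # Look at my #cats
print(keyword_muted(status_html, "#cats"))    # False: "<span>" splits the hashtag
print(keyword_muted(text, "#cats"))           # True
print(keyword_muted(status_html, "hashtag"))  # True: matches the class attribute
print(keyword_muted(text, "hashtag"))         # False
```

Matching on raw HTML fails in both directions here. The markup hides a real hashtag from the mute (a false negative) and exposes attribute names to it (a false positive).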

nightpool commented 6 years ago

This would break matching on link URLs, which is pretty important for some use cases.

(Sent by email in reply to David Yip's comment of Sun, Dec 3, 2017 at 3:30 PM.)

hannahwhy commented 6 years ago

Is that a use case we could satisfy by matching on both the text content and the HTML? Or perhaps we don't need to operate on tag-stripped text at all?
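The "match on both" option can be sketched as a mute that fires if the keyword appears either in the visible text or anywhere in the raw HTML. This is an illustration under assumptions: `keyword_muted_either` is a hypothetical helper, and the link markup is invented to show a label that differs from its URL.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def keyword_muted_either(html, keyword):
    """Hypothetical: mute if the keyword is in the visible text OR anywhere in the raw HTML."""
    needle = keyword.lower()
    return needle in html_to_text(html).lower() or needle in html.lower()


# Hypothetical link whose visible label differs from its href.
status_html = '<p>Read <a href="https://news.example/article/42">this article</a></p>'

print(keyword_muted_either(status_html, "news.example"))  # True, via the href in the raw HTML
print(keyword_muted_either(status_html, "this article"))  # True, via the text
print(keyword_muted_either(status_html, "href"))          # True as well: markup still leaks in
```

This keeps URL matching, but it inherits every false positive from the raw-HTML match, because the HTML side still sees tag and attribute names.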

The initial motivation for text-only matching was to support a "keyword mute for all-caps posts", but given the questions in #235, I'm not sure we can support that (at least not without making it a special case).
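An all-caps check shows why this case needs tag-stripped text. Stored HTML always contains lowercase tag names such as `p` and `a`, so a check run on it can never succeed. The sketch below is a hypothetical special case, not an existing glitch-soc feature. It relies on Python's `str.isupper()`, which is true only when the string has at least one cased character and all cased characters are uppercase.

```python
def is_all_caps(text):
    # str.isupper() ignores digits, punctuation and spaces, and returns False
    # when there are no cased characters at all.
    return text.isupper()


raw_content = "<p>STOP SHOUTING AT ME</p>"  # hypothetical stored HTML
visible_text = "STOP SHOUTING AT ME"         # the same status with tags stripped

print(is_all_caps(raw_content))   # False: the lowercase "p" in the tags defeats the check
print(is_all_caps(visible_text))  # True
print(is_all_caps("123 !!!"))     # False: no letters to judge
```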

marrus-sh commented 6 years ago

The HTML content of statuses from Masto instances includes the full URL; it's only shortened for display via CSS. This does mean you (maybe) can't match link URLs from other (non-Masto) instances, depending on how they format things.

(Notably, if this weren't the case, link URL matching wouldn't work with the regex filtering currently implemented in the frontend either.)
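The claim that the text content carries the full URL can be checked by comparing each link's `href` with its text. The first sample is modeled on Mastodon's link markup, which splits the URL across spans that CSS hides (`invisible`) or truncates (`ellipsis`). That markup is an assumption written from memory, not copied from a real status. The second sample stands in for a non-Mastodon server that shows only a label. `LinkCollector` and `links_in` are hypothetical helpers.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Records (href, text content) for every <a> element in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (href, text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def links_in(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical Mastodon-style link: the full URL is present in the text nodes.
mastodon_html = (
    '<p><a href="https://example.com/some/long/path/page.html" rel="nofollow noopener">'
    '<span class="invisible">https://</span>'
    '<span class="ellipsis">example.com/some/long/pat</span>'
    '<span class="invisible">h/page.html</span></a></p>'
)
# Hypothetical link from another server that shows only a label.
other_html = '<p><a href="https://example.org/posts/1">a post</a></p>'

for href, text in links_in(mastodon_html) + links_in(other_html):
    print(href == text, href)
# True https://example.com/some/long/path/page.html
# False https://example.org/posts/1
```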

marrus-sh commented 6 years ago

See https://github.com/glitch-soc/mastodon/blob/master/app/javascript/mastodon/reducers/statuses.js#L57 for how tags are currently stripped for searching and regex filtering in the frontend.

marrus-sh commented 6 years ago

(Whoops, the relevant line is the domParser call a few lines down, where search_index is calculated.)

hannahwhy commented 6 years ago

Oh, right -- I think you pointed this out to me before. Maybe we'll have to replicate this in the backend for consistency's sake.
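Replicating the frontend's `search_index` in the backend might look roughly like the sketch below. Several details are assumptions, so check `statuses.js` for the exact replacements before relying on them:

- the frontend joins the content warning and the body with a blank line;
- it turns `<br>` into a newline and `</p><p>` into a blank line;
- it then keeps only the text content.

The sketch also HTML-escapes the spoiler text before parsing, so a literal `<` in a content warning is not read as a tag. That step may differ from the frontend. `search_index` here is a hypothetical Python function, not existing backend code (the real backend is Ruby).

```python
import html
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (entities are decoded)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def search_index(spoiler_text, content_html):
    # Assumed to mirror the frontend: CW + body, <br> -> newline,
    # paragraph boundary -> blank line, then text nodes only.
    combined = "\n\n".join([html.escape(spoiler_text), content_html])
    combined = re.sub(r"<br\s*/?>", "\n", combined)
    combined = combined.replace("</p><p>", "\n\n")
    parser = TextExtractor()
    parser.feed(combined)
    parser.close()
    return "".join(parser.parts)


index = search_index("cw: food", "<p>first line<br />second</p><p>next para</p>")
print(repr(index))  # 'cw: food\n\nfirst line\nsecond\n\nnext para'
```

A backend keyword mute could then test `keyword.lower() in index.lower()`, which would match what the frontend regex filter sees.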

hannahwhy commented 6 years ago

Double-checking: if client-side regex filtering does match on links, is it really looking at href, src, etc.? The method used to build search_index doesn't seem to include that information.

[screenshot: screenshot_20171221_165825]

(I think it would be a good idea to include href and src, but based on this I don't think it's needed for this issue -- we can break it out to an enhancement)
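The suggested enhancement, indexing `href` and `src` alongside the text, could be sketched as follows. Attribute values are appended after the text, one per line, so link and image URLs become searchable without letting tag or attribute names leak into the index. `IndexBuilder`, `build_index` and the sample markup are hypothetical. The sketch relies on `HTMLParser`'s default `handle_startendtag`, which calls `handle_starttag`, so self-closing tags like `<img ... />` are covered too.

```python
from html.parser import HTMLParser


class IndexBuilder(HTMLParser):
    """Collects text nodes plus href/src attribute values (hypothetical enhancement)."""

    INDEXED_ATTRS = ("href", "src")

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.attr_values = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.INDEXED_ATTRS and value:
                self.attr_values.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def build_index(html):
    builder = IndexBuilder()
    builder.feed(html)
    builder.close()
    # Visible text first, then each indexed attribute value on its own line.
    return "\n".join(["".join(builder.text_parts), *builder.attr_values])


status_html = (
    '<p>Read <a href="https://news.example/article/42">this article</a></p>'
    '<img src="https://media.example/cat.png" />'
)
print(build_index(status_html))
# Read this article
# https://news.example/article/42
# https://media.example/cat.png
```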

marrus-sh commented 6 years ago

It is not, because a.href === a.textContent for Mastodon statuses~

hannahwhy commented 6 years ago

#236 addresses this issue.