GeopJr / Tuba

Browse the Fediverse
https://tuba.geopjr.dev/
GNU General Public License v3.0

[Bug]: Apostrophes are considered as word separators in "whole-word" filters, causing false-positives #875

Closed nekohayo closed 6 months ago

nekohayo commented 6 months ago

Describe the bug

In French, we have a lot of contractions that use an apostrophe (like in English, where it's == it is).

The problem is that these apostrophes seem to be treated as word delimiters by Tuba's filters.

Steps To Reproduce

Set up some filters like these:

[image: screenshot of the filter settings]

Notice the filter on the whole-word string AI.

Then write some toots that contain that string as part of a word joined by an apostrophe (typographical or not), such as "Ah, cette douleur que j'ai, que j'ai !" or "Je n'ai rien fait de mal".

Result: those toots get filtered.

I suppose there might be other cases like this (typographic apostrophes? other punctuation marks? special characters?), but I haven't tested them...
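The false positive can be reproduced outside any client with a small sketch. It assumes that "whole word" means a case-insensitive regular expression with `\b` word boundaries around the keyword; that is an assumption made for illustration, not a confirmed description of how the server matches. The helper name `whole_word_hit` is hypothetical.

```python
import re


def whole_word_hit(keyword: str, text: str) -> bool:
    """Return True if keyword occurs in text between regex word boundaries.

    Assumption: whole-word matching behaves like \\b...\\b, case-insensitively.
    """
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


samples = [
    "Ah, cette douleur que j'ai, que j'ai !",  # ASCII apostrophe
    "Je n'ai rien fait de mal",                # ASCII apostrophe
    "Je n\u2019ai rien fait de mal",           # typographic apostrophe (U+2019)
    "J'aime les chats",                        # "ai" is only part of "aime"
]

for sample in samples:
    print(whole_word_hit("AI", sample), sample)
```

Under that assumption, both kinds of apostrophe count as non-word characters, so the first three samples match and only the last one does not.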

Logs and/or Screenshots

No response

Instance Backend

Mastodon

Operating System

Fedora 39

Package

Flatpak

Troubleshooting information

No response

Additional Context

No response

GeopJr commented 6 months ago

Okay, so I followed your instructions: I added AI to the whole-word filters and sent myself a toot from another account. The API response had:

content: '<p>test <span class="h-card" translate="no"><a href="https://mastodon.social/@agenthsudabbuwiadn" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>agenthsudabbuwiadn</span></a></span> Ah, cette douleur que j'ai, que j'ai !</p>'

j'ai is in there

It got filtered with (API response):

{
    "filtered": [
        {
            "filter": {
                "id": "60558",
                "title": "a",
                "context": [
                    "home",
                    "notifications",
                    "public",
                    "thread",
                    "account"
                ],
                "expires_at": null,
                "filter_action": "warn"
            },
            "keyword_matches": [
                "ai"
            ],
            "status_matches": null
        }
    ]
}

So Tuba does it right; the problem is in Mastodon's filter matching. I'll check their issue tracker to see whether it has already been reported.
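To make the division of labour concrete, here is a minimal sketch of how a client could read that response. It is not Tuba's code (the helper `matched_keywords` is hypothetical); it only assumes the response shape shown above, where the server lists the keywords it matched and the client performs no matching of its own.

```python
import json


def matched_keywords(status: dict) -> list[str]:
    """Collect every keyword the server reported as matched for this status."""
    hits: list[str] = []
    # "filtered" may be missing or null when no filter applies.
    for result in status.get("filtered") or []:
        hits.extend(result.get("keyword_matches") or [])
    return hits


# The response from the comment above, trimmed to the fields used here.
response = json.loads("""
{
    "filtered": [
        {
            "filter": {"id": "60558", "title": "a", "filter_action": "warn"},
            "keyword_matches": ["ai"],
            "status_matches": null
        }
    ]
}
""")

print(matched_keywords(response))  # ['ai']
```

A client following this pattern hides or warns whenever the list is non-empty, so a false positive in the server's matching shows up unchanged in the client.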

GeopJr commented 6 months ago

This looks similar: https://github.com/mastodon/mastodon/issues/8405 (whole-word filter applies to URLs, between / and .)

I'll close this as there's nothing I can do: Mastodon tells Tuba the toot should be filtered, and Tuba filters it.

:/

GeopJr commented 6 months ago

I kind of get why this is happening, and I don't know how they are going to solve it, but it's up to them.

If whole-word matching ignored ', ", ., , and so on, then everything would bypass it.

For example, "The CEO said: 'we replaced everyone with AI'" and "This was made with AI." would both bypass the filter if punctuation weren't handled the way it currently is.
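The trade-off can be sketched as follows. The function `matches_whole_word` and its `extra_word_chars` parameter are hypothetical, and the baseline (only letters, digits and underscore count as word characters) is an assumption about the current behaviour, not the server's actual code.

```python
import re


def matches_whole_word(keyword: str, text: str, extra_word_chars: str = "") -> bool:
    """Whole-word match where extra_word_chars also count as part of a word."""
    # A character class of "word" characters: \w plus any extras.
    word_class = r"[\w" + re.escape(extra_word_chars) + "]"
    # The keyword must not be preceded or followed by a word character.
    pattern = f"(?<!{word_class}){re.escape(keyword)}(?!{word_class})"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


french = "Ah, cette douleur que j'ai, que j'ai !"
quoted = "The CEO said: 'we replaced everyone with AI'"
sentence = "This was made with AI."

# Baseline: punctuation separates words, so all three texts match.
print([matches_whole_word("AI", t) for t in (french, quoted, sentence)])

# Treating the punctuation as part of the word fixes the French case,
# but the other two texts now bypass the filter as well.
ignored = "'\".,"
print([matches_whole_word("AI", t, ignored) for t in (french, quoted, sentence)])
```

The first line prints three `True` values and the second three `False` values, which is the dilemma described above: any rule loose enough to spare j'ai also lets AI' and AI. through.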