LemmyNet / lemmy

🐀 A link aggregator and forum for the fediverse
https://join-lemmy.org
GNU Affero General Public License v3.0

Make the slur filter editable from the site itself #622

Closed: StaticallyTypedRice closed this issue 3 years ago

StaticallyTypedRice commented 4 years ago

It's generally not a good idea to hard-code something like the slur filter, because the needs of every instance are different. Instances in another language would need their own versions, and cases where the slur filter over-blocks need to be addressed by the admins.

A good idea would be to store the slur filter in the database and initialize it with a default when spinning up an instance, but make it editable by admins without changing source files.
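A minimal sketch of that approach in Rust (Lemmy's backend language) using the `regex` crate; the names, placeholder pattern, and the idea of a saved column are hypothetical illustrations, not Lemmy's actual schema:

```rust
use regex::Regex;

// Hypothetical compiled-in default, seeded into the database when an
// instance is first spun up. The real list would be much longer.
const DEFAULT_SLUR_PATTERN: &str = r"(?i)\b(badword1|badword2)\b";

/// Build the active filter from whatever pattern the admin last saved
/// (e.g. a column on a hypothetical `site` table), falling back to the
/// compiled-in default when nothing has been saved.
fn active_slur_filter(saved_pattern: Option<&str>) -> Regex {
    let pattern = saved_pattern.unwrap_or(DEFAULT_SLUR_PATTERN);
    // If an admin saves an invalid regex, keep filtering with the
    // default rather than silently disabling the filter.
    Regex::new(pattern)
        .unwrap_or_else(|_| Regex::new(DEFAULT_SLUR_PATTERN).expect("default is valid"))
}

fn main() {
    let filter = active_slur_filter(None);
    assert!(filter.is_match("this contains badword1"));
}
```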

dessalines commented 4 years ago

I'll have to think about this. Hard-coding it means I don't have to do a database migration every time someone comes up with a new slur. And putting it in a DB table means someone could very easily remove it by deleting every row of that table, which isn't good. I want to make it very difficult for racist trolls to use the most updated version of Lemmy.

As far as adding / removing slurs, I'd rather these be done by community consensus and basic standards of decency. So far we haven't gotten any requests to add or remove specific slurs, so a debate / process hasn't emerged, but I imagine as lemmy grows, this will happen naturally too.

A multi-language slur filter would require things from #440 and #391, mainly analyzing the incoming language and then applying the correct language's slur filter. That is so far in the future it's not worth thinking about right now.

dessalines commented 4 years ago

I want to make it very difficult for racist trolls to use the most updated version of Lemmy.

Is this not clear enough? Slurs are against our code of conduct and the goals of this project. Go to voat or gab if you want to use racist or sexist slurs; we don't allow them here.

MasterOfTheTiger commented 4 years ago

It's not that I want to use slurs, or that anyone else should; it's that you have a hard-coded word restriction. That sounds exactly like the kind of thing that should be entirely configurable by the admin, since they are the ones in charge of moderation on their instance.

Feelsy commented 4 years ago

Give users of Lemmy (site owners) the option to completely disable it. It can be done by removing it from the codebase but an option would be cleaner.

They have a point: if this is supposed to be an open-source alternative to Reddit that other people can run themselves, shouldn't the site creator or creators be able to edit all aspects of the software?

Nutomic commented 4 years ago

If you don't like it, fork it. Stop bothering us about it; we will never fully remove the slur filter.

dessalines commented 4 years ago

Racism, sexism, transphobia et al are against the code of conduct and goals of this project. Re-arranging how slurs are filtered, where they're stored, or what should be included is up for discussion, but the existence of a slur filter is not.

Again, your "uncensored" reddit experience already exists: go to voat or gab if you want to use slurs.

notfood commented 4 years ago

Having the ability to edit the words without having to recompile is a legitimate issue. It's already very easy to remove that regex, so it's not going to stop anyone. Maintenance of the word list isn't so easy either, especially when it can grow at any moment and be in any language.

Please, don't turn this into an us vs them issue, it's irrelevant. Don't sacrifice flexibility just to make a point.

dessalines commented 4 years ago

Having the ability to edit the words without having to recompile is a legitimate issue.

Whether updating the slurs comes from a DB migration, or another line in a source code file, still requires an update of the software, but doesn't require anyone running it to recompile on their own. In 99% of cases, it'll just be changing a line in docker-compose.yml from v0.0.1 to v0.0.2. Even translation updates require this to pick up new translations.

Unless you're talking about having all instances rely on a slurs file hosted somewhere, I don't see how it could get any more "flexible".

Again, it should not be easy for an instance to remove the slur filter, and use the most updated version of Lemmy.

slapula commented 4 years ago

Would it make sense to have an "additional slurs" field to add to the hard-coded list? I agree with having a hard-coded list but I can see there being instances that may want to add additional words as needed.

dessalines commented 4 years ago

Ya I could definitely see that being useful.
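As a sketch of how such an additive field could work (hypothetical names and placeholder patterns; today the filter is a single hard-coded regex), the base list and per-instance additions could be compiled into one `RegexSet`, so admins can extend the filter but never shrink it below the base:

```rust
use regex::RegexSet;

// Hard-coded base list (illustrative placeholders, not the real list).
const BASE_SLURS: &[&str] = &[r"(?i)\bbadword1\b"];

/// Combine the compiled-in list with instance-specific additions, so
/// admins can extend the filter but never shrink it below the base.
fn build_filter(additional: &[String]) -> RegexSet {
    let patterns = BASE_SLURS
        .iter()
        .map(|s| s.to_string())
        .chain(additional.iter().cloned());
    RegexSet::new(patterns).expect("all patterns are valid regexes")
}

fn main() {
    let filter = build_filter(&[r"(?i)\binstanceword\b".to_string()]);
    assert!(filter.is_match("contains badword1"));
    assert!(filter.is_match("contains instanceword too"));
}
```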

StaticallyTypedRice commented 4 years ago

And putting it in a DB table means someone could very easily remove it by deleting every row of that table, which isn't good. I want to make it very difficult for racist trolls to use the most updated version of Lemmy.

Unfortunately, this is extremely easy to bypass, to the point where I don't think the protection of a hard-coded or otherwise difficult-to-edit slur filter is nearly worth the disadvantages to good instance maintainers. Someone could easily fork the project with the slur filter removed and just pull new code while always ignoring that file or method. Anyone with even a basic understanding of programming will be able to locate and disable the slur filter. It's also extremely easy to bypass by using alternate UTF-8 characters, such as stylized letters or letter-based mathematical symbols.

However, the disadvantages of a hard-coded slur filter are many. The biggest one is that it massively over-blocks, because it has no concept of context. Words in other languages sometimes get misidentified (which will be a problem for non-English instances), as do parts of benign English words (smartwatch gets blocked because of the word tw*t). It'll be extremely awkward when someone's own name gets blocked, and it will make them feel unwelcome. Each of these cases would require a refinement to the filter, which in the hard-coded model means editing the source files and recompiling the entire backend. At best that's annoying for the instance maintainer (especially if the change is volatile and an update will reset it); worse, the instance maintainer may not know how to properly change the slur filter (regular expressions are difficult to write, and even more difficult to write well) and can break it; and at worst, if they really don't know what they're doing, they can break the entire backend or create a security vulnerability, since they would need to edit one of the Rust source files.
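The over-blocking is easy to reproduce. A small sketch with Rust's `regex` crate, spelling out the tw*t example so the code actually runs:

```rust
use regex::Regex;

fn main() {
    // A bare substring pattern with no word boundaries, which is the
    // failure mode described above: it has no concept of context.
    let filter = Regex::new("(?i)twat").unwrap();

    // "smartwatch" contains the match, so the benign word gets mangled.
    println!("{}", filter.replace_all("my new smartwatch", "removed"));
    // Prints: my new smarremovedch
}
```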

Another example I can think of would be a niche instance dedicated to literature, where people analysing older texts may post exact quotes containing words that by today's standards are offensive. I believe that in general, context matters when it comes to supposedly "offensive" words.

ptman commented 4 years ago

localization makes static lists hard

poperigby commented 4 years ago

I agree that having a hard coded list is a bad idea. Context is a huge part of language, and a hardcoded filter completely ignores it. See the Scunthorpe problem as one example. This thread on Tildes also brings up a lot of good points. I'm all for blocking Nazis, but there's got to be a better way.

seniorm0ment commented 4 years ago

Racism, sexism, transphobia et al are against the code of conduct and goals of this project. Re-arranging how slurs are filtered, where they're stored, or what should be included is up for discussion, but the existence of a slur filter is not.

Again, your "uncensored" reddit experience already exists: go to voat or gab if you want to use slurs.

I hope this could be reconsidered and looked at more maturely from both views, and not just "word bad and offends I don't like, so we block."

Filtering words goes far beyond this. At first I assumed the filter only applied if the instance you're on enabled it for its communities, and that at most it would carry over to other instances hosting those communities that opted in. But I guess this is not the case: as you mention here, it is hardcoded, which you made it seem in the past, though I thought you had gotten a little confused. This is absolutely terrible; please read the rest of this post to hear my point.

In a theoretical state, where it would not be hardcoded into Lemmy, but set per instance:

I quite honestly still fail to see why this would be an instance-controlled thing rather than a user choice. If an instance wants it on, that's understandable. But if users want to turn it off client-side and uncensor everything, I fail to see why that would be an issue. It gives users the freedom and assurance of knowing that nothing is being censored if they so wish. Not everyone gets upset over the same things, and not everyone finds everything offensive, so why not let them uncensor everything locally? Everything would still be filtered for people who keep the filter enabled, obviously.

This is also an issue when it comes to politics and everything else. Who's to say an instance with a major political bias can't just start censoring a politician's name, or whatnot? I think the language filter should be customizable by the instance host (which I assume it is), and also able to be turned off client-side, in which case that toggle would affect EVERY instance that implements it, so users can cross communities on different instances and know that what they are seeing is exactly what the user on the other end typed. And for users that don't want that? Well, they can choose to keep the filter on.

I know this has been a controversial subject, not just with me but with many others. I think this would be the best way of doing it: you give users the freedom to see exactly what they want, while still allowing instances to filter words. It's just up to the user whether they want to see it filtered or unfiltered. This is extremely important.

For all we know, if someone had a major political bias regarding Trump, for example, they could just throw the word "Trump" onto the hard-coded filter list, and that's it. This would anger a lot of users. Filter lists are absolutely terrible as censorship. We (soon) have the ability to federate with other instances; instead of hard-coding a filter list, just BLOCK THE INSTANCE. Allow instances to set language filters themselves, and do NOTHING on Lemmy's side. You're trying to act as a god who judges what is said across all instances; if that's the case, why are you even bothering with the Fediverse? Neither you nor anyone else should have any say over what is posted or said on other instances. This should be up to the instance HOST. Otherwise, how is Lemmy supposed to be any different from Twitter? Just because you can host your own instance? What is the point of doing that if you can control everything that is said?

This has absolutely NOTHING to do with "wanting to be racist, sexist, etc." It goes a lot further than that; it is the CORE concept of not giving control to a certain small group of people. There is nothing you can do to stop someone from making an instance hosting extremely controversial material. The language filter means nothing to them, because they can just use different words, and you'd be playing cat and mouse with those words just to get them out. On top of that, you may be adding thousands of meaningless words to your filter, which would also harm users on all other instances. Instead, just let them keep their instance up and DON'T FEDERATE WITH THEM. Nobody on YOUR instance will have anything to do with them; they won't see the posts or communities or anything. I don't understand why it cannot be left at this. I have mentioned this in the past, and you just deleted my post, called me racist and sexist, and said to go back to Gab. I had made it extremely clear that my post was not standing up for racism, sexism, or anything like that. I was speaking purely theoretically, and as I mentioned in this post, it goes far beyond racism. You can be the nicest person ever, hate racism and sexism and be completely against them, and still be against the idea that people get censored and can't say what they want to say.

On top of this, there are more languages than just English; you'd have to filter everything you consider bad in every language as well. Words are also used differently in different regions. And I'm sure the language filter wouldn't be perfect. You're going to play cat and mouse with the filter, even if it's compiled by the community, for no real benefit and a massive loss in flexibility, when you could just leave it to the instances.

I like Lemmy and I want to see it grow; this is why I am speaking out about this issue. The Fediverse should give control to the users. That is all this is about: let the users decide. From the number of posts being made about the language filter, it is clear I am not the only one with concerns. Please be more open-minded. It is not fair to users to hardcode a language filter into Lemmy that applies to all instances; it does nothing but hurt them.

Let instances choose to federate with other instances and apply language filters to them, and better yet, also allow users to turn off the language filter locally across all instances. This would be in everyone's favor. I am making a middle-ground proposal that favors the users, you, and the instances alike. Lemmy is a platform; please don't turn it into something it's not. Pleroma is a platform, PeerTube is a platform, etc. The instances have the true power: they have power over their own users, not over other instances' users. If they do not like another instance, they can simply choose not to federate with it, or apply a language filter to it.

Another example I can think of would be a niche instance dedicated to literature, where people analysing older texts may post exact quotes containing words that by today's standards are offensive. I believe that in general, context matters when it comes to supposedly "offensive" words.

Also, I completely agree with this. This is something we do today, but long-term it is not a great idea: words change, meanings change. I actually had a few more statements, one of which relates to this and asks about the possibility of implementing a blockchain-like system/P2P/etc. for long-term archival. I typed it up but decided to post only the language-filter part here and go more in depth. You can read my comments in #647.

Please, please look at this from our point of view and understand how this looks. We WANT Lemmy to grow and do well. That is why we speak out about these concerns, and why we use the Fediverse: because we care. If we didn't care, we wouldn't be sitting here typing long paragraphs about our concerns, trying to help you understand. It's not because we are racist or sexist or whatever. Please stop throwing that at us; it is frustrating. Please consider being more open-minded about this.

theAeon commented 4 years ago

You're trying to act as a god who judges what is said across all instances; if that's the case, why are you even bothering with the Fediverse? Neither you nor anyone else should have any say over what is posted or said on other instances. This should be up to the instance HOST. Otherwise, how is Lemmy supposed to be any different from Twitter? Just because you can host your own instance? What is the point of doing that if you can control everything that is said?

This has absolutely NOTHING to do with "wanting to be racist, sexist, etc." It goes a lot further than that; it is the CORE concept of not giving control to a certain small group of people. There is nothing you can do to stop someone from making an instance hosting extremely controversial material. The language filter means nothing to them, because they can just use different words, and you'd be playing cat and mouse with those words just to get them out. On top of that, you may be adding thousands of meaningless words to your filter, which would also harm users on all other instances. Instead, just let them keep their instance up and DON'T FEDERATE WITH THEM. Nobody on YOUR instance will have anything to do with them; they won't see the posts or communities or anything. I don't understand why it cannot be left at this. I have mentioned this in the past, and you just deleted my post, called me racist and sexist, and said to go back to Gab. I had made it extremely clear that my post was not standing up for racism, sexism, or anything like that. I was speaking purely theoretically, and as I mentioned in this post, it goes far beyond racism. You can be the nicest person ever, hate racism and sexism and be completely against them, and still be against the idea that people get censored and can't say what they want to say.

This would be in everyone's favor. I am making a middle-ground proposal that favors the users, you, and the instances alike. Lemmy is a platform; please don't turn it into something it's not. Pleroma is a platform, PeerTube is a platform, etc. The instances have the true power: they have power over their own users, not over other instances' users. If they do not like another instance, they can simply choose not to federate with it, or apply a language filter to it.

Honestly, this is the only relevant point. Hardcoding anything is antithetical to the point of federation. Mastodon doesn't have a word filter. It doesn't need one. Instances handle the issue of bigots by blocking them from federation. There is no reason that Lemmy would be any different.

Regardless, the word filter isn't going to stop anyone or achieve your stated goal. Forking and patching it out automatically on code updates is trivial. I don't quite get what you're trying to achieve; the chapo.chat fork has already stated their intention to make this editable at runtime.

scott-hand commented 4 years ago

Can you at least use bare-minimum techniques like stemming and other normalization (letter normalization would be needed to handle things like "slur" being bypassed by "slür", for example)?

For example, there was someone who tried to talk about Stardew Valley and only managed to talk about Sremovedew Valley.
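For the normalization part of the ask, one cheap pass is NFKD decomposition followed by dropping combining marks, which folds both accented letters and "stylized" mathematical lookalikes back to plain ASCII before matching. A sketch using the `unicode-normalization` crate (purely illustrative; not something Lemmy does):

```rust
use unicode_normalization::char::is_combining_mark;
use unicode_normalization::UnicodeNormalization;

/// NFKD decomposes accented and "stylized" letters (mathematical bold,
/// fullwidth, etc.) into base characters plus combining marks; dropping
/// the marks folds the text back to a plain form before matching.
fn fold(text: &str) -> String {
    text.nfkd().filter(|c| !is_combining_mark(*c)).collect()
}

fn main() {
    // The bypass from the comment above: "slür" folds back to "slur".
    assert_eq!(fold("slür"), "slur");
    // Mathematical-alphanumeric lookalikes fold to ASCII as well.
    assert_eq!(fold("𝐬𝐥𝐮𝐫"), "slur");
}
```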

At least focus on a competent implementation before accusing people who ask for some flexibility of being voat concern trolls.

The existing hard-coded list is also centered on American and European English-speaking countries; English slurs used in countries like South Africa are completely ignored. Is this "federated" software only intended for those predominantly white populations?

jcfrancisco commented 4 years ago

There are a lot of valid concerns about improving the implementation and allowing extension of the filter (for non-English speaking communities especially).

But I want to make the case for keeping it hardcoded, and push back on the notion that "anyone will be able to fork it anyway."

Yes, it's true, anyone will ultimately be able to get around it, but I think the goal is to add friction to the process. And I think that's a worthwhile goal that will materially impact hateful speech online.

The Stratechery newsletter -- which I very often disagree with -- makes this case very well. From a newsletter that was talking about the NSA revelations in 2013:

David Simon, of The Wire fame, wasn’t that impressed with the Verizon revelations:

Having labored as a police reporter in the days before the Patriot Act, I can assure all there has always been a stage before the wiretap, a preliminary process involving the capture, retention and analysis of raw data. It has been so for decades now in this country. The only thing new here, from a legal standpoint, is the scale on which the FBI and NSA are apparently attempting to cull anti-terrorism leads from that data. But the legal and moral principles? Same old stuff.

Allow for a comparable example, dating to the early 1980s in a place called Baltimore, Maryland.

The example involves pay phones and pagers, and the collection of metadata surrounding calls, but not the calls themselves. To requote Simon:

The only thing new here, from a legal standpoint, is the scale on which the FBI and NSA are apparently attempting to cull anti-terrorism leads from that data.

Let’s say Simon is right, and was universally acknowledged as such; I bet the outrage would persist. The problem is the lack of friction.

In Baltimore, those detectives had to identify the relevant pay phones, install a dialed-number recorder on each pay phone, clone the pagers, and even then they often didn’t know who the drug dealers were.

Things are much easier today; global communications is largely routed through a few key backbone and service providers, many of which are located in the US. It’s arguably easier to collect the call records of everyone on the planet – and identify them – than it was to collect the records and identities of those Baltimore drug dealers.

One could argue that friction was the foundation of our privacy, and now friction is gone.

And here's a snippet from another, more recent newsletter, in this case pondering the tradeoffs of end-to-end encryption on platforms like Facebook where such encryption might hinder law enforcement:

The fact of the matter, as I noted above, is that encryption is a real thing that exists, and it is not going anywhere. Evil folks will always be able to figure out the most efficient way to be evil. The question, though, is how much friction do we want to introduce into the process? Do we want to make it the default that the most user-friendly way to discover your “community”, particularly if that community entails the sexual abuse of children, is by default encrypted? Or is it better that at least some modicum of effort — and thus some chance that perpetrators will either screw up or give up — be necessary?

To take this full circle, I find those 12 million Facebook reports to be something worth celebrating, and preserving. But, if Zuckerberg follows through with his “Privacy-Focused Vision for Social Networking”, the opposite will occur. I do remain a fierce defender of encryption, and opponent of backdoors, but at the same time, we do as a society at some point have to grapple with the downside of the removal of Friction.

"Evil folks will always be able to figure out the most efficient way to be evil." I wholeheartedly agree with this, and I think that's why I'm not too convinced by arguments that take the form "people will get around it anyway."

In this light, maybe what we should think about are the things we want to have the most friction, and the things we want to have the least.

There should be minimal friction in:

There should be maximum friction in:

To me, this means keeping the filter somewhat centralized and opinionated, yet at the same time frequently updated. It sounds like the repo's maintainers want to take on the responsibility of doing that, and that this work aligns with the goals of the project.

theAeon commented 4 years ago

When the "maximum friction" can be subverted with trivial automated scripts and uploaded for anyone to use, I question the value of maintaining a clunky static filter.


chapotracker commented 4 years ago

yeah i'm pretty sure your wow-so-hard-to-bypass filter is like one line of regex lmaooo

jcfrancisco commented 4 years ago

^ Maybe so for experienced users, but users who aren't as computer savvy still have to do some work to figure it out.

If it's that easy, then maybe the goal should be to make it harder.

theAeon commented 4 years ago

Users who aren't as computer savvy aren't hosting instances.

How, out of curiosity, do you suggest making it harder in an open source project?

chapotracker commented 4 years ago

if you think the venn diagram of "people who can set up this federated social network service and deploy it for a userbase" and "people who can grep the codebase for EXAMPLE_SLUR and edit it out" isn't a circle then i don't know what to tell you lmao

jcfrancisco commented 4 years ago

I stand corrected. I thought you were suggesting some kind of client-side workaround and your point is well-taken!

How, out of curiosity, do you suggest making it harder in an open source project.

I don't know. Got any ideas?

jcfrancisco commented 4 years ago

if you think the venn diagram of "people who can set up this federated social network service and deploy it for a userbase" and "people who can grep the codebase for EXAMPLE_SLUR and edit it out" isn't a circle then i don't know what to tell you lmao

I think this is beside the point though, no? Of course they'll do it if they want to. But the friction in this case would be: I'm shopping around for something to deploy for my forum, I check out Lemmy, I find that it has some things I don't want for my community, like a slur filter, so I decide to deploy something else.

If the goal is for as many people as possible to use Lemmy, then a hardcoded slur filter makes no sense: you'd want to support as many use cases as you can and bend over backwards for people. But I'm not sure that's the goal of this project (the maintainers can feel free to chime in if I'm wrong).

theAeon commented 4 years ago

So instead of using Lemmy, the hypothetical server host would use Forked-and-Auto-Patched-Not-Lemmy? Again, it just seems like a royal waste of effort.

jcfrancisco commented 4 years ago

So instead of using Lemmy, the hypothetical server host would use Forked-and-Auto-Patched-Not-Lemmy? Again, it just seems like a royal waste of effort.

Exactly right -- compelling people to make the extra effort is the point!

theAeon commented 4 years ago

So instead of using Lemmy, the hypothetical server host would use Forked-and-Auto-Patched-Not-Lemmy? Again, it just seems like a royal waste of effort.

Exactly right -- compelling people to make the extra effort is the point!

Uhm, I meant extra effort on the part of the maintainers of Lemmy.

jcfrancisco commented 4 years ago

Gotcha. Well, it's on them to decide if it's worth it. Reading between the lines of their comments here, my sense is they find intrinsic value just in having a strong stance on these words in the main branch, regardless of whether it can be circumvented or not. I think that's valuable in and of itself. But I won't speak for them -- I think I've made my point & don't want to take up all the air here

theAeon commented 4 years ago

Gotcha. Well, it's on them to decide if it's worth it. Reading between the lines of their comments here, my sense is they find intrinsic value just in having a strong stance on these words in the main branch, regardless of whether it can be circumvented or not. I think that's valuable in and of itself. But I won't speak for them -- I think I've made my point & don't want to take up all the air here

I suppose you're right. It is, in fact, their prerogative to make decisions that screw themselves over. No reason to keep bickering over it.

Nutomic commented 4 years ago

If you want to report any issues with chapo.chat (like this about the word "bastard" being filtered), please report it on their issue tracker as the instance is heavily modified.

https://gitlab.com/chapo-sandbox/production

theAeon commented 4 years ago

Did you even read the reason I mentioned them?


ptman commented 4 years ago

Here's a simple example of the problems with languages and forbidden words: https://languagelog.ldc.upenn.edu/nll/?p=48302

ghost commented 3 years ago

A Danish Lemmy user is concerned about the slur filter because one of the filtered words means something else in Danish: https://lemmy.ml/post/45965

Huy-Ngo commented 3 years ago

As context can be a significant factor in whether a word is offensive or not, would some kind of sentiment analysis (either used in tandem or as a replacement) be a good idea?

dessalines commented 3 years ago

No IMO, not only would something like that be impossible to do because of the complexities of language, but it would also be counter-productive: in public spaces there's no appropriate context in which bigoted slurs should be used.

Huy-Ngo commented 3 years ago

in public spaces there's no appropriate context in which bigoted slurs should be used

That's true, but context can show that the matched string is actually part of a normal word or name, or a word in some other language. (Sentiment analysis doesn't help with that though, so I take back what I said above.)

ohir commented 3 years ago

"Lemmy is an exemplary of a white anglosaxon priviledge embossed within their source code to shame and denigrate all non-white non-english people under guisse of slur-fighting. They in their insensivity chosen to compile in and prevent only some and only English slurs (see SLUR_REGEX in the Lemmy sources). They consciously allow 'negro', 'dziwka', 'pug', 'mzungu' among thousands and thousands of other slurs" [SJW Chronicles]

We should remedy this ASAP. First we need to add the full list of English slurs (some 4,000 alternatives), then a somewhat longer list to detect slur phrases in East European languages. Next will be circa 13,800 for Africa, then some 8,000 for the Far East. It's Rust, so it will be fast.

More demanding, but something that MUST be done, is supporting the India region, with its >1.5bn population. More demanding because there are many scripts in use, and it is very possible that someone will put a slur written in Bengali within a mostly Devanagari post, or use an unusual but valid Devanagari conjunct so that a slur leaks through the filter.

_You can have a primer about contemporary Indian scripts in use here. A short introduction to the most popular Devanagari script is here._


Excuse my reductio ad absurdum and fictitious citation, but after reading this thread I see no other way to show the infeasibility of a compiled-in slur filter. Give the instance admin an easy way to add such a filter, and let the respective language communities set moderation/censorship rules for their own instances.

@dessalines wrote:

I want to make it very difficult for racist trolls to use the most updated version of Lemmy.

You cannot do this in open-source software, unless you are about to break the build process to the effect that a fork will not compile at all; you might as well close the sources now. Even with closed source, all it takes to remove your filter from the Linux binary is a hex editor (to return early from a distinctive place at the regexp entry point). Anyone skilled enough to run a server instance will be skilled enough to read a HOWTO and then use hexedit.

Also: no federation with Mastodon/Pleroma-based servers? Lemmy-network only? Will it end with Lemmy silently "cleaning" timelines of anything containing a "slur", as some clique of white English-speaking folks decided in the software sources? Will it extend to private and direct communication? If any of the above boxes get ticked, the rest of the Fediverse will simply delink Lemmy instances.

Note that some words regarded as slurs in English are just normal words in other languages, or even in English itself. E.g. the current regexp contains the word "bitch", so to unaware recipients, dog breeders' posts will look like a symphony of removed slurs. Was this intended?

P.S. Will Lemmy be equipped with image processing and OCR to find slurs within pics and put black rectangles over them?

Hope this helps,

w3bb commented 3 years ago

@dessalines

In addition to the excellent points people have brought up, I would like to add my two cents as someone who has experience moderating places that tend to attract bigoted people. A simple filter is not going to stop anyone. Bigots are exactly the people who have a plan for every filter and every automated measure. As far as filters go, a lot more would need to be done to even scratch the surface (think about all the symbolism out there). With this in mind, all it basically does is destroy conversations about these words, stop an edgelord once in a while, and give a false sense of security. The only way to fight bad-faith people is to have humans view a post.

Additionally, there are words in there I use as terms of endearment for people in my own community, and other people use as terms in theirs (and plenty of people don't, which is fine as well!). At the end of the day there is a time and a place for everything, and communities need to be able to govern themselves how they see fit. If a community wants a word to be a term of endearment, let them go wild! If I see an instance that's spamming a slur and, after investigation, it's all bad people, I block them! This is the most effective way to make everyone happy and stop bad people.

I'm not super familiar with Lemmy; I was considering deploying it but decided against it after hearing about this, so some of these features may or may not be implemented or possible. People using terms of endearment, or having respectful conversations involving a word (quoting, for example), getting blocked by default on 99% of all instances is more of a problem than the 0.00000001% of bigots (hyperbole) who use those words overtly. A good compromise here might be an autoreport going out if a slur is detected; I could investigate an instance or user and suspend as needed. Filters will never understand context; with a filter, it is essential you err on the side of caution. I don't know if there's a report feature you could use, but this is exactly what you can do with Pleroma's MRF policies. Perhaps something like that would be good for Lemmy, or perhaps just a simple option to autoreport posts with slurs for review.
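A sketch of that autoreport idea (hypothetical types and function, not Lemmy's or Pleroma's actual API): instead of rejecting or rewriting a matching post, the hook queues it for human review:

```rust
use regex::Regex;

/// Hypothetical outcome of a moderation hook.
enum Action {
    Accept,
    FlagForReview { matched: String },
}

/// Instead of rejecting or rewriting the post, file a report so a
/// human moderator can judge the context.
fn moderate(filter: &Regex, body: &str) -> Action {
    match filter.find(body) {
        Some(m) => Action::FlagForReview { matched: m.as_str().to_string() },
        None => Action::Accept,
    }
}

fn main() {
    let filter = Regex::new(r"(?i)\bbadword\b").unwrap();
    match moderate(&filter, "quoting badword from an older text") {
        Action::FlagForReview { matched } => println!("reported for review: {matched}"),
        Action::Accept => println!("accepted"),
    }
}
```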

Slurs are against our code of conduct and the goals of this project.

I took a look in there, and it doesn't say anything regarding the contexts I brought up. The closest was this:

Remarks that violate the Lemmy standards of conduct, including hateful, hurtful, oppressive, or exclusionary remarks, are not allowed. (Cursing is allowed, but never targeting another user, and never in a hateful manner.)

Remarks that moderators find inappropriate, whether listed in the code of conduct or not, are also not allowed.

I assume what you mean is that you want this CoC to apply everywhere, so where is the problem? Moderators of instances that adopted the CoC verbatim can absolutely follow this code of conduct with those terms and still allow limited usage of them. If you mean the CoC applies per the Lemmy moderators' own enforcement, then are NSFW instances allowed? It's a weak excuse to make when there are other things Lemmy instances are allowed to have (with features built around them) that would be enforced completely differently here.

In terms of the goal of stopping bigotry, some alternatives (perhaps some of what I suggested) should be considered, because as of now the filter does basically nothing and is more likely to silence people actually dealing with bigotry.

ptman commented 3 years ago

Filters will never understand context; with a filter, it is essential you err on the side of caution.

There's a reason why email spam filtering evolved from simple word-based blocking to scoring, and further to trained Bayesian filtering, etc.
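For illustration, a toy version of that progression in Rust: tokens carry learned spamminess probabilities and are combined in log-odds space, so no single word is an automatic block (the per-token probabilities here are assumed, not a trained model):

```rust
use std::collections::HashMap;

/// Toy Bayesian-style scorer: each token contributes its learned
/// probability in log-odds space; unknown tokens are neutral (0.5).
fn spam_score(token_prob: &HashMap<&str, f64>, text: &str) -> f64 {
    let mut log_odds = 0.0;
    for token in text.split_whitespace() {
        let p = *token_prob.get(token.to_lowercase().as_str()).unwrap_or(&0.5);
        log_odds += (p / (1.0 - p)).ln();
    }
    // Convert combined log-odds back to a probability in [0, 1].
    1.0 / (1.0 + (-log_odds).exp())
}

fn main() {
    let probs = HashMap::from([("viagra", 0.95), ("meeting", 0.2)]);
    // Several weak signals combine into a score instead of a hard block.
    println!("{:.2}", spam_score(&probs, "viagra viagra meeting"));
}
```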

Nutomic commented 3 years ago

@w3bb We have made our policy clear on this topic, and we are not going to change it. So there is no point arguing about it, especially not in unrelated issues. If you don't like it, you can fork Lemmy or simply not use it.

theAeon commented 3 years ago

@w3bb We have made our policy clear on this topic, and we are not going to change it. So there is no point arguing about it, especially not in unrelated issues. If you don't like it, you can fork Lemmy or simply not use it.

I would suggest you close this issue then, because marking clearly on-topic, good-faith posts like @w3bb's as off-topic is honestly downright disrespectful.

edit: lol

w3bb commented 3 years ago

@w3bb We have made our policy clear on this topic, and we are not going to change it. So there is no point arguing about it, especially not in unrelated issues. If you don't like it, you can fork Lemmy or simply not use it.

Ah, I figured it would have been locked then. You've responded to similar posts, so I figured this was on-topic. @Nutomic

glubsy commented 3 years ago

I wanted to post a link to a website on a Lemmy instance, but it got blocked by this "slur" filter. Very annoying; there is no slur in the link at all.

Who gets to decide what is a slur anyway?

Very disappointed to see this "slur" list being hard-coded rather than easily configurable or disableable at runtime by the instance owner. And very disappointed by the attitude of the core contributor team here.

I think @seniorm0ment's comment is absolutely right. Let the users define what they want to hide.

Side note for people interested: here is the root of the problem.

dessalines commented 3 years ago

This is done now with #1481

theesfeld commented 1 year ago

Why not just have the slur filter be opt-out in settings, with some kind of TOS that has to be agreed to if the server decides to opt out, releasing Lemmy from any responsibility if they choose to do so?

dessalines commented 1 year ago

This issue was completed a long time ago; the slur filter is entirely optional, and you can add one using the config: https://join-lemmy.org/docs/en/administration/configuration.html
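The resulting shape is roughly the following sketch (a hypothetical function, not Lemmy's actual code; see the linked docs for the real config key): an unset pattern simply means no filtering.

```rust
use regex::Regex;

/// With the filter in config, "no pattern configured" means no
/// filtering at all; instances opt in by supplying a regex.
fn passes_slur_check(slur_regex: Option<&Regex>, text: &str) -> bool {
    match slur_regex {
        Some(re) => !re.is_match(text),
        None => true, // filter disabled: everything passes
    }
}

fn main() {
    assert!(passes_slur_check(None, "anything at all"));
    let re = Regex::new(r"(?i)\bbadword\b").unwrap();
    assert!(!passes_slur_check(Some(&re), "contains badword"));
}
```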