LemmyNet / lemmy

🐀 A link aggregator and forum for the fediverse
https://join-lemmy.org
GNU Affero General Public License v3.0

Automoderation #3277

Closed boehs closed 6 months ago

boehs commented 1 year ago

The problem

I moderate a large community, asklemmy. Given its size and its name, new users frequently ask support questions, despite a pinned post and rules against it. Sampling recent posts, about a third are removed under the support rule. What's more, a majority of the posts end up being duplicates.

All this is a massive load on us, and lots of tedious, duplicated work. This is one of the many uses for automoderation!

Proposed solution

Reddit offers AutoModerator, a tool which allows moderators to declare simple rules to perform moderation actions automatically. You can read about it here: https://www.reddit.com/wiki/automoderator/full-documentation/

It's a complex beast and difficult to work with for moderators, but well worth it. I propose something similar.

Scripting

The best imaginable solution is allowing moderators and admins to write simple scripts that are fired (with an input) on certain events, and can take certain actions as a result. This allows for an easier time than automoderator, an easier implementation, and just a better experience overall. A full implementation is quite complex, and may never be complete, but I will define first steps below.

The proposal

https://rhai.rs/ is a scripting language that can be embedded within Rust programs. Embedded scripting languages typically raise security concerns (which is why rlua is not a candidate). I've specifically proposed Rhai because a lot of thought has been put into its security and sandboxing. It allows fine-grained control over the environment that scripts are granted, as well as monitoring.

Another alternative is https://rune-rs.github.io/, which has an appealing syntax but a smaller community.

The flow

This is a spec for my first proposed version:

Moderators can define a number of scripts that are run on certain events:

Both posts and comments have a global variable:

post = {
  title: string,
  body: Option<string>,
  time: number,
  url: Option<string>,
  language: string,
  nsfw: boolean,
  author: {
    isMod: boolean,
    isAdmin: boolean,
    username: string,
    ....
  }
}
isEdit: boolean
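
For illustration, the shape above could be mirrored on the host side roughly as follows. This is only a sketch: the Rust type names, the `u64` timestamp, and the `searchable_text` helper are assumptions made for the example and are not part of Lemmy or of this proposal. The elided author fields ("....") are left out.

```rust
/// Author fields named in the proposal. The original list is elided
/// ("...."), so only the three listed fields appear here.
#[derive(Debug, Clone, PartialEq)]
pub struct Author {
    pub is_mod: bool,
    pub is_admin: bool,
    pub username: String,
}

/// Host-side mirror of the `post` variable.
#[derive(Debug, Clone, PartialEq)]
pub struct Post {
    pub title: String,
    pub body: Option<String>,
    /// Assumption: a Unix timestamp in seconds. The proposal only says "number".
    pub time: u64,
    pub url: Option<String>,
    pub language: String,
    pub nsfw: bool,
    pub author: Author,
}

/// Everything a script would see for one post event.
#[derive(Debug, Clone, PartialEq)]
pub struct PostEvent {
    pub post: Post,
    pub is_edit: bool,
}

impl PostEvent {
    /// Hypothetical helper: title and body joined by a newline, for checks
    /// that scan all of the post's text at once.
    pub fn searchable_text(&self) -> String {
        match &self.post.body {
            Some(body) => format!("{}\n{}", self.post.title, body),
            None => self.post.title.clone(),
        }
    }
}
```

Keeping `body` and `url` as `Option<String>` forces a script binding to handle link-only and text-only posts explicitly.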

A similar variable is offered for comments. The script can call a number of methods:

Each moderation function can be called exactly once. If prevent is called, no other function will be performed. Functions return immediately, and are infallible.
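
One way to read these rules is that the host records calls while the script runs and only decides what to do afterwards. The sketch below assumes that reading. `prevent` is the only function named in this post, so `Remove` and `Report` are hypothetical placeholders, and silently ignoring a repeated call is an assumption about what "exactly once" means.

```rust
use std::collections::HashSet;

/// Moderation functions a script may call. Only `Prevent` comes from the
/// proposal; `Remove` and `Report` are hypothetical placeholders.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Action {
    Prevent,
    Remove,
    Report,
}

/// Records calls made by a script. Nothing is executed while the script runs.
#[derive(Debug, Default)]
pub struct ActionLog {
    seen: HashSet<Action>,
    ordered: Vec<Action>,
}

impl ActionLog {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns immediately and cannot fail. A repeated call to the same
    /// function is ignored (assumption).
    pub fn call(&mut self, action: Action) {
        if self.seen.insert(action) {
            self.ordered.push(action);
        }
    }

    /// Actions to perform after the script finishes. If `Prevent` was
    /// called, it is the only action returned.
    pub fn resolve(&self) -> Vec<Action> {
        if self.seen.contains(&Action::Prevent) {
            vec![Action::Prevent]
        } else {
            self.ordered.clone()
        }
    }
}
```

Deferring the side effects until `resolve` is what lets the script-facing functions stay infallible: any failure happens later, on the host side.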

Moderators may test their scripts by crafting their own comments/posts in a sandbox.

A runtime limit should be imposed. This limit requires thought and testing. An initial suggestion is 0.5 s runtime and 2 MB memory. See "Additional consideration".
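
As a rough sketch of what a runtime limit could look like on the host side, the budget below is checked before every operation and stops the run when either an operation count or a wall-clock deadline is exhausted. It is not Rhai's or Rune's API. `Budget`, `tick`, and `run_steps` are hypothetical names, and memory limits are not covered at all.

```rust
use std::time::{Duration, Instant};

/// Why a script run was stopped.
#[derive(Debug, PartialEq, Eq)]
pub enum LimitExceeded {
    Operations,
    Time,
}

/// Hypothetical per-run budget: an operation count plus a wall-clock deadline.
pub struct Budget {
    ops_left: u64,
    deadline: Instant,
}

impl Budget {
    pub fn new(max_ops: u64, max_runtime: Duration) -> Self {
        Budget {
            ops_left: max_ops,
            deadline: Instant::now() + max_runtime,
        }
    }

    /// Meant to be called by the interpreter before each operation.
    /// The time check only runs between operations, so one slow operation
    /// can still overrun the deadline.
    pub fn tick(&mut self) -> Result<(), LimitExceeded> {
        if self.ops_left == 0 {
            return Err(LimitExceeded::Operations);
        }
        self.ops_left -= 1;
        if Instant::now() >= self.deadline {
            return Err(LimitExceeded::Time);
        }
        Ok(())
    }
}

/// Stand-in for a script: performs `steps` operations under the budget and
/// returns how many completed.
pub fn run_steps(steps: u64, budget: &mut Budget) -> Result<u64, LimitExceeded> {
    let mut done = 0;
    for _ in 0..steps {
        budget.tick()?;
        done += 1;
    }
    Ok(done)
}
```

With the suggested numbers, a run would start from something like `Budget::new(max_ops, Duration::from_millis(500))`, where `max_ops` still has to be found by testing.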

A similar product is Cloudflare Snippets, just announced, which acts as a replacement for rules: https://blog.cloudflare.com/cloudflare-snippets-alpha/.

They have a maximum execution time of 5ms, a maximum memory of 2MB, and a total package size of 32KB. These limits are more than sufficient for common use cases like modifying HTTP headers, rewriting URLs, and routing traffic

Additional consideration

What common operations does automod perform?
  1. Check for existence of string in (title|body)
  2. Check for one of many strings in (title|body)
  3. Match regex against (title|body)!!!!
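
The first two checks need nothing beyond string search. Here is a minimal sketch using only the Rust standard library, with hypothetical function names. Case-insensitive matching is an assumption. The third check (regex) is not shown, since the standard library has no regex support and it would need an extra crate.

```rust
/// 1. Check for the existence of one string in the text.
/// Case-insensitive matching is an assumption, not part of the proposal.
pub fn contains_phrase(text: &str, phrase: &str) -> bool {
    text.to_lowercase().contains(&phrase.to_lowercase())
}

/// 2. Check for one of many strings. Returns the first phrase from
/// `phrases` that occurs in `text`, if any.
pub fn contains_any<'a>(text: &str, phrases: &[&'a str]) -> Option<&'a str> {
    let haystack = text.to_lowercase();
    phrases
        .iter()
        .copied()
        .find(|p| haystack.contains(&p.to_lowercase()))
}
```

Both functions would be run against the title and the body separately, or against the two joined together.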

The following three sections are strictly for Rhai

What needs to be added to rhai (if we go with that)
  1. Built in ✅
  2. Trivial to do (just as any programming language) ✅
  3. Not built in!!! Package must be written.

The following packages should perhaps be included:

See builtin packages: https://rhai.rs/book/rust/packages/builtin.html

And optional features: https://rhai.rs/book/start/features.html

What needs to be removed
  1. Runtime limits must be decided on
  2. See https://rhai.rs/book/start/features.html
  3. Can print() be disabled?
  4. no_module & no_custom_syntax can both likely be enabled (maybe. see how regex impl goes and how prelude works)

Can runtime limits even be imposed??

DANGER DANGER DANGER PLAN FALLING APART MAYBE???

I wish I could say so. Rhai doesn't explain anywhere how to limit execution time or memory usage (though they are both listed here), but they do have many solutions to constrain programs, including https://rhai.rs/book/safety/max-operations.html

One operation can take an unspecified amount of time and real CPU cycles, depending on the particulars

I don't like this. I've opened an issue here: https://github.com/rhaiscript/rhai/issues/730. For Rune, see this discussion I started: https://github.com/rune-rs/rune/issues/569, which brings clarity to the situation in Rust.

For now I generally only recommend running Rune on people's machines (like in their browser with WASM). Eventually when memory restrictions lands, sandboxing will be a center stage feature.

Regardless of which implementation is chosen, I'd like to have the developers review it for security.

Who's done this before

Network services that use it, notably:

(list sourced, in part, via https://github.com/rhaiscript/rhai/issues/69)

Describe alternatives you've considered.

Why not DIY modbots

  1. Requires trust of bot creator
  2. More expensive - bot must be hosted separately at creator's expense and interface over the network at the expense of both.
  3. The inbuilt feature can enable additional capabilities, such as the prevent function.
  4. Implementing this externally is a similar, if not slightly greater, effort.

Additional context

Discussion is needed about whether this is a good idea, whether it can be done securely, and how it should be done. I'm bullish on all 3, if you couldn't tell.

I can implement this myself if there is support.

Revisions
  1. Added `Why not DIY modbots` to post
  2. Added `Additional consideration`

udoprog commented 1 year ago

So to butt in, there are in my mind two clear options for you today:

In the more extreme case, Rune can run in WASM. But since there are already languages targeting WASM directly I personally don't think it makes much sense to put a runtime in a runtime like that. You end up leaving a lot of performance on the table.

001Guy001 commented 1 year ago

I second the request for an automoderator but is there a reason you chose to use a different language than YAML? (edit: never mind, you were already a part of a conversation about this on Thoughts on automated moderation tooling #3281)

As a non-programmer/non-developer I've found YAML to be way easier to understand and implement compared to JSON which seems similar to the language you're using.

Additionally, communities that already have an existing automod configuration would simply be able to use it as is (maybe other than rules with recently added checks like email verification/subreddit karma/etc.), instead of having to learn a new language/format and painstakingly convert the existing code to a new one.

p.s. adding a link to a backup of the automod code (outdated but could be useful for recreating it)

erlend-sh commented 1 year ago

#3281 covers automoderation in more detail.

This post is mainly about plugin systems (which I love!), and as such it might be better off as a continuation of #3562

Nutomic commented 6 months ago

LemmyAutomod exists now.