bluesky-social / proposals

Bluesky proposal discussions

Proposal 0003: A Financially Self-Sustaining Judicial System Model for Moderation #29

Open enn-nafnlaus opened 1 year ago

enn-nafnlaus commented 1 year ago

Based on @pfrazee 's comment here:

https://github.com/bluesky-social/proposals/issues/25

and on several comments in that thread, it seems there's interest in a judicial system model for moderation. A proposal:

Premises

  1. A moderation system should be financially self-sustainable.
  2. Moderators deserve to be paid for their work and not exploited. We will for the sake of argument assume a net overhead of $15/h for moderation work.
  3. Everyone should have a right to appeal and a jury of their peers.
  4. Good actors should not bear costs. Bad actors should bear costs.

The User Layer

  1. Users write posts. If one or more people report a post for moderation (with or without a threshold), or it is reported by an auto-moderation tool, such as (all optional):
     A) Image recognition (porn flagging, etc.)
     B) Simple keyword searches
     C) Smart context-inclusive searches (LLM)
     D) ... etc

... then we have two options we can decide among, depending on how it was reported:

1A: Immediate flagging of the post, under the respective rule it was flagged for, OR 1B: Passing it through the Base Moderation Layer.

At this point, no meaningful costs have been incurred, except for the server costs to run any automated tools.
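To make the flow above concrete, here's a minimal Python sketch of how a report might be routed to immediate flagging (1A) or to the Base Moderation Layer (1B). The names (`Report`, `route_report`) and the particular mapping of report sources to routes are placeholders, since the proposal deliberately leaves those choices open:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class Route(Enum):
    IMMEDIATE_FLAG = auto()   # option 1A: flag the post under the reported rule right away
    BASE_MODERATION = auto()  # option 1B: pass it to the Base Moderation Layer

@dataclass
class Report:
    post_id: str
    rule: str              # which rule the post was reported under
    from_auto_tool: bool   # image recognition, keyword search, LLM, etc.
    reporter_count: int    # how many distinct users reported it

# Illustrative policy only: the proposal leaves open which kinds of reports
# take which route, so treat this mapping and threshold as placeholders.
USER_REPORT_THRESHOLD = 1

def route_report(report: Report) -> Optional[Route]:
    if report.from_auto_tool:
        return Route.IMMEDIATE_FLAG
    if report.reporter_count >= USER_REPORT_THRESHOLD:
        return Route.BASE_MODERATION
    return None  # below threshold: nothing happens and no costs are incurred
```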

The Base Moderation Layer

  1. Here we have two options.

2A: A human moderation layer, OR 2B: An automated context-inclusive analysis (e.g. an LLM set to an automation task)

Option 2A immediately incurs human compensation costs, while option 2B incurs only minor server costs. However, to be viable, option 2B must demonstrate a minimum reliability threshold relative to a human (for example, that a "reasonable person" would not disagree with it more than, say, 5% of the time), and this reliability should be proven to users' satisfaction before the option is employed.

If we assume that a mean of 20 seconds is needed per report under option 2A (human moderation), then at $15/h the raw moderator time works out to roughly eight cents per report; the figures used in the rest of this proposal round that up to around eleven cents per report.

Given the premise that the "bad actor" should bear the costs, the cost of a report determined to be false should be borne by the reporter (analogous to a finding of frivolous litigation), while the cost of a report determined to be true still needs to be covered. We have two options:

2O1: Immediately charging the party whose post was determined to be abusive, OR 2O2: Collecting the fee spread out among other "court costs" (e.g. tacking it onto appeal fees, frivolous report charges, etc., as a general pool of funding)

While option 2O1 has some arguments toward fairness, it would also require that everyone have a payment method set up just to be able to post on the site, which may discourage participation. Hence, I would tend to favour 2O2, a general pool funded by general "court fees". The "bad actor" here is still being punished in that their post is being flagged, and many can be expected to appeal. However, we can introduce a third option:

2O3: People determined to be habitual bad actors (an unusually high percentage of their posts being affirmed by moderation as violations) must bear the costs immediately, and must thus have a payment method on file just to be able to post, while the vast majority of users do not.

... which might be even better.
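As a rough illustration of option 2O3, here's a minimal sketch of what the "habitual bad actor" test and the resulting cost routing could look like. The thresholds and names (`is_habitual_offender`, `charge_for_upheld_report`) are invented for the example, not part of the proposal:

```python
from dataclasses import dataclass

@dataclass
class UserRecord:
    user_id: str
    reviewed_posts: int       # posts of theirs that went through moderation review
    upheld_violations: int    # of those, how many were affirmed as violations
    has_payment_method: bool  # whether a payment method is on file

# Invented thresholds for illustration only.
MIN_REVIEWED_POSTS = 10
HABITUAL_VIOLATION_RATE = 0.30

def is_habitual_offender(user: UserRecord) -> bool:
    if user.reviewed_posts < MIN_REVIEWED_POSTS:
        return False
    return user.upheld_violations / user.reviewed_posts >= HABITUAL_VIOLATION_RATE

def charge_for_upheld_report(user: UserRecord, cost: float) -> str:
    """Decide who covers the base-layer moderation cost for an upheld report."""
    if is_habitual_offender(user):
        # Option 2O3: habitual bad actors pay immediately, which is why they
        # (and only they) must keep a payment method on file to post.
        return f"charge {user.user_id} ${cost:.2f} directly"
    # Everyone else: the cost is absorbed by the general court-fee pool (2O2).
    return f"draw ${cost:.2f} from the general pool"
```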

Now, given the right to appeal (particularly important if option 2B - an automated base moderation layer - is employed rather than a human one):

The Court of Appeals

  1. This gives users - either those determined to have filed a frivolous report, or those determined to have committed a violation with their post - a chance to appeal the decision. To file, they must prepay the full court costs, plus pay into the general pool. If they win, they're reimbursed and the loser must pay the court costs. If the defendant is not willing to risk the cost of losing the appeal, they can cede the case: the appellant's money is returned to them and the appeal is decided in the appellant's favour. If multiple people filed a report, they all split the risk equally.

Since this is designed to be a swift, cheap human decision, one might propose a structure such as 5 randomly-selected jury members, each expected to spend an average of 90 seconds reading over the case, making a decision, and writing a quick one-sentence justification. Since jurors - like anyone involved in the moderation process - should expect to be paid for their work (with the option to decline their pay and have it donated to Bluesky's operating costs if they want!), an appeals trial should be expected to cost around $2.20 (plus any extra for the general pool).

The winning party should be reimbursed for all their costs (if any) up to that point.
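Here's a minimal sketch of the "loser pays" bookkeeping described above. Only the rules come from the proposal (prepaid costs, full reimbursement of the winner, the defendant's option to cede, and multiple reporters splitting the risk); the class and method names are mine:

```python
from dataclasses import dataclass, field

@dataclass
class Appeal:
    appellant: str         # the party appealing; prepays everything up front
    defendants: list[str]  # the opposing party or parties (e.g. several reporters)
    court_cost: float      # e.g. 2.20 for the Court of Appeals
    pool_fee: float        # the extra paid into the general funding pool
    balances: dict[str, float] = field(default_factory=dict)  # negative = out of pocket

    def _charge(self, who: str, amount: float) -> None:
        # Positive amount means the party pays; negative means they are refunded.
        self.balances[who] = self.balances.get(who, 0.0) - amount

    def file(self) -> None:
        # The appellant prepays the full court costs plus the pool fee.
        self._charge(self.appellant, self.court_cost + self.pool_fee)

    def cede(self) -> None:
        # The defendant cedes: the appellant is refunded and wins by default.
        self._charge(self.appellant, -(self.court_cost + self.pool_fee))

    def decide(self, appellant_wins: bool) -> None:
        total = self.court_cost + self.pool_fee
        if appellant_wins:
            # The winner is reimbursed in full and the losers split the bill
            # equally (with five reporters, a $2.50 appeal is at worst $0.50 each).
            self._charge(self.appellant, -total)
            for d in self.defendants:
                self._charge(d, total / len(self.defendants))
        # If the appellant loses, their prepaid costs simply stand.
```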

Jury selection should be random among users but voluntary. For efficiency, it might be wise to have selected jurors review several cases in a row.

Because summoned jurors can't be expected to respond immediately (some people only log in once daily), summonses should not be tied to a specific case. Instead, the number of summonses sent out should correspond to how many appeals cases are in the queue, and whenever a juror responds, they're assigned the top cases in the queue. A summons should expire after a fixed period of time so that summonses don't accumulate on inactive accounts.
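A sketch of that summons bookkeeping, under assumptions of my own: summonses are issued in proportion to the size of the appeals queue, expire after a fixed window, and a responding juror is handed the case at the head of the queue. Everything here (the class name, the three-day TTL) is illustrative:

```python
import time
from collections import deque
from typing import Optional

JURORS_PER_CASE = 5              # Court of Appeals jury size from this proposal
SUMMONS_TTL_SECONDS = 3 * 86400  # invented: a summons expires after three days

class SummonsPool:
    def __init__(self) -> None:
        self.case_queue: deque[str] = deque()    # appeal case ids, oldest first
        self.outstanding: dict[str, float] = {}  # summoned user_id -> time issued

    def enqueue_case(self, case_id: str, candidate_jurors: list[str]) -> None:
        """Add an appeal to the queue and top up the number of outstanding summonses."""
        self.case_queue.append(case_id)
        needed = len(self.case_queue) * JURORS_PER_CASE - len(self.outstanding)
        for user_id in candidate_jurors[:max(needed, 0)]:
            self.outstanding.setdefault(user_id, time.time())

    def expire_stale(self) -> None:
        """Drop summonses that sat unanswered past the TTL (e.g. on inactive accounts)."""
        cutoff = time.time() - SUMMONS_TTL_SECONDS
        self.outstanding = {u: t for u, t in self.outstanding.items() if t >= cutoff}

    def juror_responds(self, user_id: str) -> Optional[str]:
        """A summoned user shows up: hand them the case at the head of the queue."""
        if user_id not in self.outstanding or not self.case_queue:
            return None
        del self.outstanding[user_id]
        # A fuller version would track how many jurors each case has and pop it
        # from the queue once it reaches JURORS_PER_CASE.
        return self.case_queue[0]
```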

The Bluesky Supreme Court

  1. This gives users who lost their appeal one final chance. This would be a much larger pool, to reduce any random selection bias (say, 13 jurists), and would allow for written arguments from both parties, with jurists expected to spend at least e.g. 5 minutes reviewing the case and to write at least one paragraph in support of their view. The costs of such an appeal would thus be expected to be around $21, plus any pool costs. Jurists would be selected in the same manner as for the appeals court, and likewise it'd be a case of "loser bears the costs", with the right of the defendant to cede the case in advance of the hearing.
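To make the cost figures easy to check, here's a small calculation of the raw juror-time cost of each tier at the $15/h premise. The tier parameters (20 s, 5 × 90 s, 13 × 5 min) are the ones proposed above; the dollar figures quoted above are rounded up somewhat from these raw labour costs (e.g. to leave room for overhead):

```python
HOURLY_RATE = 15.0  # premise 2: $15/h for moderation work

def labour_cost(people: int, seconds_each: float, rate: float = HOURLY_RATE) -> float:
    """Raw cost of a decision: people x time x hourly rate."""
    return people * (seconds_each / 3600.0) * rate

# Base moderation layer: one human, ~20 s per report.
print(f"base report  : ${labour_cost(1, 20):.2f}")    # ~$0.08 raw; rounded up to ~$0.11 above
# Court of Appeals: 5 jurors x 90 s each.
print(f"appeal       : ${labour_cost(5, 90):.2f}")    # ~$1.88 raw; rounded up to $2.20 above
# Supreme Court: 13 jurists x 5 minutes each.
print(f"supreme court: ${labour_cost(13, 300):.2f}")  # ~$16.25 raw; rounded up to ~$21 above
```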

Example #1

Alice sees a post from Bob, where Bob writes that women are stupid and nobody should listen to them. Alice files a report under a given category. In this scenario, the first moderation layer is automated, so it costs Alice nothing to file her report. The automation layer agrees with Alice, determines Bob was in violation, and flags his post. At this point, nobody incurs any costs.

Bob doesn't think what he wrote was in violation and files an appeal, paying $2.20 in court costs and $0.30 in pool costs for general judicial overhead, for a total of $2.50. Alice gets a notice of the appeal. She's entirely convinced she's correct, and indeed, in the meantime, four other people reported the post, so at worst she's on the hook for fifty cents if she loses. The appeals court decides in her favour. Bob's post stays flagged, and he's now down $2.50.

Facing a string of defeats so far, and thus a very likely defeat in the Supreme Court, Bob weighs his options and decides that, with $21 in court costs + $4 in pool costs to appeal to the Supreme Court, he has better things to do with his money, and lets the loss stand.

Example #2

Carol sees a post from Dave where he wrote, "Your response is stupid, and I frankly think you have a mental disorder." Carol is mad and files a report under a given category. In this scenario, the first moderation layer is human, so she has to pay $0.11 in review costs + $0.04 in pool costs = $0.15 to file her report.

The moderator quickly reviewing the case agrees with Carol. Dave isn't a habitual offender, so the cost of moderation comes from the pool. Dave thinks he was maybe rude but not in violation of any rules, so he appeals and pays $2.50. Carol, confident from her earlier win, agrees to the hearing. The appeals court, however, sides with Dave. Carol is now on the hook for the $2.50 plus the initial $0.15 in moderation costs.

Carol appeals to the Supreme Court and puts down $25. Dave agrees to the hearing. After deliberating on the circumstances in the arguments laid out by both sides, the Supreme Court sides with Carol. Carol is now reimbursed her $25 + $2.50 + $0.15, and Dave bears the full $27.65 instead. The flag on Dave's post stands.
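As a quick check of the arithmetic in Example #2, here's the final settlement in a few lines of Python (the amounts are the ones used in the example; the variable names are just for readability):

```python
# Final settlement in Example #2: Dave, as the ultimate loser, bears every
# cost Carol put down along the way.
supreme_court = 21.00 + 4.00   # court costs + pool fee = the $25 Carol put down
appeal        = 2.20 + 0.30    # Dave's appeal costs that Carol was briefly liable for
base_report   = 0.15           # the initial human review of the report

total = supreme_court + appeal + base_report
assert abs(total - 27.65) < 1e-9
print(f"Dave bears ${total:.2f}")  # $27.65
```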

Attacks

Since being flagged by a given moderation layer affects all the users of that layer, but not people who don't use that layer, juries should be selected from among users of that moderation layer. There is, however, the risk of attempts to mob a given moderation layer in order to be chosen for jury selection on a contentious issue, or even Sybil attacks (fake-identity user accounts created en masse).

For a start, it might be advisable to only select jurors who have been continuously using the moderation layer - and indeed, filtering on the rule that was flagged - over at least X amount of time. That is, the moderation layer would save periodic snapshots of who is using it and which rules they filter on at any given point in time, which it can later use when selecting juries.
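A minimal sketch of that snapshot-based eligibility check, under assumptions of my own about the data involved (the snapshot structure, the `eligible_jurors` name, and the 90-day stand-in for "X" are all illustrative):

```python
from datetime import datetime, timedelta

# Each snapshot records, at one point in time, which rules each subscribed
# user of the moderation layer is filtering on.
Snapshot = dict[str, set[str]]   # user_id -> set of rule ids

def eligible_jurors(
    snapshots: list[tuple[datetime, Snapshot]],
    rule: str,
    now: datetime,
    min_tenure: timedelta = timedelta(days=90),  # stand-in for the "X amount of time"
) -> set[str]:
    """Users who filtered on `rule` in every snapshot taken within the tenure window."""
    if not snapshots or now - min(ts for ts, _ in snapshots) < min_tenure:
        # The layer itself hasn't existed for the full window yet: nobody qualifies.
        return set()
    window = [snap for ts, snap in snapshots if now - ts <= min_tenure]
    per_snapshot = [{u for u, rules in snap.items() if rule in rules} for snap in window]
    return set.intersection(*per_snapshot) if per_snapshot else set()
```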

For more complex attacks, there are many options that could be taken, but honestly I would argue that trying to overautomate against them is a bad thing. These sorts of cases are often examples of "I Know It When I See It", and there should just be a general rule that "in the case of strong evidence of attempts to rig the judicial system, administrators will step in and take action against the perpetrators." Real-world judicial systems rely on human interpretation of intent rather than overautomation, and that is a strong defense against attacks.

Feedback

Feedback is more than welcome. I find Paul's notion of a judicial system to be an appealing one, and this document is just a first-draft attempt at translating it into a viable moderation system that can economically sustain itself.

agentjabsco commented 1 year ago

this is a lot like the "market-based solution" @hdevalence proposed a few days ago https://bsky.app/profile/hdevalence.bsky.social/post/3jzai4woy3f2t

there's a bunch of problems with using the moderation system itself as a monetization strategy. the first and most obvious one off the top of my head: it financially incentivizes the mods to make decisions that will, in the long run, lead to more moderatable conflict, so they can keep getting paid to solve it (the "Wally putting more bugs into the program after they announce he gets a bonus for each one he fixes" problem, aka the cobra effect)

for all its damnable externalities, this is the upshot of advertising-oriented funding models like Twitter's. if you're going to pay for moderation, the income should come from something that brings in more money when people are doing things that make other people want to be there.

the other downside, and this is where this issue starts to become overtly political, is that this has a chilling effect on engagement in that the rich can afford to deal with "moderation-prone" statements, but the poor cannot. this is what I mean when I say "you might want to check out Twitter"—the culture and social ecosystem that this inevitably leads to is the most toxic cloud of ambient assholery the Internet can conjure.

under federation, some other site can go ahead and try this, and you can see what happens—my hunch is, you'll wind up having a site where stuff like this happens at a record-breaking velocity, assuming your platform ever gets traction at all.

amitai commented 1 year ago

This is an absolutely amazing post.