Open bethylamine opened 7 months ago
Given the recent concerns about report abuse / intra-community feuds etc., it might be useful to be able to mark accounts that are alleged to be targets of abuse, and prominently warn mods to take extra care and check the whole profile for a balanced evaluation. This might at least help with the problem of out-of-context shitposts being cherry-picked. Could perhaps require more independent reports for such accounts, or multiple mod approvals, or some other hurdle(s).
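As a very rough sketch of what that extra hurdle could look like (every field name and threshold here is made up for illustration, not anything Soupcan currently has):

```ts
// Hypothetical shape of a reported account; none of these fields exist yet.
interface ReportedAccount {
  handle: string;
  atRiskTarget: boolean;      // manually flagged as an alleged abuse target
  independentReports: number; // reports from unrelated reporters
  modApprovals: number;       // distinct moderators who reviewed the whole profile
}

// Accounts flagged as likely abuse targets need more independent reports and
// more than one moderator sign-off before any automatic action is taken.
function meetsActionThreshold(acc: ReportedAccount): boolean {
  const minReports = acc.atRiskTarget ? 10 : 3;
  const minMods = acc.atRiskTarget ? 2 : 1;
  return acc.independentReports >= minReports && acc.modApprovals >= minMods;
}
```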
I was thinking that users could vote on whether a report was legitimate/useful/fake/etc., and count that towards the strength of the report. If the votes go both ways, mark the user as 'controversial', i.e. they may be an ally but also hold transphobic views. It's not clear cut in these cases.
Need to consider potential attacks by bad-faith users: a) mass upvoting reports against people they don't like and/or b) downvoting reports against people they support. If voting affects the scoring of the reports, there is an incentive for them to try this. If it merely attracts more scrutiny from mods, then there isn't; hopefully the opposite.
Might need a heuristic to avoid huge numbers of accounts getting marked as "controversial" ("if everyone is controversial then nobody is")?
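One way the voting and the "controversial" cap could fit together, purely as a sketch (the vote shape, ratios, and 1% cap below are all invented):

```ts
// All names and thresholds here are invented for illustration.
interface ReportVotes {
  legitimate: number;
  fake: number;
}

// A report with plenty of votes on both sides suggests a genuinely mixed case.
function isControversial(v: ReportVotes): boolean {
  const total = v.legitimate + v.fake;
  if (total < 10) return false; // not enough signal yet
  const ratio = v.legitimate / total;
  return ratio > 0.35 && ratio < 0.65; // both camps well represented
}

// "If everyone is controversial then nobody is": keep the label only on the
// most-voted fraction of accounts, e.g. at most 1% of all reported accounts.
function applyControversialCap(
  candidates: { handle: string; totalVotes: number }[],
  reportedAccountCount: number,
): Set<string> {
  const cap = Math.max(1, Math.floor(reportedAccountCount * 0.01));
  const kept = [...candidates]
    .sort((a, b) => b.totalVotes - a.totalVotes)
    .slice(0, cap);
  return new Set(kept.map((c) => c.handle));
}
```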
I feel like something you could implement pretty easily that would be pretty hard to circumvent is a mechanism that holds all reports if too many come in too fast. Kind of like a rate limit, but instead it just sends all of the reports to manual review as a batch, and doesn't score them automatically. I think that could limit the damage caused during dogpiles etc.
You could also add a decay to the punishment for being overturned, where if the report gets overturned quickly the submitting user's trust score gets docked a larger amount than if the report gets overturned days/weeks later. That would disincentivize joining a dogpile, or trying to get a weak report through auto approval.
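A sketch of both ideas together (the window, burst limit, and penalty curve are arbitrary placeholders): reports that land too fast get held as a batch for manual review instead of being scored, and the trust penalty for an overturned report shrinks the longer the report survived before being overturned.

```ts
// Window, threshold, and penalty curve are all arbitrary placeholders.
const WINDOW_MS = 60 * 60 * 1000; // 1 hour
const BURST_LIMIT = 5;            // reports per window before we hold the batch

// Hold everything for manual review (no automatic scoring) if reports on one
// account arrive too fast.
function shouldHoldForManualReview(reportTimestamps: number[], now: number): boolean {
  const recent = reportTimestamps.filter((t) => now - t <= WINDOW_MS);
  return recent.length >= BURST_LIMIT;
}

// Trust penalty for a report that gets overturned: harsher when the report is
// overturned quickly, gentler when it stood for days or weeks.
function overturnPenalty(reportAgeMs: number, basePenalty = 10): number {
  const days = reportAgeMs / (24 * 60 * 60 * 1000);
  const halfLifeDays = 3; // arbitrary decay constant
  return basePenalty * Math.pow(0.5, days / halfLifeDays);
}
```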
One problem with a rate limit (especially as we get more users) is that a high rate of reports isn't necessarily a malicious dogpile - could just be genuine outrage at a particularly bad tweet.
At the moment quite a few of the dubious cases seem to have half a dozen very low-trust reports. Could make it so that lots of very low-trust reports (especially if they are all for the same tweet?) aren't actioned automatically even if they add up to a high enough score. Or perhaps add up the top N trust values rather than all of them? But possibly the current scoring is sufficient to hold these back anyway, until boosted by a moderate trust user or mod.
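The "top N trust values" idea could look something like this sketch (N and the threshold are placeholders, and the plain trust numbers stand in for however Soupcan actually weights reports):

```ts
// Sum only the N most-trusted reports, so a pile of very low-trust reports
// can't cross the auto-action threshold by sheer volume.
function topNTrustScore(reportTrusts: number[], n = 3): number {
  return [...reportTrusts]
    .sort((a, b) => b - a)
    .slice(0, n)
    .reduce((sum, t) => sum + t, 0);
}

// Example: six reports at 5% trust each only count as 15, not 30.
const autoActioned = topNTrustScore([5, 5, 5, 5, 5, 5]) >= 100; // false
```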
One threat tactic is someone making enough good reports to build up trust, potentially using several accounts, then risking/spending some of that trust to target someone. Takes a lot of effort though, and the target can then appeal, so it's a lot of effort for a (probably very temporary) reward.
> One problem with a rate limit (especially as we get more users) is that a high rate of reports isn't necessarily a malicious dogpile - could just be genuine outrage at a particularly bad tweet.
I don't think that's really a problem? It would just mean that moderators would have discretion to decide if a large volume of reports is legitimate or a dogpile. The downside of implementing it would be a delay in an account getting marked, which seems like a better outcome than Soupcan becoming an agent in a dogpile.
In terms of scaling, I feel like there could be an estimate of how many reports are expected per account, and some multiple of that coming in within an arbitrary timeframe (for example, half of the database update frequency) triggers the protection. Or it could be based on a percentage (maybe 1%) of the userbase (with a floor of one user) in a timeframe. Kind of just spitballing here though.
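Spelling out that spitballing as a sketch, with invented constants: the protection fires when an account receives some multiple of its expected report rate, or reports from more than a small fraction of the userbase (with a floor of one user), within the chosen window.

```ts
// All constants here are placeholder guesses, not tuned values.
interface DogpileConfig {
  expectedReportsPerWindow: number; // baseline estimate per account
  burstMultiplier: number;          // e.g. 5x the expected rate
  userbaseFraction: number;         // e.g. 0.01 = 1% of users
}

function triggersDogpileProtection(
  reportsInWindow: number,
  uniqueReporters: number,
  totalUsers: number,
  cfg: DogpileConfig,
): boolean {
  const rateTrigger =
    reportsInWindow >= cfg.expectedReportsPerWindow * cfg.burstMultiplier;
  const fractionTrigger =
    uniqueReporters >= Math.max(1, Math.floor(totalUsers * cfg.userbaseFraction));
  return rateTrigger || fractionTrigger;
}
```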
> One threat tactic is someone making enough good reports to build up trust, potentially using several accounts, then risking/spending some of that trust to target someone. Takes a lot of effort though, and the target can then appeal, so it's a lot of effort for a (probably very temporary) reward.
I have lots of ideas, but I'll try to keep this short. It occurs to me that to prevent infiltration, we need a set of conditions that an account must pass, in addition to the current trust system, in order to acquire certain trust thresholds. I wrote a 1000-word idea, but here it is in 200.
For the purposes of preventing infiltration, is there a trust threshold which should not be attainable without having gained trust from reports concurrent with very high trust users?
When a person reaches a certain trust threshold (I arbitrarily chose the move from 39 to 40 in all my examples), their report activity is scanned to examine the diversity of their 'fellow reporters' (roughly sketched below). If they fail the test, their trust score is somewhat reduced, and the check has a 12-hour cooldown.
One very important question I would ask is: what proportion of their successful reports were reports where at least one other participant had a trust score above 40%? And has that person had at least three concurrences (successful reports) with different unique >40% users?
Should users have to have had at least a certain number of reports which only succeeded due to moderator participation in order to pass a certain trust threshold?
Perhaps it's too much work given how robust Beth's trust system is.
I'm assuming the trust system already does some kind of formula analysis of fellow reporters, so that a report which needed 10 people to reach 100% grants very little trust to each participant. This is not a question.
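To make the 39 to 40 gate concrete, here is a rough sketch; the report shape, numbers, and penalty are all illustrative, not how Soupcan actually stores anything.

```ts
// Illustrative only: report shape, numbers, and the penalty are all invented.
interface SuccessfulReport {
  coReporters: { id: string; trust: number }[]; // other participants in the report
}

const HIGH_TRUST = 40;                      // the >40% threshold from the example
const REQUIRED_UNIQUE_HIGH_TRUST_PEERS = 3;

// Has this user concurred (in successful reports) with at least three
// different users whose trust exceeds 40%?
function passesDiversityCheck(reports: SuccessfulReport[]): boolean {
  const peers = new Set<string>();
  for (const r of reports) {
    for (const c of r.coReporters) {
      if (c.trust > HIGH_TRUST) peers.add(c.id);
    }
  }
  return peers.size >= REQUIRED_UNIQUE_HIGH_TRUST_PEERS;
}

// Called when a user's score would cross from 39 to 40; failing costs a little
// trust, and the check would only be retried after a 12-hour cooldown (not shown).
function gateTrustPromotion(currentTrust: number, reports: SuccessfulReport[]): number {
  if (currentTrust !== 39) return currentTrust; // only gate the 39 -> 40 move
  return passesDiversityCheck(reports) ? 40 : Math.max(0, currentTrust - 2);
}
```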
I haven't been very active recently, due to using Twitter very little in Feb-April. I've also been unable to access the mod queue for at least 6 weeks, though I noticed it became available to me again today.
Back when I was very active, I scored 90% of my reports on a scale from 1/10 to 10/10.
Below are the metrics I developed, in part because I often found accounts which I believed were guilty of transphobia but not of hate severe enough to exceed the strict moderation rules.
An illustration of that distinction would be following JKR (2023 version) but not KJK whilst making offhand comments; of course, the last 6 months have completely totalled that example. I would search the profile with a presumption of hate for the latter, but not the former. Although now either would suffice.
I considered this concept to be a self-marking system for moderators, which would only be visible to other moderators. No reason it couldn't bleed influence into other metrics, though.
- Suspected ‘innocent’ transphobia. Repeatedly transphobic user but no evidence of harm intended towards trans or LGBT people in RT/likes or replies. Moderator may suspect the account user doesn’t know they’re being offensive or transphobic.
- Trans seeking trolls. Doesn’t break the TOS/mod guidelines for transphobic content but appears to prioritise seeking or singling out transgender people for mild/trollish interactions. Example: “everything is the fault of the trans”.
- With a mind to protecting the Soupcan community, the moderator could not in good conscience appeal the account. Found sufficient inference-based (or tacit) “transphobia approval” in the account user to conclude an above-50% likelihood of pre-transphobia, or the moderator infers the account user has a deliberate intent to avoid appearing openly transphobic, but there are no tweets, RTs, or likes of overt transphobia by the account and it is not generally anti-LGBT.
- Account is very clearly hostile to LGBT subject media and openly anti-LGBT but doesn’t mention or distinguish transgender people as a distinct group. If account mentions groomer narratives use 5/10.
- Discretionary Report for community safety. Account is clearly anti-LGBT and has repeatedly espoused anti-trans sentiments, but the moderator is not satisfied it exceeds Soupcan TOS/guideline thresholds. No consistent RT/likes of transphobic content.
- Discretionary Report. Moderator is convinced the account is transphobic but doesn’t feel they can justify it within the rules in under 8 minutes.
- Discretionary Report. The moderator takes the account as a whole as dangerous to trans people, and the account user prioritises transgender interactions ahead of LGBT interactions: either more than one reference to LGBT groomer narratives in tweets/replies, or multiple retweets/likes, while generally not being hostile to LGBT people in any other way. OR 1-2 minor signals per the moderator guidelines, plus a lot of being mean to or singling out transgender media, but not rising to a CLEAR-CUT violation of the moderation guidelines.
- Account frequently likes and retweets transphobic content, OR has little to no RT/likes but has extensive tweets and replies that are directly or indirectly antagonistic towards transgender people.
- Moderator has complete confidence in identifying this account as a transphobe in violation of the Soupcan TOS and Mod Guidelines.
- Transphobia is the account’s identity.
I mostly developed this for the purpose of knowing the moderator's sentiment should an account ever be appealed. If I had flagged 9 or 10, I would say the account was irredeemable and should be told tough luck. If 8/10, they should only be able to appeal if the reported content is NOT deleted. For 3 through 7 they would just be re-reviewed, and for 1 or 2 their appeal should just be accepted, simply on the basis that they took the effort to make it (a rough sketch of this routing follows the summary below).
- 1-3: Moderator desperate to not click appeal. Flag approved on gut feeling.
- 4-5: Strongly anti-LGBT with very little singling out of transgender people.
- 6-7: Clear TOS/guideline violations lacking, but moderator is convinced with high confidence that the account is transphobic.
- 8-10: Account violates TOS/guidelines; moderator estimates more than 50% of the account's Twitter activity is transphobia.
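If that 1-10 score were ever stored, the appeal routing above becomes mechanical. A sketch (the outcome names and the deleted-content check are invented):

```ts
// Outcome names and the deleted-content check are invented for illustration.
type AppealOutcome = "reject" | "re-review" | "auto-accept";

// Route an appeal based on the moderator's original 1-10 confidence score:
// 9-10 irredeemable, 8 only appealable if the reported content still exists,
// 3-7 re-reviewed, 1-2 accepted for having bothered to appeal at all.
function routeAppeal(modScore: number, reportedContentDeleted: boolean): AppealOutcome {
  if (modScore >= 9) return "reject";
  if (modScore === 8) return reportedContentDeleted ? "reject" : "re-review";
  if (modScore >= 3) return "re-review";
  return "auto-accept";
}
```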
Persons making reports could select from a list of 10 options for the question: How confident are you that this account is transphobic?
I'm a lot more decisive than I was when I wrote this in November, though. I don't think more than three options are needed now (in addition to appeal report).
Building out of: #30, #64, #65, #66, #92, #98, #115, #116
Revamping/improving the reporting system seems like an important direction for Soupcan, since it is now a distinguishing and more critical feature than initially planned.