Open bethylamine opened 7 months ago
Given the recent concerns about report abuse / intra-community feuds etc., it might be useful to be able to mark accounts that are alleged to be targets of abuse, and prominently warn mods to take extra care and check the whole profile for a balanced evaluation. This might at least help with the problem of out-of-context shitposts being cherry-picked. Could perhaps require more independent reports for such accounts, or multiple mod approvals, or some other hurdle(s).
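As a very rough sketch of what that extra hurdle could look like (every field name and threshold here is made up for illustration, not anything Soupcan currently has):

```ts
// Hypothetical shape of a reported account; none of these fields exist yet.
interface ReportedAccount {
  handle: string;
  atRiskTarget: boolean;      // manually flagged as an alleged abuse target
  independentReports: number; // reports from unrelated reporters
  modApprovals: number;       // distinct moderators who reviewed the whole profile
}

// Accounts flagged as likely abuse targets need more independent reports and
// more than one moderator sign-off before any automatic action is taken.
function meetsActionThreshold(acc: ReportedAccount): boolean {
  const minReports = acc.atRiskTarget ? 10 : 3;
  const minMods = acc.atRiskTarget ? 2 : 1;
  return acc.independentReports >= minReports && acc.modApprovals >= minMods;
}
```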
I was thinking that users could vote on whether a report was legitimate/useful/fake/etc., and count that towards the strength of the report. If the votes go both ways, mark the user as 'controversial', i.e. they may be an ally but also hold transphobic views. It's not clear cut in these cases.
Need to consider potential attacks by bad-faith users: a) mass upvoting reports against people they don't like and/or b) downvoting reports against people they support. If voting affects the scoring of the reports, there is an incentive for them to try this. If it merely attracts more scrutiny from mods, then there isn't; hopefully the opposite.
Might need a heuristic to avoid huge numbers of accounts getting marked as "controversial" ("if everyone is controversial then nobody is")?
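One way the voting and the "controversial" cap could fit together, purely as a sketch (the vote shape, ratios, and 1% cap below are all invented):

```ts
// All names and thresholds here are invented for illustration.
interface ReportVotes {
  legitimate: number;
  fake: number;
}

// A report with plenty of votes on both sides suggests a genuinely mixed case.
function isControversial(v: ReportVotes): boolean {
  const total = v.legitimate + v.fake;
  if (total < 10) return false; // not enough signal yet
  const ratio = v.legitimate / total;
  return ratio > 0.35 && ratio < 0.65; // both camps well represented
}

// "If everyone is controversial then nobody is": keep the label only on the
// most-voted fraction of accounts, e.g. at most 1% of all reported accounts.
function applyControversialCap(
  candidates: { handle: string; totalVotes: number }[],
  reportedAccountCount: number,
): Set<string> {
  const cap = Math.max(1, Math.floor(reportedAccountCount * 0.01));
  const kept = [...candidates]
    .sort((a, b) => b.totalVotes - a.totalVotes)
    .slice(0, cap);
  return new Set(kept.map((c) => c.handle));
}
```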
I feel like something you could implement pretty easily that would be pretty hard to circumvent is a mechanism that holds all reports if too many come in too fast. Kind of like a rate limit, but instead it just sends all of the reports to manual review as a batch, and doesn't score them automatically. I think that could limit the damage caused during dogpiles etc.
You could also add a decay to the punishment for being overturned, where if the report gets overturned quickly the submitting user's trust score gets docked a larger amount than if the report gets overturned days/weeks later. That would disincentivize joining a dogpile, or trying to get a weak report through auto approval.
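A sketch of both ideas together (the window, burst limit, and penalty curve are arbitrary placeholders): reports that land too fast get held as a batch for manual review instead of being scored, and the trust penalty for an overturned report shrinks the longer the report survived before being overturned.

```ts
// Window, threshold, and penalty curve are all arbitrary placeholders.
const WINDOW_MS = 60 * 60 * 1000; // 1 hour
const BURST_LIMIT = 5;            // reports per window before we hold the batch

// Hold everything for manual review (no automatic scoring) if reports on one
// account arrive too fast.
function shouldHoldForManualReview(reportTimestamps: number[], now: number): boolean {
  const recent = reportTimestamps.filter((t) => now - t <= WINDOW_MS);
  return recent.length >= BURST_LIMIT;
}

// Trust penalty for a report that gets overturned: harsher when the report is
// overturned quickly, gentler when it stood for days or weeks.
function overturnPenalty(reportAgeMs: number, basePenalty = 10): number {
  const days = reportAgeMs / (24 * 60 * 60 * 1000);
  const halfLifeDays = 3; // arbitrary decay constant
  return basePenalty * Math.pow(0.5, days / halfLifeDays);
}
```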
One problem with a rate limit (especially as we get more users) is that a high rate of reports isn't necessarily a malicious dogpile - could just be genuine outrage at a particularly bad tweet.
At the moment quite a few of the dubious cases seem to have half a dozen very low-trust reports. Could make it so that lots of very low-trust reports (especially if they are all for the same tweet?) aren't actioned automatically even if they add up to a high enough score. Or perhaps add up the top N trust values rather than all of them? But possibly the current scoring is sufficient to hold these back anyway, until boosted by a moderate trust user or mod.
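The "top N trust values" idea could look something like this sketch (N and the threshold are placeholders, and the plain trust numbers stand in for however Soupcan actually weights reports):

```ts
// Sum only the N most-trusted reports, so a pile of very low-trust reports
// can't cross the auto-action threshold by sheer volume.
function topNTrustScore(reportTrusts: number[], n = 3): number {
  return [...reportTrusts]
    .sort((a, b) => b - a)
    .slice(0, n)
    .reduce((sum, t) => sum + t, 0);
}

// Example: six reports at 5% trust each only count as 15, not 30.
const autoActioned = topNTrustScore([5, 5, 5, 5, 5, 5]) >= 100; // false
```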
One threat tactic is someone making enough good reports to build up trust, potentially using several accounts, then risking/spending some of that trust to target someone. Takes a lot of effort though, and the target can then appeal, so it's a lot of effort for a (probably very temporary) reward.
> One problem with a rate limit (especially as we get more users) is that a high rate of reports isn't necessarily a malicious dogpile - could just be genuine outrage at a particularly bad tweet.
I don't think that's really a problem? It would just mean that moderators would have discretion to decide if a large volume of reports is legitimate or a dogpile. The downside of implementing it would be a delay in an account getting marked, which seems like a better outcome than Soupcan becoming an agent in a dogpile.
In terms of scaling, I feel like there could be an estimate of how many reports are expected per account, and some multiple of that coming in within an arbitrary timeframe (for example, half of the database update frequency) triggers the protection. Or it could be based on a percentage (maybe 1%) of the userbase (with a floor of one user) in a timeframe. Kind of just spitballing here though.
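Spelling out that spitballing as a sketch, with invented constants: the protection fires when an account receives some multiple of its expected report rate, or reports from more than a small fraction of the userbase (with a floor of one user), within the chosen window.

```ts
// All constants here are placeholder guesses, not tuned values.
interface DogpileConfig {
  expectedReportsPerWindow: number; // baseline estimate per account
  burstMultiplier: number;          // e.g. 5x the expected rate
  userbaseFraction: number;         // e.g. 0.01 = 1% of users
}

function triggersDogpileProtection(
  reportsInWindow: number,
  uniqueReporters: number,
  totalUsers: number,
  cfg: DogpileConfig,
): boolean {
  const rateTrigger =
    reportsInWindow >= cfg.expectedReportsPerWindow * cfg.burstMultiplier;
  const fractionTrigger =
    uniqueReporters >= Math.max(1, Math.floor(totalUsers * cfg.userbaseFraction));
  return rateTrigger || fractionTrigger;
}
```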
> One threat tactic is someone making enough good reports to build up trust, potentially using several accounts, then risking/spending some of that trust to target someone. Takes a lot of effort though, and the target can then appeal, so it's a lot of effort for a (probably very temporary) reward.
I have lots of ideas, but I'll try to keep this short. It occurs to me that to prevent infiltration, we need a set of conditions that an account must pass, in addition to the current trust system, in order to acquire certain trust thresholds. I wrote a 1000-word idea, but here it is in 200.
For the purposes of preventing infiltration, is there a trust threshold which should not be attainable without having gained trust from reports concurrent with very high trust users?
When a person reaches a certain trust threshold (I arbitrarily chose the move from 39 to 40 in all my examples), their report activity is scanned to examine the diversity of their 'fellow reporters' (roughly sketched below). If they fail the test, their trust score is somewhat reduced, and the check has a 12-hour cooldown.
One very important question I would ask is: what proportion of their successful reports were reports where at least one other participant had a trust score above 40%? And has that person had at least three concurrences (successful reports) with different unique >40% users?
Should users have to have had at least a certain number of reports which only succeeded due to moderator participation in order to pass a certain trust threshold?
Perhaps it's too much work given how robust Beth's trust system is.
I'm assuming the trust system already does some kind of formula analysis of fellow reporters, so that a report which needed 10 people to reach 100% grants very little trust to each participant. This is not a question.
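To make the 39 to 40 gate concrete, here is a rough sketch; the report shape, numbers, and penalty are all illustrative, not how Soupcan actually stores anything.

```ts
// Illustrative only: report shape, numbers, and the penalty are all invented.
interface SuccessfulReport {
  coReporters: { id: string; trust: number }[]; // other participants in the report
}

const HIGH_TRUST = 40;                      // the >40% threshold from the example
const REQUIRED_UNIQUE_HIGH_TRUST_PEERS = 3;

// Has this user concurred (in successful reports) with at least three
// different users whose trust exceeds 40%?
function passesDiversityCheck(reports: SuccessfulReport[]): boolean {
  const peers = new Set<string>();
  for (const r of reports) {
    for (const c of r.coReporters) {
      if (c.trust > HIGH_TRUST) peers.add(c.id);
    }
  }
  return peers.size >= REQUIRED_UNIQUE_HIGH_TRUST_PEERS;
}

// Called when a user's score would cross from 39 to 40; failing costs a little
// trust, and the check would only be retried after a 12-hour cooldown (not shown).
function gateTrustPromotion(currentTrust: number, reports: SuccessfulReport[]): number {
  if (currentTrust !== 39) return currentTrust; // only gate the 39 -> 40 move
  return passesDiversityCheck(reports) ? 40 : Math.max(0, currentTrust - 2);
}
```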
I haven't been very active recently, due to using Twitter very little in Feb-April. I've also been unable to access the mod queue for at least 6 weeks, though I noticed it became available to me again today.
Back when I was very active, I scored 90% of my reports on a scale from 1/10 to 10/10.
Below are the metrics I developed, in part because I often found accounts which I believed were guilty of transphobia but not of hate severe enough to exceed the strict moderation rules.
An illustration of that distinction would be following JKR (2023 version) but not KJK whilst making offhand comments; of course, the last 6 months have completely totalled that example. I would search the profile with a presumption of hate for the latter, but not the former. Although now either would suffice.
I considered this concept to be a self-marking system for moderators, which would only be visible to other moderators. No reason it couldn't bleed influence into other metrics, though.
- Suspected ‘innocent’ transphobia. Repeatedly transphobic user but no evidence of harm intended towards trans or LGBT people in RT/likes or replies. Moderator may suspect the account user doesn’t know they’re being offensive or transphobic.
- Trans seeking trolls. Doesn’t break the TOS/mod guidelines for transphobic content but appears to prioritise seeking or singling out transgender people for mild/trollish interactions. Example: “everything is the fault of the trans”.
- With a mind to protecting the Soupcan community, the moderator could not in good conscience appeal the account. Found sufficient inference-based (or tacit) “transphobia approval” in the account user to conclude an above-50% likelihood of pre-transphobia, or the moderator infers the account user has a deliberate intent to avoid appearing openly transphobic, but there are no tweets, RTs, or likes of overt transphobia by the account and it is not generally anti-LGBT.
- Account is very clearly hostile to LGBT subject media and openly anti-LGBT but doesn’t mention or distinguish transgender people as a distinct group. If account mentions groomer narratives use 5/10.
- Discretionary Report for community safety. Account is clearly anti-LGBT and has repeatedly espoused anti-trans sentiments, but the moderator is not satisfied it exceeds Soupcan TOS/guideline thresholds. No consistent RT/likes of transphobic content.
- Discretionary Report. Moderator is convinced the account is transphobic but doesn’t feel they can justify it within the rules in under 8 minutes.
- Discretionary Report. The moderator takes the account as a whole as dangerous to trans people, and the account user prioritises transgender interactions ahead of LGBT interactions: either more than one reference to LGBT groomer narratives in tweets/replies, or multiple retweets/likes, while generally not being hostile to LGBT people in any other way. OR 1-2 minor signals per the moderator guidelines, plus a lot of being mean to or singling out transgender media, but not rising to a CLEAR-CUT violation of the moderation guidelines.
- Account frequently likes and retweets transphobic content, OR has little to no RT/likes but has extensive tweets and replies that are directly or indirectly antagonistic towards transgender people.
- Moderator has complete confidence in identifying this account as a transphobe in violation of the Soupcan TOS and Mod Guidelines.
- Transphobia is the account’s identity.
I mostly developed this for the purpose of knowing the moderator's sentiment should an account ever be appealed. If I had flagged 9 or 10, I would say the account was irredeemable and should be told tough luck. If 8/10, they should only be able to appeal if the reported content is NOT deleted. For 3 through 7 they would just be re-reviewed, and for 1 or 2 their appeal should just be accepted, simply on the basis that they took the effort to make it (a rough sketch of this routing follows the summary below).
- 1-3: Moderator desperate to not click appeal. Flag approved on gut feeling.
- 4-5: Strongly anti-LGBT with very little singling out of transgender people.
- 6-7: Clear TOS/guideline violations lacking, but moderator is convinced with high confidence that the account is transphobic.
- 8-10: Account violates TOS/guidelines; moderator estimates more than 50% of the account's Twitter activity is transphobia.
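If that 1-10 score were ever stored, the appeal routing above becomes mechanical. A sketch (the outcome names and the deleted-content check are invented):

```ts
// Outcome names and the deleted-content check are invented for illustration.
type AppealOutcome = "reject" | "re-review" | "auto-accept";

// Route an appeal based on the moderator's original 1-10 confidence score:
// 9-10 irredeemable, 8 only appealable if the reported content still exists,
// 3-7 re-reviewed, 1-2 accepted for having bothered to appeal at all.
function routeAppeal(modScore: number, reportedContentDeleted: boolean): AppealOutcome {
  if (modScore >= 9) return "reject";
  if (modScore === 8) return reportedContentDeleted ? "reject" : "re-review";
  if (modScore >= 3) return "re-review";
  return "auto-accept";
}
```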
Persons making reports could select from a list of 10 options for the question: How confident are you that this account is transphobic?
I'm a lot more decisive than I was when I wrote this in November, though. I don't think more than three options are needed now (in addition to appeal report).
Building out of: #30, #64, #65, #66, #92, #98, #115, #116
Revamping/improving the reporting system seems like an important direction for Soupcan, since it is now a distinguishing and more critical feature than initially planned.