bluesky-social / proposals

Bluesky proposal discussions

[0002] How to prevent labels being used to target abuse? #19

Open · sneakers-the-rat opened this issue 1 year ago

sneakers-the-rat commented 1 year ago

Similar to https://github.com/bluesky-social/proposals/issues/18

Say there's some group A that hates group B. Group A creates a labeling service to algorithmically label all posts and all members of group B. Group A follows this label and uses it to coordinate harassment of members of group B.

This is a distinct abuse vector from creating adversarial custom feeds that do the same thing, given how the draft says labels are distributed:

Unlike most content on the AT Protocol, Labeling Services do not host their labels in a repository. This is because the volume of Labels is too high to fit economically into data repositories.

Instead, labels are published using a specific set of APIs which are optimized for streaming and fetching the content. These labels are ingested by various providers and attached to content when they are sent down to the app.
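For concreteness, consuming such a label stream might look roughly like this - a minimal sketch assuming the draft's com.atproto.label.subscribeLabels websocket endpoint, a simplified JSON framing (real frames are binary CBOR), and an invented host name:

```ts
import WebSocket from "ws";

// Shape of a single label, loosely following the draft's label schema.
interface Label {
  src: string; // DID of the labeling service that emitted the label
  uri: string; // AT-URI or DID of the labeled content/account
  val: string; // the label value, eg. "spam"
  cts: string; // creation timestamp
}

// Hypothetical labeling service host; cursor=0 replays from the start.
const ws = new WebSocket(
  "wss://labeler.example.com/xrpc/com.atproto.label.subscribeLabels?cursor=0"
);

ws.on("message", (data) => {
  // Simplified: real frames are CBOR-encoded, not JSON.
  const evt = JSON.parse(data.toString()) as { seq: number; labels: Label[] };
  for (const label of evt.labels) {
    console.log(`${label.src} labeled ${label.uri} as "${label.val}"`);
  }
});
```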

Can you hide your posts from a labeling service? How do blocks work for something that isn't a repository and thus doesn't have a DID? Wouldn't that require additional logic in the big graph services spec to not feed a post to the labeling service? Wouldn't that make federation very weak, since we are dependent on the big graph service? Couldn't there be adversarial big graph services that ignore the requirement to drop blocked labels? How could you block specific big graph services, given that you can't know which feeds and graph services are used by the people you want to be able to see your posts?

An independent concern is the abuse of DMCA violation labels. As is currently the case on platforms like YouTube, IP holders (and IP trolls) issue high volumes of takedown requests, including for noninfringing content. Since (in the US) hosts are not responsible for illegal content posted by users but do have to respond to DMCA complaints, how would you prevent smaller servers, unable to keep up with the moderation load from floods of adversarial DMCA violation labels, from being run off the network? It seems like an easy way to get a rival server shut down, and it ensures that only large servers can survive, again weakening federation.

bnewbold commented 1 year ago

It is true that people could use labeling to facilitate abuse. It is already possible to do that sort of thing using, eg, a Google spreadsheet or Discord chat room to track and target accounts or posts, and it would be naive to think that folks won't try to use this in harmful ways. I think the important question is whether there is something about labeling that makes such abuse and harassment meaningfully more harmful. There is a similar concern with account lists, and we are thinking through whether the benefits outweigh the potential downsides.

All content in atproto/bluesky is currently public, and it is not possible to entirely hide content from other parties. Things like blocks can add a meaningful degree of friction to seeing and finding content, but that is just friction: it is possible to just log out or use another account to evade the block.

Allowing people to "block" or evade labeling services would make it harder for legitimate labeling services to moderate content in the network.

sneakers-the-rat commented 1 year ago

So this:

I think the important question is whether there is something about labeling that makes such abuse and harassment meaningfully more harmful.

and this:

Allowing people to "block" or evade labeling services would make it harder for legitimate labeling services to moderate content in the network.

are why I wrote this:

This is a distinct abuse vector from creating adversarial custom feeds that do the same thing

and this:

Can you hide your posts from a labeling service? How do blocks work for something that isn't a repository and thus doesn't have a DID? Wouldn't that require additional logic in the big graph services spec to not feed a post to the labeling service?

and also this:

An independent concern is the abuse of DMCA violation labels.

to specifically address the additional potential for abuse above and beyond lists or off-protocol techniques.

Labels being unblockable, while having infrastructural support for mass dissemination and subscription, makes them a dramatically more dangerous vector for abuse than in- or out-of-protocol lists. There is no way to moderate your way out of that, either: you can't add additional labels to accounts that use them for abuse, because identity is so cheap, and anyone could be subscribing to a particular set of labels (eg. a person in targeted group B may want to subscribe to the labels to see how they are being targeted).

The same concerns make it difficult to see how labels would meaningfully deter or prevent other kinds of abuse, since no matter how many you compose together, the legitimate abuse labelers will always be in an algorithmic arms race at best, and again we see how well that works in other social networks.

The DMCA abuse concern is also unique to having a label that logically represents DMCA violations in the network, eg. this story from just today: https://torrentfreak.com/publishers-carpet-bomb-ipfs-gateway-operators-with-dmca-notices-230625/ where IPFS gateway hosts are being bombarded with DMCA complaints even when they host no infringing content. Having an extremely low-friction, high-signal method of issuing arbitrary numbers of DMCA takedown notices would make the network unusable, so it's a very real question how this implementation plans to avoid the problems experienced by every other platform and protocol on the web.

This draft also appears far more vulnerable to these kinds of attacks because, rather than a centralized arbiter (as in corporate platforms) or the high friction of needing to contact individual hosts (as in the fediverse, matrix, etc.), this draft suggests that any labeling service would be able to issue DMCA complaints. So you not only have multiple actors issuing the complaints, but multiple vectors by which they can be delivered. The effect would be the same as everywhere else: it becomes effectively impossible to be a smaller host that can't afford the legal time and money to address them, making federation meaningless. Additionally, it becomes a weapon for adversarial actors to get servers they don't like shut down, akin to swatting. This probably deserves its own issue.

Yes, part of what I'm getting at here is that if everything being public, and it being impossible to have meaningful blocking and privacy features, is indeed an active decision rather than a byproduct of the protocol design, then the potential for abuse is substantial - even network-breaking - and will manifest in different ways with each additional feature, while making other desirable features hard or impossible to implement without effectively centralizing the system (see https://github.com/bluesky-social/proposals/issues/18). So while I get where the idea of composable moderation is coming from, this isn't a criticism of that so much as of its particular (sketch of an) implementation against the backdrop of ATProto. There is still time to fix these problems while the federation spec is still in draft, but it will require significantly rethinking it, unless I'm missing something major, which is possible.

agentjabsco commented 1 year ago

on the subject of mitigating DMCA spam, I think the solution there is going to have to be something along the lines of a centralized list of arbiters administered by the state (akin to a service like FedNow) that make it possible to take legal action against malactors (ie. filing a complaint with the FCC/FTC). hard, but not impossible

agentjabsco commented 1 year ago

as for labels in the general case: are labels not signed by the labeling party? that seems like the most apparent solution to me: I see that a post was marked with a label by the Bluesky Auto-Labeler and/or the Bluesky Trust & Safety Review Board, and if I (as a PDS/BGS administrator) start seeing that some party's auto-labeler/review board is being used as a vector for abuse, I can make the unilateral decision for my BGS to drop labels with the characteristic abuse (and contact the labeling party ie. via email, in cases where I believe said labeler is acting in good-enough faith that they can take action against the abuse being laundered through them).
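A minimal sketch of that unilateral drop, assuming labels carry a src DID identifying the emitting labeler (per the draft schema); the denylist mechanics and names here are invented, not part of any spec:

```ts
// A label event as seen by a relay/BGS; src identifies the labeler.
interface Label {
  src: string; // DID of the emitting labeling service
  uri: string; // what the label is attached to
  val: string; // the label value
}

// Labelers this operator has decided to drop (hypothetical DID).
const droppedLabelers = new Set<string>(["did:plc:abusive-labeler-example"]);

function filterLabels(incoming: Label[]): Label[] {
  // A production filter would first verify each label's signature
  // against the labeler's published signing key before trusting src.
  return incoming.filter((label) => !droppedLabelers.has(label.src));
}
```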

sneakers-the-rat commented 1 year ago

I can make the unilateral decision for my BGS to drop labels with the characteristic abuse

Right. so the BGS is required to become a centralized point of moderation, which is sort of the problem.

It also doesn't address the case where labels are used to target abuse, where the problem isn't the person whose post is labeled seeing the label, but malicious actors being able to see the label. That means that any answer has to prevent a BGS from indexing the post in the first place. That's difficult as proposed, since an individual PDS wouldn't have any good way of knowing which BGS it should block a priori, because it wouldn't know which BGS is the source for which feed generator/etc. potentially used by their friends or anyone they might want to be able to see their posts.

So you have a situation where either:

a) You have to prespecify one or a small number of BGS providers to allowlist and block all others - the only really viable option there being the one main BGS operated by Bluesky, again because of the combinatorics problems between BGS and feed generators/app views. That means the BGS takes on the role of central moderator, meaning that federation is extremely weak and you get all the problems of the platform web just kicked up a level.

b) You are in a (losing) arms race where you need to proactively block BGS providers that serve bad actors, while the bad actors continuously try to evade the "good" BGS - an even worse moderation situation for a given BGS to be in. This would also make the feed generator system basically impossible to use, since everyone would end up with swiss-cheese blocklists and none of the BGSs would actually serve their purpose of global indexing.

Given that the whole purpose of this labeling system is to take moderation out of the hands of some central platform holders, BGS moderation doesn't really seem like an answer

NetWalker108 commented 12 months ago

I can make the unilateral decision for my BGS to drop labels with the characteristic abuse

so the BGS is required to become a centralized point of moderation, which is sort of the problem.

Conventionally, this shouldn't be the case for a BGS as it's not meant to handle labels or moderation tasks. See Federation Architecture.

@agentjabsco's suggestion is a custom arrangement.

sneakers-the-rat commented 12 months ago

right. again what I'm saying is that the feature is intrinsically abusable and the only ways to remedy that all seem undesirable (eg. making the BGS a point of moderation, which it isn't designed to be and shouldn't be)

NetWalker108 commented 12 months ago

Every technology is a double-edged sword. Labellers are subscribed to by users, and therefore users have every option to (un)subscribe to Labeller A or Labeller B, and to suggest their peers do the same.

If a troll makes a Labeller for malicious reasons and literally nobody subscribes to it, it has no power.

The problem isn't the labeller/tech, but the due diligence done by users prior to subscribing to any labeller, feed, mute list, etc., beyond just word-of-mouth, social engineering, or lavish marketing.

sneakers-the-rat commented 12 months ago

See the adversarial scenario at the beginning of the issue:

Say there's some group A that hates group B. Group A creates a labeling service to algorithmically label all posts and all members of group B. Group A follows this label and uses it to coordinate harassment of members of group B.

the "marketplace of ideas" doesn't solve coordinated harassment campaigns. we're not talking about otherwise well-meaning users accidentally subscribing to a troll feed, but hateful users using a hateful labeling feed to target hate.

NetWalker108 commented 12 months ago

For Group A's labelling service to have any impact on Group B, that service will have to be widely adopted by the masses outside of Group A's set, which includes users in Group B's set and outside the sets of both groups.

Given Group B's set is the potential victim here, it's unlikely for them to subscribe to Group A's service which just leaves everyone else outside both groups. This means Group A's service must work very well to be subscribed to by people outside either set.

For real-world context: the labelling services of Group A and Group B are like the new Fox News and CNN; neither one's audience will subscribe to the other's. And if either group wanted to dominate in order to cause (adversarial) issues, they'd have to find the BBC or Al Jazeera of labelling services.

That's as coordinated as it gets, unless of course, the external masses fail to do due diligence prior to subscribing labelling services, then we've got bigger problems.

sneakers-the-rat commented 12 months ago

That's as coordinated as it gets

That is extremely far from true, and that isn't how harassment works online. Very small numbers of hateful users can and do cause massive problems for vulnerable groups.

You're fundamentally misunderstanding the risk here - again, the problem is not the labels being used as a mass subscription service. The problem is the labeling system being used to target harassment by a potentially small number of users towards a vulnerable group of users. The number of people that subscribe to the labeling system is effectively irrelevant, as is whether a mass of uninvolved people "fail to do their due diligence."

NetWalker108 commented 11 months ago

I am in no way trying to explain how harassment works online. I am explaining the constraints involved in the use of labelling services, and that is the scope of my discussion. There isn't and wouldn't be one singular labelling service or authority over labelling services; there will be many, as anyone will have every opportunity to spin up their own.

The concern of how a super minority uses a labelling service is not the concern of the labelling service stack itself. It's like being concerned about how a super minority will use HTTP.

With that said, you're free to pursue this interest however you think you can (be it socially, economically, etc.), but it's fundamentally inconsequential to the neutral nature of the tech. Unless, of course, you're building your own labelling service spec + standards and an ATProto fork, and gearing it towards some level of ecosystem-wide adoption - in which case, good luck.

sneakers-the-rat commented 11 months ago

it's fundamentally inconsequential to the neutral nature of the tech.

thinking that technology is neutral is sort of all i need to know about this conversation.

It's like being concerned about how a super minority will use HTTP.

not even close to an apt comparison - this is a social and communication protocol so abuse is a core design constraint. I have no interest in building for a protocol that is disinterested in safety in its design because a bunch of silicon valley smoothbrains think that technology is neutral lmao

pfrazee commented 11 months ago

I apologize if this question is dense, but how are labeling services uniquely suited to harassment? If we can get very specific about this, we can get more specific about mitigations.

In the general technical sense, labeling services don't do anything unique -- they just publish URLs with labels tagging them. This is something any system can do.

In the common-affordance sense, labeling services should cause warnings and filters to be applied to accounts and content.

"Common-affordance" is an important consideration beyond the "general technical" because it lowers the friction of coordinated actions. In the simplest sense, if a standard system for publishing URLs with labels is made widely available, it's much more likely to be used. The ensuing behaviors also matter: if a view is supplied in the applications to watch the publication of labels, it then becomes an even simpler tool to watch labels and coordinate harassment. If we could get specific about enumerating the concerns, we could evaluate the system as a whole and the potential mitigations.

sneakers-the-rat commented 11 months ago

Not a dense question! it's the central concern of the issue :)

Abuse scenario

The specific abuse scenario I am worried about with labels is that they will be used to coordinate abuse. As in the OP, say a hate group A runs a labeling service to label all posts about/from targeted group B so that members of group A and affiliated hate group(s) can more efficiently find, harass, dox, etc. members of targeted group B.

Note that this is distinct from the idea of a "public" labeling service intended for mass consumption, or from the idea that members of group B might see and be harmed by the labels in themselves - the threat here is specifically that a potentially small number of users would use the affordances of the platform for more efficient and coordinated abuse. In all the following, we are specifically talking about abusive actors whose labeling services would not be well-behaved - eg. would ignore blocks when possible.

Let's discuss the potential for DMCA abuse in a separate issue

Features that facilitate risk

The current draft proposal has several parts that structure this scenario:

That makes them distinct from eg. a custom feed for targeting abuse, as the labels would be harder or impossible to block, and could be used combinatorially, since multiple labels could apply to a given post (eg. find me posts from members of groups B and C that discuss topic D).
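To illustrate how cheap that composition is, here is a minimal sketch (all names hypothetical) of intersecting label sets once an index of published labels exists:

```ts
// Hypothetical index built from ingested labels: label value -> labeled URIs.
type LabelIndex = Map<string, Set<string>>;

// URIs carrying *all* of the requested labels, eg.
// intersectLabels(index, ["group-b", "group-c", "topic-d"]).
function intersectLabels(index: LabelIndex, vals: string[]): Set<string> {
  const sets = vals.map((v) => index.get(v) ?? new Set<string>());
  if (sets.length === 0) return new Set();
  const [first, ...rest] = sets;
  return new Set([...first].filter((uri) => rest.every((s) => s.has(uri))));
}
```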

In-protocol vs. out-of-protocol

Of course it is possible to index posts out of protocol. An in-protocol labeling system is distinct in a few ways:

Mitigations

I'd love to hear your perspective on mitigations. To me, it seems like any meaningful mitigation would require some means of blocking labeling services from receiving a user's posts - ideally allowing opt-in to specific labeling services - and that means making it possible to enforce blocks at a network, rather than a client, level. As far as I know, that would require a substantial reworking of the federation architecture to include post encryption, since one might run an adversarial BGS that ignores blocks even if BGSes were required to auth to serve a user's posts. That, of course, becomes very complicated to balance with the intended function of the labeling services, which to some degree are nonconsensual by design so that harmful content can be labeled.

Countering adversarial labeling with more labeling doesn't seem like a mitigation to me, nor does restricting the possible labels. An adversarial labeler could always just repurpose seemingly-benign labels so that the consumer knows that the "profane" tag from labeling service A means it is labeling a post from group B - these kinds of shibboleths are a common MO for hate groups operating semi-publicly.

Anyway, I appreciate you considering abuse risk for these proposals; I definitely recognize that these are all challenging things to weigh and design.

Invertex commented 5 months ago

Perhaps, to help mitigate some of these issues, a voting system could be put in place?

Instead of labels being explicitly approved on posts, a labeling system could have not just moderators and admins but also "approved voters": a group able to grow organically over time as people prove themselves to be good actors in a community. "Approved voters" would not be able to explicitly approve labels, but they could alter a "validity" weighting with their up- or downvote. A low enough rating would simply remove the suggested label from the system (and prevent it from being suggested again), avoiding undue burden on management.

There could be a feedback system from this as well, whereby users whose proposed labels get consistently downvoted will have future proposed labels negatively weighted automatically, or will be barred entirely from suggesting labels.

What kinds of labels can be proposed to a labeling system for a post/user should be preset by that labeling system: maybe one labeling system only has labels for CSAM, GORE, and SA, while another has labels just for SPORTS, POLITICS, and NEWS. Those are then the only labels people can propose on a post for those systems, and any further attempts are rejected because a proposal for that label already exists, leaving it up to "approved voters" to upvote it or managers to approve it.

Labeling systems could allow approved voters of the community to potentially vote on new preset labels to add to the community for use as well.

You could potentially even do away with "approval" altogether and allow people to filter based on "upvote %" for a given label, so people can set thresholds in that manner, and weight admin/mod votes at a much higher ratio so they can more quickly "approve" a label with their upvote in the eyes of anyone with a reasonable upvote threshold.
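A minimal sketch of that scheme - the role weights, preset vocabulary, upvote-% threshold, and proposer feedback loop are all invented numbers and names, purely to illustrate the mechanics:

```ts
interface Voter {
  did: string;
  role: "admin" | "mod" | "approved";
}

// Admin/mod votes count for more, per the suggestion above.
const ROLE_WEIGHT: Record<Voter["role"], number> = {
  admin: 10,
  mod: 5,
  approved: 1,
};

// Each labeling system presets its own vocabulary.
const PRESET_LABELS = new Set(["csam", "gore", "sa"]);

interface Proposal {
  uri: string;  // the post/account the label is proposed for
  val: string;  // proposed label value
  up: number;   // weighted upvotes
  down: number; // weighted downvotes
}

// Reject proposals outside the preset vocabulary.
function propose(uri: string, val: string): Proposal | null {
  return PRESET_LABELS.has(val) ? { uri, val, up: 0, down: 0 } : null;
}

function vote(p: Proposal, voter: Voter, upvote: boolean): void {
  const w = ROLE_WEIGHT[voter.role];
  if (upvote) p.up += w;
  else p.down += w;
}

// Consumers filter on upvote share instead of a binary "approved" flag,
// each picking their own threshold.
function passesThreshold(p: Proposal, minShare: number): boolean {
  const total = p.up + p.down;
  return total > 0 && p.up / total >= minShare;
}

// Feedback: proposers whose labels are consistently downvoted can be
// barred from proposing (count tracking omitted for brevity).
function proposerBarred(downvotedProposals: number, limit = 5): boolean {
  return downvotedProposals >= limit;
}
```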