bluesky-social / proposals

Bluesky proposal discussions

[0002] How to prevent labels being used to target abuse? #19

Open · sneakers-the-rat opened this issue 1 year ago

sneakers-the-rat commented 1 year ago

Similar to https://github.com/bluesky-social/proposals/issues/18

Say there's some group A that hates group B. Group A creates a labeling service to algorithmically label all posts and all members of group B. Group A follows this label and uses it to coordinate harassment of members of group B.

This is a distinct abuse vector from creating adversarial custom feeds that do the same thing, given how the draft says labels are distributed:

Unlike most content on the AT Protocol, Labeling Services do not host their labels in a repository. This is because the volume of Labels is too high to fit economically into data repositories.

Instead, labels are published using a specific set of APIs which are optimized for streaming and fetching the content. These labels are ingested by various providers and attached to content when they are sent down to the app.
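For concreteness, consuming such a label stream might look roughly like this - a minimal sketch assuming the draft's com.atproto.label.subscribeLabels websocket endpoint, a simplified JSON framing (real frames are binary CBOR), and an invented host name:

```ts
import WebSocket from "ws";

// Shape of a single label, loosely following the draft's label schema.
interface Label {
  src: string; // DID of the labeling service that emitted the label
  uri: string; // AT-URI or DID of the labeled content/account
  val: string; // the label value, eg. "spam"
  cts: string; // creation timestamp
}

// Hypothetical labeling service host; cursor=0 replays from the start.
const ws = new WebSocket(
  "wss://labeler.example.com/xrpc/com.atproto.label.subscribeLabels?cursor=0"
);

ws.on("message", (data) => {
  // Simplified: real frames are CBOR-encoded, not JSON.
  const evt = JSON.parse(data.toString()) as { seq: number; labels: Label[] };
  for (const label of evt.labels) {
    console.log(`${label.src} labeled ${label.uri} as "${label.val}"`);
  }
});
```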

Can you hide your posts from a labeling service? How do blocks work for something that isn't a repository and thus doesn't have a DID? Wouldn't that require additional logic in the big graph services spec to not feed a post to the labeling service? Wouldn't that make federation very weak, since we are dependent on the big graph service? Couldn't there be adversarial big graph services that ignore the requirement to drop blocked labels? How could you block specific big graph services, given that you can't know which feeds and graph services are used by the people you want to be able to see your posts?

An independent concern is the abuse of DMCA violation labels. As is currently the case on platforms like YouTube, IP holders (and IP trolls) issue high volumes of takedown requests, including for noninfringing content. Since (in the US) hosts are not responsible for illegal content posted by users but do have to respond to DMCA complaints, how would you prevent smaller servers, unable to keep up with the moderation load from floods of adversarial DMCA violation labels, from being run off the network? It seems like an easy way to get a rival server shut down, and it ensures that only large servers can survive, again weakening federation.

bnewbold commented 1 year ago

It is true that people could use labeling to facilitate abuse. It is already possible to do that sort of thing using, eg, a Google spreadsheet or Discord chat room to track and target accounts or posts, and it would be naive to think that folks won't try to use this in harmful ways. I think the important question is whether there is something about labeling that makes such abuse and harassment meaningfully more harmful. There is a similar concern with account lists, and we are thinking through whether the benefits outweigh the potential downsides.

All content in atproto/bluesky is currently public, and it is not possible to entirely hide content from other parties. Things like blocks can add a meaningful degree of friction to seeing and finding content, but that is just friction: it is possible to just log out or use another account to evade the block.

Allowing people to "block" or evade labeling services would make it harder for legitimate labeling services to moderate content in the network.

sneakers-the-rat commented 1 year ago

So this:

I think the important question is whether there is something about labeling that makes such abuse and harassment meaningfully more harmful.

and this:

Allowing people to "block" or evade labeling services would make it harder for legitimate labeling services to moderate content in the network.

are why I wrote this:

This is a distinct abuse vector from creating adversarial custom feeds that do the same thing

and this:

Can you hide your posts from a labeling service? How do blocks work for something that isn't a repository and thus doesn't have a DID? Wouldn't that require additional logic in the big graph services spec to not feed a post to the labeling service?

and also this:

An independent concern is the abuse of DMCA violation labels.

to specifically address the additional potential for abuse above and beyond lists or off-protocol techniques.

Labels being unblockable, while having infrastructural support for mass dissemination and subscription, makes them a dramatically more dangerous vector for abuse than in- or out-of-protocol lists. There is no way to moderate your way out of that, either: you can't add additional labels to accounts that use them for abuse, because identity is so cheap, and anyone could be subscribing to a particular set of labels (eg. a person in targeted group B may want to subscribe to the labels to see how they are being targeted).

The same concerns make it difficult to see how labels would meaningfully deter or prevent other kinds of abuse, since no matter how many you compose together, the legitimate abuse labelers will always be in an algorithmic arms race at best, and again we see how well that works in other social networks.

The DMCA abuse concern is also unique to having a label that logically represents DMCA violations in the network, eg. this story from just today: https://torrentfreak.com/publishers-carpet-bomb-ipfs-gateway-operators-with-dmca-notices-230625/ where IPFS gateway hosts are being bombarded with DMCA complaints even when they host no infringing content. Having an extremely low-friction, high-signal method of issuing arbitrary numbers of DMCA takedown notices would make the network unusable, so it's a very real question how this implementation plans to avoid the problems experienced by every other platform and protocol on the web.

This draft also appears far more vulnerable to these kinds of attacks because, rather than a centralized arbiter (as in corporate platforms) or the high friction of needing to contact individual hosts (as in the fediverse, matrix, etc.), this draft suggests that any labeling service would be able to issue DMCA complaints. So you not only have multiple actors issuing the complaints, but multiple vectors by which they can be delivered. The effect would be the same as everywhere else: it becomes effectively impossible to be a smaller host that can't afford the legal time and money to address them, making federation meaningless. Additionally, it becomes a weapon for adversarial actors to get servers they don't like shut down, akin to swatting. This probably deserves its own issue.

Yes, part of what I'm getting at here is that if everything being public, and it being impossible to have meaningful blocking and privacy features, is indeed an active decision rather than a byproduct of the protocol design, then the potential for abuse is substantial - even network-breaking - and will manifest in different ways with each additional feature, while making other desirable features hard or impossible to implement without effectively centralizing the system (see https://github.com/bluesky-social/proposals/issues/18). So while I get where the idea of composable moderation is coming from, this isn't a criticism of that so much as of its particular (sketch of an) implementation against the backdrop of ATProto. There is still time to fix these problems while the federation spec is still in draft, but it will require significantly rethinking it, unless I'm missing something major, which is possible.

agentjabsco commented 1 year ago

on the subject of mitigating DMCA spam, I think the solution there is going to have to be something along the lines of a centralized list of arbiters administered by the state (akin to a service like FedNow) that make it possible to take legal action against malactors (ie. filing a complaint with the FCC/FTC). hard, but not impossible

agentjabsco commented 1 year ago

as for labels in the general case: are labels not signed by the labeling party? that seems like the most apparent solution to me: I see that a post was marked with a label by the Bluesky Auto-Labeler and/or the Bluesky Trust & Safety Review Board, and if I (as a PDS/BGS administrator) start seeing that some party's auto-labeler/review board is being used as a vector for abuse, I can make the unilateral decision for my BGS to drop labels with the characteristic abuse (and contact the labeling party ie. via email, in cases where I believe said labeler is acting in good-enough faith that they can take action against the abuse being laundered through them).
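A minimal sketch of that unilateral drop, assuming labels carry a src DID identifying the emitting labeler (per the draft schema); the denylist mechanics and names here are invented, not part of any spec:

```ts
// A label event as seen by a relay/BGS; src identifies the labeler.
interface Label {
  src: string; // DID of the emitting labeling service
  uri: string; // what the label is attached to
  val: string; // the label value
}

// Labelers this operator has decided to drop (hypothetical DID).
const droppedLabelers = new Set<string>(["did:plc:abusive-labeler-example"]);

function filterLabels(incoming: Label[]): Label[] {
  // A production filter would first verify each label's signature
  // against the labeler's published signing key before trusting src.
  return incoming.filter((label) => !droppedLabelers.has(label.src));
}
```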

sneakers-the-rat commented 1 year ago

I can make the unilateral decision for my BGS to drop labels with the characteristic abuse

Right. so the BGS is required to become a centralized point of moderation, which is sort of the problem.

It also doesn't address the case where labels are used to target abuse, where the problem isn't the person whose post is labeled seeing the label, but malicious actors being able to see the label. That means that any answer has to prevent a BGS from indexing the post in the first place. That's difficult as proposed, since an individual PDS wouldn't have any good way of knowing which BGS it should block a priori, because it wouldn't know which BGS is the source for which feed generator/etc. potentially used by their friends or anyone they might want to be able to see their posts.

So you have a situation where either:

a) You have to prespecify one or a small number of BGS providers to allowlist and block all others - the only really viable option there being the one main BGS operated by Bluesky, again because of the combinatorics problems between BGS and feed generators/app views. That means the BGS takes on the role of central moderator, meaning that federation is extremely weak and you get all the problems of the platform web just kicked up a level.

b) You are in a (losing) arms race where you need to proactively block BGS providers that serve bad actors, while the bad actors continuously try to evade the "good" BGS - an even worse moderation situation for a given BGS to be in. This would also make the feed generator system basically impossible to use, since everyone would end up with swiss-cheese blocklists and none of the BGSs would actually serve their purpose of global indexing.

Given that the whole purpose of this labeling system is to take moderation out of the hands of some central platform holders, BGS moderation doesn't really seem like an answer

NetWalker108 commented 12 months ago

I can make the unilateral decision for my BGS to drop labels with the characteristic abuse

so the BGS is required to become a centralized point of moderation, which is sort of the problem.

Conventionally, this shouldn't be the case for a BGS as it's not meant to handle labels or moderation tasks. See Federation Architecture.

@agentjabsco's suggestion is a custom arrangement.

sneakers-the-rat commented 12 months ago

right. again what I'm saying is that the feature is intrinsically abusable and the only ways to remedy that all seem undesirable (eg. making the BGS a point of moderation, which it isn't designed to be and shouldn't be)

NetWalker108 commented 12 months ago

Every technology is a double-edged sword. Labellers are subscribed to by users, and therefore users have every option to (un)subscribe to Labeller A or Labeller B, and to suggest their peers do the same.

If a troll makes a Labeller for malicious reasons and literally nobody subscribes to it, it has no power.

The problem isn't the labeller/tech, but the due diligence done by users prior to subscribing to any labeller, feed, mute list, etc., beyond just word-of-mouth, social engineering, or lavish marketing.

sneakers-the-rat commented 12 months ago

See the adversarial scenario at the beginning of the issue:

Say there's some group A that hates group B. Group A creates a labeling service to algorithmically label all posts and all members of group B. Group A follows this label and uses it to coordinate harassment of members of group B.

the "marketplace of ideas" doesn't solve coordinated harassment campaigns. we're not talking about otherwise well-meaning users accidentally subscribing to a troll feed, but hateful users using a hateful labeling feed to target hate.

NetWalker108 commented 12 months ago

For Group A's labelling service to have any impact on Group B, that service will have to be widely adopted by the masses outside of Group A's set, which includes users in Group B's set and outside the sets of both groups.

Given Group B's set is the potential victim here, it's unlikely for them to subscribe to Group A's service which just leaves everyone else outside both groups. This means Group A's service must work very well to be subscribed to by people outside either set.

For real-world context: the labelling services of Group A and Group B are like the new Fox News and CNN; neither one's audience will subscribe to the other's. And if either group wanted to dominate in order to cause (adversarial) issues, they'd have to find the BBC or Al Jazeera of labelling services.

That's as coordinated as it gets, unless of course, the external masses fail to do due diligence prior to subscribing labelling services, then we've got bigger problems.

sneakers-the-rat commented 12 months ago

That's as coordinated as it gets

That is extremely far from true, and that isn't how harassment works online. Very small numbers of hateful users can and do cause massive problems for vulnerable groups.

You're fundamentally misunderstanding the risk here - again, the problem is not the labels being used as a mass subscription service. The problem is the labeling system being used to target harassment by a potentially small number of users towards a vulnerable group of users. The number of people that subscribe to the labeling system is effectively irrelevant, as is whether a mass of uninvolved people "fail to do their due diligence."

NetWalker108 commented 11 months ago

I am in no way trying to explain how harassment works online. I am explaining the constraints involved in the use of labelling services, and that is the scope of my discussion. There isn't and wouldn't be one singular labelling service or authority over labelling services; there will be many, as anyone will have every opportunity to spin up their own.

The concern of how a super minority uses a labelling service is not the concern of the labelling service stack itself. It's like being concerned about how a super minority will use HTTP.

With that said, you're free to pursue this interest however you think you can (be it socially, economically, etc.), but it's fundamentally inconsequential to the neutral nature of the tech. Unless, of course, you're building your own labelling service spec + standards and an ATProto fork, and gearing it towards some level of ecosystem-wide adoption - in which case, good luck.

sneakers-the-rat commented 11 months ago

it's fundamentally inconsequential to the neutral nature of the tech.

thinking that technology is neutral is sort of all i need to know about this conversation.

It's like being concerned about how a super minority will use HTTP.

not even close to an apt comparison - this is a social and communication protocol so abuse is a core design constraint. I have no interest in building for a protocol that is disinterested in safety in its design because a bunch of silicon valley smoothbrains think that technology is neutral lmao

pfrazee commented 11 months ago

I apologize if this question is dense, but how are labeling services uniquely suited to harassment? If we can get very specific about this, we can get more specific about mitigations.

In the general technical sense, labeling services don't do anything unique -- they just publish URLs with labels tagging them. This is something any system can do.

In the common-affordance sense, labeling services should cause warnings and filters to be applied to accounts and content.

"Common-affordance" is an important consideration beyond the "general technical" because it lowers the friction of coordinated actions. In the simplest sense, if a standard system for publishing URLs with labels is made widely available, it's much more likely to be used. The ensuing behaviors also matter: if a view is supplied in the applications to watch the publication of labels, it then becomes an even simpler tool to watch labels and coordinate harassment. If we could get specific about enumerating the concerns, we could evaluate the system as a whole and the potential mitigations.

sneakers-the-rat commented 11 months ago

Not a dense question! it's the central concern of the issue :)

Abuse scenario

The specific abuse scenario I am worried about with labels is that they will be used to coordinate abuse. As in the OP, say a hate group A runs a labeling service to label all posts about/from targeted group B so that members of group A and affiliated hate group(s) can more efficiently find, harass, dox, etc. members of targeted group B.

Note that this is distinct from the idea of a "public" labeling service intended for mass consumption, or from the idea that members of group B might see and be harmed by the labels in themselves - the threat here is specifically that a potentially small number of users would use the affordances of the platform for more efficient and coordinated abuse. In all the following, we are specifically talking about abusive actors whose labeling services would not be well-behaved - eg. would ignore blocks when possible.

Let's discuss the potential for DMCA abuse in a separate issue

Features that facilitate risk

The current draft proposal has several parts that structure this scenario:

That makes them distinct from eg. a custom feed for targeting abuse, as the labels would be harder or impossible to block, and could be used combinatorially, since multiple labels could apply to a given post (eg. find me posts from members of groups B and C that discuss topic D).
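To illustrate how cheap that composition is, here is a minimal sketch (all names hypothetical) of intersecting label sets once an index of published labels exists:

```ts
// Hypothetical index built from ingested labels: label value -> labeled URIs.
type LabelIndex = Map<string, Set<string>>;

// URIs carrying *all* of the requested labels, eg.
// intersectLabels(index, ["group-b", "group-c", "topic-d"]).
function intersectLabels(index: LabelIndex, vals: string[]): Set<string> {
  const sets = vals.map((v) => index.get(v) ?? new Set<string>());
  if (sets.length === 0) return new Set();
  const [first, ...rest] = sets;
  return new Set([...first].filter((uri) => rest.every((s) => s.has(uri))));
}
```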

In-protocol vs. out-of-protocol

Of course it is possible to index posts out of protocol. An in-protocol labeling system is distinct in a few ways:

Mitigations

I'd love to hear your perspective on mitigations. To me, it seems like any meaningful mitigation would require some means of blocking labeling services from receiving a user's posts - ideally allowing opt-in to specific labeling services - and that means making it possible to enforce blocks at a network, rather than a client, level. As far as I know, that would require a substantial reworking of the federation architecture to include post encryption, since one might run an adversarial BGS that ignores blocks even if BGSes were required to auth to serve a user's posts. That, of course, becomes very complicated to balance with the intended function of the labeling services, which to some degree are nonconsensual by design so that harmful content can be labeled.

Countering adversarial labeling with more labeling doesn't seem like a mitigation to me, nor does restricting the possible labels. An adversarial labeler could always just repurpose seemingly-benign labels so that the consumer knows that the "profane" tag from labeling service A means it is labeling a post from group B - these kinds of shibboleths are a common MO for hate groups operating semi-publicly.

Anyway, I appreciate you considering abuse risk for these proposals; I definitely recognize that these are all challenging things to weigh and design.

Invertex commented 5 months ago

Perhaps, to help mitigate some of these issues, a voting system could be put in place?

Instead of labels being explicitly approved on posts, a labeling system could have not just moderators and admins but also "approved voters": a group able to grow organically over time as people prove themselves to be good actors in a community. "Approved voters" would not be able to explicitly approve labels, but they could alter a "validity" weighting with their up- or downvote. A low enough rating would simply remove the suggested label from the system (and prevent it from being suggested again), avoiding undue burden on management.

There could be a feedback system from this as well, whereby users whose proposed labels get consistently downvoted will have future proposed labels negatively weighted automatically, or will be barred entirely from suggesting labels.

What kinds of labels can be proposed to a labeling system for a post/user should be preset by that labeling system: maybe one labeling system only has labels for CSAM, GORE, and SA, while another has labels just for SPORTS, POLITICS, and NEWS. Those are then the only labels people can propose on a post for those systems, and any further attempts are rejected because a proposal for that label already exists, leaving it up to "approved voters" to upvote it or managers to approve it.

Labeling systems could allow approved voters of the community to potentially vote on new preset labels to add to the community for use as well.

You could potentially even do away with "approval" altogether and allow people to filter based on "upvote %" for a given label, so people can set thresholds in that manner, and weight admin/mod votes at a much higher ratio so they can more quickly "approve" a label with their upvote in the eyes of anyone with a reasonable upvote threshold.
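A minimal sketch of that scheme - the role weights, preset vocabulary, upvote-% threshold, and proposer feedback loop are all invented numbers and names, purely to illustrate the mechanics:

```ts
interface Voter {
  did: string;
  role: "admin" | "mod" | "approved";
}

// Admin/mod votes count for more, per the suggestion above.
const ROLE_WEIGHT: Record<Voter["role"], number> = {
  admin: 10,
  mod: 5,
  approved: 1,
};

// Each labeling system presets its own vocabulary.
const PRESET_LABELS = new Set(["csam", "gore", "sa"]);

interface Proposal {
  uri: string;  // the post/account the label is proposed for
  val: string;  // proposed label value
  up: number;   // weighted upvotes
  down: number; // weighted downvotes
}

// Reject proposals outside the preset vocabulary.
function propose(uri: string, val: string): Proposal | null {
  return PRESET_LABELS.has(val) ? { uri, val, up: 0, down: 0 } : null;
}

function vote(p: Proposal, voter: Voter, upvote: boolean): void {
  const w = ROLE_WEIGHT[voter.role];
  if (upvote) p.up += w;
  else p.down += w;
}

// Consumers filter on upvote share instead of a binary "approved" flag,
// each picking their own threshold.
function passesThreshold(p: Proposal, minShare: number): boolean {
  const total = p.up + p.down;
  return total > 0 && p.up / total >= minShare;
}

// Feedback: proposers whose labels are consistently downvoted can be
// barred from proposing (count tracking omitted for brevity).
function proposerBarred(downvotedProposals: number, limit = 5): boolean {
  return downvotedProposals >= limit;
}
```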