Add moderation to blobs uploaded to a PDS

Bossett commented 4 months ago

Is your feature request related to a problem? Please describe.

As federation progresses, more and more people will be hosting data for others, which opens the gates for abuse. It would be good for PDS owners to have an official line of defence against hosting unwanted content.

Describe the solution you'd like

Ideally, a PDS owner should have a means to moderate content during upload (i.e. a hook within uploadBlob) such that unwanted material is never stored. This should take the form of something like a call to a moderation service where this can be handled more completely, but in the interim it would be good to implement compatibility with something like Cloudflare's CSAM-scanning tool (https://blog.cloudflare.com/the-csam-scanning-tool/) or similar services.

Describe alternatives you've considered

Managing content post-storage (i.e. filtering in the image proxy) does not address other distribution vectors like alternate proxies, or just crawling the repo - something has to fail at time of storage. Doing this in the app has the same problem: storage can still happen if someone uses an alternate tool. Ultimately if PDS owners want to be responsible with the content they store, they need to be able to interrupt the upload.

Additional context

This may be something that Bluesky can provide - scanning tools tend to be available to customers of specific products, but if this could be done with a moderation hook, PDS owners could call the official moderation platform to check if data is potentially illegal.

bnewbold commented 4 months ago

Sure, this makes sense! To re-phrase and give some additional context:

- content, including blobs, can be retroactively removed from PDS instance storage, using existing API endpoints. these aren't all super well documented, but in a very harmful situation the "takedown account" path is included and documented
- but this request is about proactively scanning content before it is ever publicly served by the PDS, at time of upload
- as an internal implementation detail, this could happen in the uploadBlob HTTP request itself (hold connection open until scan completes), or as a background task after the blob is uploaded, but before the blob is referenced by any record. currently, blobs which have been uploaded but not (yet) referenced are in a limbo state, and IIRC are not yet publicly accessible (and get deleted after a couple hours or days in limbo)
- content which has been flagged via fuzzy matching is not always harmful or violating. the usual process at scale is to have a (well-trained and well-supported!) human review the content, and then either release the content, or escalate further, eg report to authorities. simply rejecting on upload (with no further review or reporting) would be pragmatic and desirable from the perspective of a small PDS operator, but the agencies who provide access to content scanning APIs may or may not be comfortable with this workflow or use case. Can't speak on their behalf! This is an issue that I believe is being discussed and negotiated by smaller ActivityPub instance operators as well; I think there are workable solutions today, and the situation may improve over time.
- regardless, there are many potential reasons and mechanisms that a PDS operator might have for scanning blobs, and having a hook would make a lot of sense
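
To make the background-task variant concrete, here is a rough sketch of how a blob in limbo could be gated on scan status before any record is allowed to reference it. Every name here (`LimboBlob`, `ScanState`, `canReference`, `shouldDelete`) and the two-hour window are assumptions for illustration; none of them come from the PDS codebase, and the thread only says limbo lasts "a couple hours or days".

```typescript
// Hypothetical model of an uploaded-but-unreferenced blob. Not the PDS schema.
type ScanState = 'pending' | 'cleared' | 'rejected'

interface LimboBlob {
  cid: string
  uploadedAt: number // epoch milliseconds
  scan: ScanState
}

// Assumed retention window for unreferenced blobs; the real value is not given in the thread.
const LIMBO_TTL_MS = 2 * 60 * 60 * 1000

// A record may only reference a blob once the background scan has cleared it.
function canReference(blob: LimboBlob): boolean {
  return blob.scan === 'cleared'
}

// An unreferenced blob is dropped if it was rejected or has outlived the limbo window.
function shouldDelete(blob: LimboBlob, now: number): boolean {
  return blob.scan === 'rejected' || now - blob.uploadedAt > LIMBO_TTL_MS
}
```

With this shape, a slow scan never blocks the upload request itself; it only delays the moment the blob can be attached to a record.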

Implementation-wise, by "hook" are you imagining a clear place in the PDS implementation to inject code (eg, as a fork), or something like a webhook which forwards on the content to a configured API? If the latter, any idea what that API should look like? We have an internal HTTP API with a "clean" (non-proprietary) signature to abstract this sort of thing, which is basically a raw HTTP POST (not form encoded), with extra fields as query parameters.
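
For illustration only, a request of that general shape (raw bytes as the POST body, metadata as query parameters) might be assembled like this. The internal API itself is not shown in this thread, so the `buildScanRequest` helper, the `ScanRequest` type, and the `did` field are all hypothetical.

```typescript
// Plain description of the outgoing request, kept independent of any HTTP client.
interface ScanRequest {
  url: string
  method: 'POST'
  headers: Record<string, string>
  body: Uint8Array
}

// Build a raw (not form-encoded) POST: the blob bytes are the body,
// and every extra field travels as a query parameter.
function buildScanRequest(
  endpoint: string,
  blob: Uint8Array,
  mimeType: string,
  fields: Record<string, string>,
): ScanRequest {
  const url = new URL(endpoint)
  for (const key of Object.keys(fields)) {
    url.searchParams.set(key, fields[key])
  }
  return {
    url: url.toString(),
    method: 'POST',
    headers: { 'Content-Type': mimeType },
    body: blob,
  }
}
```

The returned object could be handed to `fetch` or any other client; keeping the builder separate makes the wire format easy to inspect without a running scanner.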

Bossett commented 4 months ago

My first thought was as a clear place to inject code, but a webhook with the raw data would allow this to be something run by Bluesky or another provider, with the return payload providing some metadata that includes 'store/don't store' guidance. If the return payload is extensible, this may be a place to inject other metadata (labels, or alt-text in a less adversarial context) in the future.
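
As a sketch of what an extensible return payload could look like on the PDS side, the parser below reads a store/don't-store decision, picks up optional labels, and keeps any other fields for future use. The field names (`action`, `labels`) are made up for this example, and treating malformed responses as "don't store" is my assumption, not something settled above.

```typescript
// Hypothetical normalised verdict; the wire format is not defined anywhere yet.
interface ScanVerdict {
  store: boolean
  labels: string[]
  extra: Record<string, unknown> // unrecognised fields, preserved for later use
}

// Parse an already-decoded JSON payload. Anything that is not an explicit
// `action: "store"` is treated as "don't store" (an assumed fail-closed default).
function parseVerdict(payload: unknown): ScanVerdict {
  if (typeof payload !== 'object' || payload === null) {
    return { store: false, labels: [], extra: {} }
  }
  const { action, labels, ...extra } = payload as Record<string, unknown>
  return {
    store: action === 'store',
    labels: Array.isArray(labels)
      ? labels.filter((l): l is string => typeof l === 'string')
      : [],
    extra,
  }
}
```

Because unknown fields land in `extra` instead of causing an error, a provider could start returning things like alt-text suggestions without breaking older PDS versions.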

You could kind of get both by having the PDS expose an HTTP API, and just make the default URL http://localhost/xyz.

In either case I think it should be required to complete prior to committing the blob data (i.e. hold that transaction open, and just fail it if the check fails) because that will provide what I think is the more desirable user experience for admins: it just failed to upload, try again if you must (there may need to be some anti-abuse that locks out the account in here, but now we're spiralling). This gives you a spot as well to engage those agencies if they want that flow later - failed fuzzy match fails the upload to the user side, but it can be uploaded on the server side, checked, and if clean it just won't fail in the future (and if not, admin notification, escalation, etc.).
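
Roughly, the "hold it open and fail it" flow could look like the sketch below: the scan is awaited before anything is committed, and a failed check surfaces as an ordinary upload failure. `uploadBlob` here is a stand-in written for this comment, not the real PDS handler, and `Scan` and `Commit` are hypothetical callback types.

```typescript
// Hypothetical callbacks: `scan` resolves to true when the blob may be stored,
// `commit` persists the blob and resolves to some reference for it.
type Scan = (blob: Uint8Array) => Promise<boolean>
type Commit = (blob: Uint8Array) => Promise<string>

// Await the moderation check first; only commit when it passes.
// On rejection nothing is written and the caller just sees a failed upload.
async function uploadBlob(blob: Uint8Array, scan: Scan, commit: Commit): Promise<string> {
  const allowed = await scan(blob)
  if (!allowed) {
    throw new Error('upload failed')
  }
  return commit(blob)
}
```

The error message is deliberately generic so the uploader learns nothing about why the check failed; admin notification or escalation would hang off the rejection branch.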

bluesky-social/atproto issue #2226