Proposal: more granular control over unauthenticated API access

WesleyAC commented 1 year ago

Pitch

DISALLOW_UNAUTHENTICATED_API_ACCESS used to be something useful, but with changes in v4, it breaks the web frontend for non-authenticated users, so most admins are unwilling to use it. I think it's worth building more tools to control the amount of information exposed by the API to non-authenticated users, which in combination with AUTHORIZED_FETCH could have significant impacts on security and harassment mitigation.

In particular, I recently made the following extremely simple patch on interlace.space:

Hide old posts from non-authed users

```diff
diff --git a/app/models/account_statuses_filter.rb b/app/models/account_statuses_filter.rb
index 556aee032..5427d2b16 100644
--- a/app/models/account_statuses_filter.rb
+++ b/app/models/account_statuses_filter.rb
@@ -35,7 +35,7 @@ class AccountStatusesFilter
     if suspended?
       Status.none
     elsif anonymous?
-      account.statuses.not_local_only.where(visibility: %i(public unlisted))
+      account.statuses.not_local_only.where(visibility: %i(public unlisted), created_at: (DateTime.now - 14.day)..(DateTime.now))
     elsif author?
       account.statuses.all # NOTE: #merge! does not work without the #all
     elsif blocked?
```

This hides posts older than two weeks from being visible on the account page for unauthenticated users. This provides stronger protection against scraping than the old DISALLOW_UNAUTHENTICATED_API_ACCESS did (since there is no way to get the old toots without knowing their ID), while still allowing permalinks to individual posts to work. It also is a good solution for people who like the idea of hiding old toots, but don't want to go all the way to auto-deleting them.
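To illustrate how the hard-coded two-week window could become a configurable option, here is a minimal plain-Ruby sketch. It is not Mastodon code: `Post`, `visible_to_anonymous`, and the `max_age_days` parameter are hypothetical names, and an in-memory array stands in for the ActiveRecord scope that the patch actually filters.

```ruby
# Plain-Ruby sketch of the age-window idea from the patch above.
# Post, visible_to_anonymous and max_age_days are hypothetical names;
# the real patch works on an ActiveRecord scope, not an array.
Post = Struct.new(:id, :visibility, :created_at)

SECONDS_PER_DAY = 24 * 60 * 60

# Returns the posts an anonymous visitor may see on a profile page:
# public or unlisted, and no older than max_age_days.
# A nil max_age_days disables the age limit.
def visible_to_anonymous(posts, now: Time.now, max_age_days: 14)
  cutoff = max_age_days && (now - max_age_days * SECONDS_PER_DAY)
  posts.select do |post|
    %i[public unlisted].include?(post.visibility) &&
      (cutoff.nil? || post.created_at >= cutoff)
  end
end

now = Time.utc(2023, 6, 1)
posts = [
  Post.new(1, :public,   now - 1 * SECONDS_PER_DAY),   # recent, public
  Post.new(2, :unlisted, now - 20 * SECONDS_PER_DAY),  # older than the window
  Post.new(3, :private,  now - 2 * SECONDS_PER_DAY)    # followers-only
]

visible_ids = visible_to_anonymous(posts, now: now).map(&:id)
```

With the default 14-day window only the first post passes the filter. Passing `max_age_days: nil` falls back to showing every public and unlisted post, which corresponds to the unpatched behaviour.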

I'd like to flesh this out into a more fully-fledged system that provides more granular privacy options for the levels of API access provided to unauthenticated users, and I wanted to open this issue to ask what people would like to see here.

Notably, this doesn't do as much as one might like to prevent harassment while Mastodon allows non-authenticated users to view remote users, since it's easy for people to, for example, open up someone's profile on mastodon.social or some other server like that. So I think an important component of this would be making some changes in upstream Mastodon as well, so that it stops leaking so much information and instead defaults to, for instance, redirecting non-authenticated users to the canonical URL for user pages. I'm not sure how friendly upstream is to that: I see fixing that as fixing a bug, but I don't know if that's how upstream sees it. However, even in the absence of that change, allowing people more control over how the API of their own instance is used seems good and important.

Overall, I think that the approach GoToSocial and Honk take (trying to avoid being a vector for allowing block evasion and scraping of other instances' posts) is good, and I'd like to see Mastodon adopt more of that model. I think that, in combination with locked accounts and secure mode, is a really good framework to allow people to control the spread of their posts.

Stuff that I see as being a part of this:

- Option to limit age of posts shown to unauthenticated users on profile pages (ideally both a per-user and per-instance option)
- Option to limit age of posts shown to unauthenticated users on timelines
- Option to limit age of posts shown to unauthenticated users (including direct links to posts)
- Option to hide unlisted posts in the web UI
- Option to limit search access for unauthenticated users (is there already a way to do this? I don't see a setting for it)
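If the age limit exists at both the per-user and per-instance level, the two values have to be combined somehow. One possibility, which is purely an assumption and not something decided in this proposal, is that the stricter limit wins. A minimal sketch, with `effective_max_age_days` as a hypothetical helper and `nil` meaning "no limit set":

```ruby
# Hypothetical sketch: resolve the age limit that applies to one account.
# Neither setting exists in Mastodon today; nil means "no limit".
def effective_max_age_days(instance_limit, user_limit)
  # compact drops unset (nil) limits; min of an empty array is nil,
  # so the result is nil when neither level sets a limit.
  [instance_limit, user_limit].compact.min
end
```

Under this rule a user can tighten the instance default but not loosen it. The opposite policy (user setting overrides instance setting) would be an equally small change, so the choice is mostly a question of what admins and users expect.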

Related stuff that already exists:

- Hiding timelines for unauthenticated users
- Hiding follower / following list (and follower count)
- Enabling / disabling profile directory

Are there other things that people would like to see as part of a system like this, or broader thoughts that people have on this topic? I'd love to hear from instance admins and users what information they would like to hide from or show to unauthenticated users looking at the web UI (or attempting to scrape via the API).

Motivation

See above.

Plastikmensch commented 1 year ago

I remember seeing an upstream PR which basically implemented a limit for how many toots are shown to unauthenticated users, but it needed an update and I don't know what happened to it.

> Option to limit search access for unauthenticated users (is there already a way to do this? I don't see a setting for it)

That is actually pretty easy to implement.

> Option to limit age of posts shown to unauthenticated (including direct links to posts)

I'm not sure including direct links is wise.

> Option to hide unlisted posts in the web UI

I'm not sure about this.

I'm for this, but I'm not sure how much it would actually achieve against scraping, especially for smaller instances.

WesleyAC commented 1 year ago

> [Search authentication] is actually pretty easy to implement.

Hah, yeah, I just deployed that on interlace.space a bit after writing this. It would require a slight amount of work to make it an env variable (or even better, a setting), but it's quite easy overall. Probably worth trying to upstream as well.
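As a rough illustration of what the env-variable version could look like, here is a plain-Ruby sketch. It is not the patch deployed on interlace.space: `search_allowed?` is a hypothetical helper and `REQUIRE_AUTH_FOR_SEARCH` is an invented variable name, not an existing Mastodon setting.

```ruby
# Hypothetical sketch of gating search behind an env variable.
# REQUIRE_AUTH_FOR_SEARCH is an invented name, not a real Mastodon option.
# env defaults to ENV but can be any Hash-like object, which keeps the
# check easy to exercise without touching the process environment.
def search_allowed?(signed_in:, env: ENV)
  # Signed-in users can always search.
  return true if signed_in

  # Anonymous users can search unless the variable is exactly 'true'.
  env['REQUIRE_AUTH_FOR_SEARCH'] != 'true'
end
```

A controller could call a check like this before running the search and respond with an error for anonymous requests when it returns `false`. Leaving the variable unset keeps the current behaviour, so the option would be opt-in.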

For limiting direct links and unlisted posts, I'm curious what seems unwise about that. My hope would be for it to be an alternative to DISALLOW_UNAUTHENTICATED_API_ACCESS, but I can see how it might be confusing to communicate what's happening: a 403 is quite clear, whereas selectively hiding content is less so.

Plastikmensch commented 1 year ago

> For limiting direct links and unlisted posts, I'm curious what seems unwise about that

I feel that can be confusing from a user perspective. Imagine sharing a link to your toot on another platform, for example, and it leads to a 403 or some other page not containing the toot. For unlisted toots it would somewhat mimic the behaviour of followers-only toots, except that the toot would be visible to any logged-in user rather than only to followers.

glitch-soc / mastodon

Proposal: more granular control over unauthenticated API access #2225
