Reflect on our `403` vs `404` response codes

slifty commented 5 months ago

There was some discussion in a PR recently about whether we should always return the same status code for entities regardless of whether they don't exist at all, or just "don't exist" based on the permissions of a given user (aka the user is not authorized to access the resource).

This is something we should reach consensus on and implement consistently one way or another.

reefdog commented 5 months ago

Copying over my minor contribution:

403 is semantically sensible (and spec-correct), but the design of showing 404 for private resources so as not to confirm entity presence is established best practice now, AFAIK. (Case in point, hit this URL while logged in: https://github.com/reefdog/PrivateTest)

Our use case is mildly different in that we currently use numeric IDs rather than, say, org-name-based-slugs. But each leaks data:

a 403 for /organizations/200 + a 404 for /organizations/201 tells people we probably have 200 organizations. (Of course, so does our /organizations endpoint's total attribute, but at least we're doing that on purpose.)
a 404 for /organizations/red-cross means we don't have it, but a 403 means we do.

In both these cases, can I imagine a horror-story outcome? No. But the way I think of it: if you are only granted access to chapters 1, 2, and 3 of a book, you should also not have access to the table of contents that reveals chapter titles for the rest of the chapters — unless we explicitly decide we want you to. Then, great. Populate /organizations with the same limited data they could get by brute-forcing the endpoint.

slifty commented 5 months ago

I agree that a more conservative approach is more secure and also fairly standard in my experience (I'm used to this meaning universal 404 rather than universal 403, but I think it COULD go either way?)

Knowing an ID is a real ID feels super mundane, but the thing about exposing mundane data is that I think it possibly could create situations where the presence of a new entity combined with some other kind of third party data might result in some kind of strange opsec edge case. I'd rather just not have to think about those edge cases!

bickelj commented 4 months ago

I'd be OK with the universal 403 for those who lack authorization in both the ID exists and ID does not exist case because it says "you don't have authorization to see whether this ID exists" rather than 404 which says "this ID definitely does not exist." That coupled with the 401 for unauthenticated users should provide plenty of obscurity. If everybody else wants 404 because that's the trend, OK, I suppose, but "everybody else is doing it this way" is not my favorite argument. It is not a hill to die on, for sure, and it can be good to stay on the beaten path when there's nothing wrong with the beaten path. It's only slightly wrong or somewhat ambiguous here.

reefdog commented 4 months ago

While the universal 404 doesn't bother me as it does you, I'm also fine with the universal 403 for entity-specific routes that the user isn't authorized to hit. (Of course, now we're saying /totally-made-up-route is a 404 while /secret-entity-type/1 is a 403, which… now verifies /secret-entity-type exists, but we're getting deep in the weeds of unrealistic problems, probably.)

bickelj commented 4 months ago

The 404 is not bothering me as much having just reviewed 941 where the bulk uploads response would only include what you are allowed to see and nothing else. In that sense, 404 would be consistent with that kind of response for aggregated resources.

reefdog commented 4 months ago

@bickelj Making sure I understand your comment: #941 is about a filter parameter on a collection endpoint, which won't ever (I think?) return 404. Zero matches results in a 200 + empty response. I don't think our approach should be different based on whether the route is a collection or entity route, yeah?

bickelj commented 4 months ago

@reefdog If the results in that collection depend on your authorization, there's a kind of implicit "I didn't find anything else" by the non-presence in the collection, which I'm taking as a kind of 404 for those hidden results. So yeah, given that, I guess I can see that it's OK to return 404 on a single instance rather than a 403. For example, we don't get several 403s or several 404s in the collection, we just don't get them. A single 404 on a "you can't see this" instance is consistent with that. I guess. Kind of. And sorry for the long delayed reply, I just saw this!

bickelj commented 2 months ago

@reefdog @slifty At this point I'm OK with mimicking others on this one or wherever you all landed before my objections : )

PhilanthropyDataCommons / service

Reflect on our `403` vs `404` response codes #940