fediverse-devnet / feditest-tests-fediverse

The tests for the fediverse testsuite
MIT License

WebFinger tests: percent encoded? #57

Closed steve-bate closed 1 month ago

steve-bate commented 1 month ago

Can you clarify how webfinger.server.4_2__4_do_not_accept_malformed_resource_parameters::not_percent_encoded is working?

When I print malformed_webfinger_uri, it looks like a valid webfinger URI. The 200 status code is what I'd expect.

jernst commented 1 month ago

According to Section 4.1, a request is malformed if the resource is not percent-encoded. So HTTP GET on:

https://example.com/.well-known/webfinger?resource=acct:user@example.com

should return 400 because the correct request is

https://example.com/.well-known/webfinger?resource=acct%3auser%40example.com
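
As a rough illustration of the two request forms above, the sketch below builds both URLs with Python's standard `urllib.parse` and checks that they decode to the same `resource` value. The helper name `resource_of` is made up for this example and is not part of the test suite. Note that `quote()` emits uppercase hex digits (`%3A`), which is equivalent to the lowercase `%3a` written above.

```python
from urllib.parse import parse_qs, quote, urlsplit

resource = "acct:user@example.com"
base = "https://example.com/.well-known/webfinger"

# Unencoded form, as in the first URL above.
plain_url = f"{base}?resource={resource}"

# Fully percent-encoded form; safe="" makes quote() encode ":" and "@" too.
encoded_url = f"{base}?resource={quote(resource, safe='')}"


def resource_of(url: str) -> str:
    """Return the decoded value of the 'resource' query parameter (hypothetical helper)."""
    return parse_qs(urlsplit(url).query)["resource"][0]


print(encoded_url)
print(resource_of(plain_url) == resource_of(encoded_url))
```

Both forms carry the same resource once decoded, which is why the question of whether a server should reject the first form comes down to what the RFCs actually require.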

steve-bate commented 1 month ago

> According to Section 4.1, a request is malformed if the resource is not percent-encoded.

My understanding is that percent-encoding is only required when a character is part of the URI "reserved" character set.

RFC 3986 (emphasis mine):

A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component.

The "@" character, for example, is in the allowed set for the query component, so it wouldn't need to be percent-encoded.

jernst commented 1 month ago

I took the @ -> %40 and : -> %3a directly from the examples in the WebFinger RFC, assuming that if their examples encode them, there must be a reason. However, the RFC editors may have been overzealous:

According to RFC 3986 section 3 Syntax Components:

URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

According to section 3.4 Query:

query       = *( pchar / "/" / "?" )

According to section 3.3 Path:

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
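
The grammar rules quoted above can be turned into a small character-set check. This is only a sketch: the `unreserved` and `sub-delims` sets are filled in from RFC 3986 section 2, the function name `disallowed_in_query` is invented for illustration, and `pct-encoded` triplets are deliberately not handled, so a literal "%" is reported as disallowed.

```python
import string

# Character classes from RFC 3986 (sections 2.2, 2.3, 3.3, 3.4).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR = UNRESERVED | SUB_DELIMS | set(":@")
QUERY_CHARS = PCHAR | set("/?")


def disallowed_in_query(text: str) -> list:
    """Return the characters of text that the query rule does not allow literally.

    Simplification: pct-encoded triplets are not recognized, so "%" is flagged.
    """
    return [ch for ch in text if ch not in QUERY_CHARS]


print(disallowed_in_query("acct:user@example.com"))
print(disallowed_in_query("a b#c"))
```

Under these rules nothing in `acct:user@example.com` needs escaping at the generic URI level, whereas a space or "#" would.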

On the other hand, this RFC doesn't actually specify the ?key=value syntax. When looking that up, I come across https://url.spec.whatwg.org/#application/x-www-form-urlencoded which makes my brain hurt.

Suggest a compromise: we 1) require that servers accept non-percent-encoded : and @ as I cannot see how it gets in the way of interop and they do appear to be allowed and 2) permit clients to not %-encode them.

steve-bate commented 1 month ago

> which makes my brain hurt.

Likewise. ;-)

> Suggest a compromise: we 1) require that servers accept non-percent-encoded : and @ as I cannot see how it gets in the way of interop and they do appear to be allowed and 2) permit clients to not %-encode them.

Am I reading Section 4.1 correctly? ... that "=" and "&" are the only otherwise-allowed characters that must be percent-encoded in the query? That makes sense given they are delimiters for the query params. I think if we want to test percent-encoding we need to find or create an actor with reserved characters in its user or domain name.

EDIT: A domain name can't have an "=" or "&" so the user name would be the part that might have those characters. Out of more than 180,000 user names recorded in my Mastodon instance, none of them have those characters, but it's theoretically possible.
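
To make the delimiter concern concrete, here is a minimal sketch using Python's standard `urllib.parse`. The user name `a=b&c` is purely hypothetical, chosen only because it contains both delimiters. Passing `safe=":@"` to `quote()` leaves ":" and "@" literal while still encoding "=" and "&".

```python
from urllib.parse import parse_qs, quote

# Hypothetical account whose user name contains "=" and "&".
resource = "acct:a=b&c@example.com"

# Keep ":" and "@" literal; "=" and "&" are still percent-encoded.
encoded = quote(resource, safe=":@")

good_query = f"resource={encoded}"
bad_query = f"resource={resource}"  # delimiters left unencoded

print(encoded)
print(parse_qs(good_query)["resource"][0])
print(parse_qs(bad_query)["resource"][0])
```

With the delimiters encoded, the value round-trips intact; without encoding, the parser splits on "&" and the `resource` value comes back truncated.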

jernst commented 1 month ago

I removed that test now. So can we close this?

steve-bate commented 1 month ago

Thanks.