alwinb / url-specification

A rephrasing and generalisation of the WHATWG URL Standard

Grammatical ambiguity for abc:def #8

Open TimothyGu opened 3 years ago

TimothyGu commented 3 years ago

"abc:def" can be parsed in two ways: either with abc as the scheme and def as the path, or with abc:def as the path.

We could resolve it through the grammar itself, which is what RFC 3986 does with its path-noscheme production. But the simplistic approach (forbidding colons in the first path segment) would forbid abc,def:123 from being parsed as a path, contrary to what browsers and the WHATWG Standard do.

Alternatively, we could just handwave it and say scheme always wins the fight. This has the advantage of keeping the grammar simple.
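A minimal sketch of the "scheme always wins" rule, assuming the RFC 3986 scheme syntax (a letter followed by letters, digits, "+", "-" or "."). The name `split_scheme` is hypothetical and not taken from the specification:

```python
import re

# Assumed scheme syntax (RFC 3986): ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ":".
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")

def split_scheme(text):
    """Return (scheme, rest); scheme is None when there is no valid scheme prefix."""
    match = SCHEME_RE.match(text)
    if match:
        # Scheme wins: a valid scheme prefix is never read as part of the path.
        return match.group(1), text[match.end():]
    # No valid scheme, so the whole input (colons included) is left for the path.
    return None, text

print(split_scheme("abc:def"))      # ('abc', 'def')
print(split_scheme("abc,def:123"))  # (None, 'abc,def:123')
```

With this rule, abc,def:123 still comes out as a path, because abc,def is not a valid scheme, and the grammar needs no path-noscheme production.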

alwinb commented 3 years ago

I was implicitly assuming that the first rule takes all, but you are right, that should be made explicit.

I think there is another one.

Ah yes, //foo could be parsed as (path-root /) (dir ε) (file foo), but it shouldn't.
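The "first rule takes all" reading can be sketched as an ordered choice in which the rule for a leading "//" is tried before path-root. This is only an illustration: the function `classify_prefix` and the label "authority" are assumptions, not names from the specification.

```python
def classify_prefix(text):
    """Ordered choice: the first rule that matches takes the input."""
    if text.startswith("//"):
        # Tried first, so "//foo" never falls through to the path-root rule.
        return "authority", text[2:]
    if text.startswith("/"):
        return "path-root", text[1:]
    return "relative", text

print(classify_prefix("//foo"))  # ('authority', 'foo')
print(classify_prefix("/foo"))   # ('path-root', 'foo')
```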

> We could resolve it through the grammar itself, which is what RFC 3986 does with its path-noscheme production. But the simplistic approach (forbidding colons in the first path segment) would forbid abc,def:123 from being parsed as a path, contrary to what browsers and the WHATWG Standard do.

That's quite sharp. Hmm. I need to have a better look at this.

> Alternatively, we could just handwave it and say scheme always wins the fight. This has the advantage of keeping the grammar simple.

I think I prefer that approach.

Meanwhile, there is the job of adjusting the grammar from RFC 3987 to align with the WHATWG URL Standard, which seems so close now. (cc @masinter)

I'm stalling a bit in this area. It is a bit underconstrained design-wise. I find it hard to pick one of several equivalent solutions when there is no clear reason to prefer one over the other (if that makes sense), which leads to indecision.