alwinb / url-specification

A rephrasing and generalisation of the WHATWG URL Standard
7 stars 0 forks source link

spec disagrees with reference implementation in detecting parser mode #2

Closed ghost closed 3 years ago

ghost commented 3 years ago

Consider this:

import "https://raw.githubusercontent.com/alwinb/url-specification/master/miniurl/miniurl.js"
let {parse} = miniurl
let {raw} = String
console.log(parse(raw`abcdef://yo`, "http"))
console.log(parse(raw`abcdef:\\yo`, "http"))

(I ran it with deno run test.js.)

According to the specification, both of these should be parsed the same way, but the reference implementation parses them differently. I feel like the implementation makes more sense to me, and also matches what the WHATWG spec does.

This is the relevant segment of the spec:

Let magic be the first n characters of input-string, with n being the least of 6 and the length of input-string. The updated-parser-mode is then defined to be:

  • file-mode — if (lowercase magic) :: file : any*
  • web-mode — if (lowercase magic) :: (http | https | ws | wss | ftp) : any*
  • generic-mode — if input :: alpha (alpha | digit | + | - | .)* : any*
  • supplied-mode — otherwise.

Note how magic for that URL string is abcdef, which matches none of the grammar rules provided (since they all require a : to be present).

alwinb commented 3 years ago

Thank you!

I don't like that segment in the specification. I think I'll drop the six character thing and just use – lowercase (input-string) – to test (and then hope people don't actually implement it in that way). It is a clumsy way of specifying a case insensitive pattern match, in any case.

If you have ideas for other ways to specify that, suggestions are welcome :)

I think though it is correct, it matches the third rule. Except that rule itself should state

input-string :: alpha (alpha | digit | + | - | .) : any

instead of

input :: alpha (alpha | digit | + | - | .) : any

ghost commented 3 years ago

I think though it is correct, it matches the third rule

I see! I completely misread it. 😅 I thought it was a check on magic like the other ones above it. Sorry for that. 😳

If you have ideas for other ways to specify that, suggestions are welcome

I think a reasonable way to specify it would be in plain prose, with something along the lines of “web-mode — if input-string starts with a web‐scheme followed by ':'”. (Analogously for file.)

My phrasing doesn’t really specify how to check whether a string “starts with a web‐scheme followed by ':'”, but I don’t think that’s problematic, as it specifies it unambiguously. Figuring out a way to check that is left up to implementors.

The third condition could be something like “if input-string :: scheme ':' any*” (reusing scheme from the “URL Grammar” section).

alwinb commented 3 years ago

I think that's an easy mistake to make given the way it is phrased. It's a trap of sorts, which should be avoided! I like your phrasing, I'm going to try that.

alwinb commented 3 years ago

I think this is an improvement, let me know what you think. Thank you for taking an interest!

I pushed to master, so its live (version 0.4.2).

ghost commented 3 years ago

Yeah, that looks great! 🎉 Thank you for bothering to address my feedback!