alwinb / url-specification

A rephrasing and generalisation of the WHATWG URL Standard

Include codepoint sets for each component and for each standard #15

Open alwinb opened 2 years ago

alwinb commented 2 years ago

Many of the differences between URI, IRI and the WHATWG URL variants (valid, tolerated, sanitised) are about allowing different codepoints to occur verbatim within the various components.

I would like to include a section that contains all the different codepoint sets for each of the relevant components, and then parameterise the grammar.

This would go a long way towards describing the differences between the WHATWG URL Standard, RFC 3986 and RFC 3987, as well as the differences between the three variants of WHATWG URLs themselves.

The aim is to provide a generalised grammar, and to express the various forms of validity across the three standards 'semantically', as constraints on the parse tree and on the codepoints allowed within each component. A few issues will potentially remain, around drive letters and invalid percent sequences, but other than that I think this can work.
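To make the idea concrete, here is a rough sketch in Python of what a table of codepoint sets per standard and per component could look like, with a check parameterised by it. This is only an illustration, not the proposed specification text: the names `PROFILES` and `invalid_codepoints` are made up for this example, only three components are covered, the IRI entry includes only the BMP ranges of `ucschar` and `iprivate` from RFC 3987, and the WHATWG variants are left out entirely.

```python
import re
import string

# RFC 3986 character classes.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR = UNRESERVED | SUB_DELIMS | set(":@")

# RFC 3987 ranges, abbreviated to the BMP for this sketch (an assumption
# made for brevity; the RFC also lists supplementary-plane ranges).
UCSCHAR_BMP = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
IPRIVATE_BMP = [(0xE000, 0xF8FF)]


def in_ranges(char, ranges):
    """True if the codepoint of `char` falls inside one of the ranges."""
    return any(lo <= ord(char) <= hi for lo, hi in ranges)


# Hypothetical table: standard -> component -> predicate on one character.
PROFILES = {
    "uri": {
        "path-segment": lambda c: c in PCHAR,
        "query": lambda c: c in PCHAR or c in "/?",
        "fragment": lambda c: c in PCHAR or c in "/?",
    },
    "iri": {
        "path-segment": lambda c: c in PCHAR or in_ranges(c, UCSCHAR_BMP),
        "query": lambda c: (c in PCHAR or c in "/?"
                            or in_ranges(c, UCSCHAR_BMP)
                            or in_ranges(c, IPRIVATE_BMP)),
        "fragment": lambda c: (c in PCHAR or c in "/?"
                               or in_ranges(c, UCSCHAR_BMP)),
    },
}

# A well-formed percent sequence: '%' followed by two hex digits.
PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def invalid_codepoints(profile, component, text):
    """List the characters of `text` not allowed verbatim in `component`.

    Well-formed percent sequences are removed first, so a stray '%'
    is reported like any other disallowed character.
    """
    allowed = PROFILES[profile][component]
    remainder = PCT_ENCODED.sub("", text)
    return [c for c in remainder if not allowed(c)]


print(invalid_codepoints("uri", "query", "a=é"))
print(invalid_codepoints("iri", "query", "a=é"))
```

The same component string can then be checked against different standards by swapping the profile name, which is roughly the kind of parameterisation meant above. Whether the sets should be written as predicates, ranges or explicit tables is an open design choice.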


Steps: