Should the processing model care about validity?

WICG / scroll-to-text-fragment

Proposal to allow specifying a text snippet in a URL fragment



Should the processing model care about validity? #221

Closed: annevk closed this issue 8 months ago

annevk commented 1 year ago

It currently has:

If fragment directive input is not a valid fragment directive, then return an empty list.

but can that actually happen? It seems better if we don't need these additional grammar-based validation requirements.
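Here is a minimal sketch of a processing model that has no up-front validity check. It assumes that each `&`-separated piece starting with `text=` is a text directive and that everything else is kept as an unknown directive. The names `parse_fragment_directive`, `TextDirective`, and `UnknownDirective` are hypothetical Python stand-ins, not the spec's definitions, and the text directive value is left unparsed:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextDirective:
    # Raw value after "text="; not parsed further in this sketch.
    value: str


@dataclass
class UnknownDirective:
    # Unrecognized directive strings are preserved, not rejected.
    value: str


Directive = Union[TextDirective, UnknownDirective]


def parse_fragment_directive(directive_input: str) -> List[Directive]:
    """Hypothetical sketch: split on "&" and classify each piece.

    There is no "is this a valid fragment directive?" step. Every input
    produces a (possibly empty) list of directives.
    """
    if directive_input == "":
        return []
    result: List[Directive] = []
    for piece in directive_input.split("&"):
        if piece.startswith("text="):
            result.append(TextDirective(piece[len("text="):]))
        else:
            result.append(UnknownDirective(piece))
    return result
```

With this shape, the "not a valid fragment directive" branch has nothing to do. For example, `parse_fragment_directive("nonsense")` yields a single `UnknownDirective`.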

bokand commented 1 year ago

Hmm, are you suggesting to remove the grammar definitions and rely on the parsing and processing steps to handle errors?

That makes sense to me, though personally I'd find the grammar useful so readers can tell at a glance how to construct valid text directives (or other fragment directives); maybe keep that as a non-normative reference?

annevk commented 1 year ago

Yeah, I think a dual nature makes sense. One to guide producers and the other to guide consumers. URL and the HTML syntax also have this dual nature.

bokand commented 1 year ago

One additional point: for FragmentDirective it doesn't matter, since an invalid string will just be an UnknownDirective. But for TextDirective we do use the grammar to discard invalid cases (e.g. a URL that includes a fifth, invalid term after the suffix). Having a grammar seems convenient for that? But I'll see if it's easy to just do it algorithmically.
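The following sketch shows one way to discard the "fifth term" case algorithmically instead of with a grammar. It assumes the `text=` value has the shape `[prefix-,]start[,end][,-suffix]`, that terms are split on "," and that each term is percent-decoded with Python's `urllib.parse.unquote`. `parse_text_directive` and `ParsedTextDirective` are hypothetical names, and the sketch is more permissive than the grammar in one respect: it does not reject an unescaped "-" in the middle of a term.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote


@dataclass
class ParsedTextDirective:
    start: str
    end: Optional[str] = None
    prefix: Optional[str] = None
    suffix: Optional[str] = None


def parse_text_directive(value: str) -> Optional[ParsedTextDirective]:
    """Hypothetical sketch: classify comma-separated terms without a grammar.

    `value` is the part after "text=". Returns None for inputs a grammar
    would reject, e.g. a fifth term after the suffix or an empty term.
    """
    # A one-character separator keeps empty pieces ("a,,b" -> 3 tokens).
    tokens = value.split(",")
    if not 1 <= len(tokens) <= 4 or "" in tokens:
        return None

    prefix = suffix = None
    # A leading term ending in "-" is the prefix.
    if tokens[0].endswith("-"):
        prefix = unquote(tokens.pop(0)[:-1])
    # A trailing term starting with "-" is the suffix.
    if tokens and tokens[-1].startswith("-"):
        suffix = unquote(tokens.pop()[1:])

    # What remains must be exactly [start] or [start, end], and a stripped
    # prefix or suffix must not be empty (e.g. a bare "-" term).
    if len(tokens) not in (1, 2) or prefix == "" or suffix == "":
        return None
    start = unquote(tokens[0])
    end = unquote(tokens[1]) if len(tokens) == 2 else None
    return ParsedTextDirective(start, end, prefix, suffix)
```

For instance, `pre-,start,end,-suf,extra` splits into five tokens and is rejected before any classification happens. A four-token input such as `start,end,-suf,extra` is rejected later, because two middle terms remain after the suffix check fails.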

annevk commented 1 year ago

Usually the problem is that implementers don't use grammars in their implementations, so you get subtle differences. I've found a lot of these over the years in HTTP header parsers.

Grammars also add quite a bit of complexity when a string iterator can do the job just as well.
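The "string iterator" style amounts to walking a position index over the input rather than matching it against a grammar. The name `strictly_split` below echoes Infra's "strictly split a string", but this is an illustrative Python sketch under that assumption, not a transcription of Infra's steps:

```python
from typing import List


def strictly_split(text: str, delimiter: str) -> List[str]:
    """Split `text` on a single-code-point `delimiter` with a position index.

    Empty tokens are kept, so "a,,b" gives ["a", "", "b"] and "" gives [""].
    """
    if len(delimiter) != 1:
        raise ValueError("delimiter must be a single code point")
    position = 0
    tokens: List[str] = []
    while True:
        # Collect a run of code points that are not the delimiter.
        start = position
        while position < len(text) and text[position] != delimiter:
            position += 1
        tokens.append(text[start:position])
        if position >= len(text):
            return tokens
        # Step past the delimiter and collect the next token.
        position += 1
```

Each step is a plain loop with an explicit position, which maps directly onto an implementation and onto test cases such as empty tokens or a trailing delimiter.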

bokand commented 1 year ago

Yeah, that's true, but then I have a bit of a meta question about spec writing :) I recently came across this note in the "parallelism" section:

Algorithms in standards are to be easy to understand and are not necessarily great for battery life or performance.

How do we balance this against writing steps that match implementations? Blink also doesn't use a grammar for this case, but a grammar seemed like the best way to convey the requirements in the spec; implementers can validate however they choose, but the spec wins in case of any deviation (does HTML follow something similar to C++'s as-if rule?).

Is this typically a judgment call, with a preference for matching implementations?

As another example, the core "find a range" algorithm in this spec no longer mechanically matches what Blink does (though the output should be the same). Blink rearranged the steps to enable asynchronous execution, but the result is much harder to understand than the spec steps. Presumably that's reasonable (and common?).

annevk commented 1 year ago

Yeah, see https://infra.spec.whatwg.org/#algorithm-conformance for a description of that rule in standards land. (I hadn't seen the C++ rule before today, though, and there may well be differences.)

In my experience implementers prefer a set of steps over a grammar. And a set of steps tends to lead towards better tests and more interoperable implementations. But yes, ultimately it's a bit of a judgment call and people might have different preferences.

We already have algorithms written down for splitting on code points and such, so I think building on those would be very straightforward for the case at hand.

As for "find a range", I think it works because the invocation is rather hand-wavy. Though I have not studied <a>.click() or looked at the various events across implementations to determine whether there are web-exposed differences.