alwinb / url-specification

A rephrasing and generalisation of the WHATWG URL Standard

theoretical issue with caseless comparisons #11

Open ghost opened 3 years ago

ghost commented 3 years ago

It seems the spec resorts to lowercasing given strings in order to compare them ASCII case‐insensitively, but I don’t think that’s quite right.

Even though the strings being compared against are ASCII‐only, the given strings (input) are not, and that leaves ambiguity about how to handle the lowercasing for non‐ASCII code points.

Note that this is mostly a theoretical issue. Standard Unicode lowercasing very rarely maps a non‐ASCII string to an ASCII‐only one (U+212A KELVIN SIGN, which lowercases to "k", is a notable exception), whereas uppercasing does so more readily, e.g. uppercase "wß:" = "WSS:". However, I think Unicode doesn’t enforce case mapping strictly, so implementations may apply locale‐specific mappings that do produce ASCII: under the Turkish tailoring, for example, lowercase "FİLE:" equals "file:".
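The mappings discussed here can be checked quickly. This is a minimal sketch that assumes Python's built-in `str.upper()` and `str.lower()`, which apply the default (locale‐independent) Unicode case mappings; the variable names are purely illustrative. It also shows one default lowercase mapping that does land in ASCII, U+212A KELVIN SIGN.

```python
# Default Unicode case mappings as implemented by Python's str methods.
# No locale tailoring (e.g. Turkish) is applied here.
KELVIN = "\u212a"    # KELVIN SIGN, visually similar to ASCII "K"
DOTTED_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

upper_example = "w\u00df:".upper()        # "wß:" uppercased
kelvin_lower = KELVIN.lower()             # non-ASCII input, ASCII output
dotted_lower = f"F{DOTTED_I}LE:".lower()  # keeps a combining dot (U+0307)

print(upper_example)            # WSS:
print(kelvin_lower.isascii())   # True
print(dotted_lower == "file:")  # False
```

Without tailoring, "FİLE:" lowercases to "fi̇le:" (with U+0307 COMBINING DOT ABOVE after the "i"), so it does not collide with "file:" here; a Turkish‐tailored mapping would behave differently.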

I think a more succinct approach would be to define an ASCII case‐insensitive equivalence relation, with an operator associated with it, and use that instead.
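One possible shape for such a relation, as a rough sketch rather than anything taken from the spec: fold only the code points A to Z and leave everything else untouched, then compare. The names `ascii_lower` and `ascii_ci_equal` are hypothetical.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only ASCII A-Z; every other code point is left as-is."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


def ascii_ci_equal(a: str, b: str) -> bool:
    """ASCII case-insensitive equality (hypothetical helper).

    Two strings are related if they are identical after folding A-Z to a-z.
    Non-ASCII code points never fold, so they only match themselves.
    """
    return ascii_lower(a) == ascii_lower(b)


print(ascii_ci_equal("FILE:", "file:"))        # True
print(ascii_ci_equal("F\u0130LE:", "file:"))   # False
print(ascii_ci_equal("\u212a", "k"))           # False
```

Because the fold never touches non‐ASCII code points, the result does not depend on the Unicode version or on any locale, which removes the ambiguity described above.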

alwinb commented 3 years ago

I know, this is one of those things where I just didn't bother to be more exact. Not yet, at least. It's quite nice that people notice, though.