alwinb / spec-url

URL library that implements a reference resolution algorithm for WHATWG URLs
MIT License

Non-special relative handling differs from whatwg-url #6

Closed TimothyGu closed 3 years ago

TimothyGu commented 3 years ago

Given non-special relative URL abc:rootless resolved against abc://host/path, WHATWG uses the input string verbatim. But spec-url tries to actually resolve it:

image

See scheme state step 2.9. It commences the relative URL resolution process having already seen a scheme, but is only taken if it's a special URL.

Same issue happens with abc:/rooted resolved against abc://host/path:

image
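The behaviour described above can be checked against any implementation of the WHATWG `URL` constructor. The sketch below assumes a spec-conforming runtime (for example a recent Node.js); the function name `resolveExamples` is only illustrative. As noted later in this thread, some browsers have disagreed with the standard on this case, so results may differ there.

```typescript
// Resolves the two non-special inputs from this report, plus a special-scheme
// input for contrast, using the standard URL constructor.
function resolveExamples(): { rootless: string; rooted: string; special: string } {
  const base = "abc://host/path";
  return {
    // Non-special scheme: the input is kept as given.
    rootless: new URL("abc:rootless", base).href, // "abc:rootless"
    rooted: new URL("abc:/rooted", base).href, // "abc:/rooted"
    // Special scheme equal to the base scheme: resolved relative to the base.
    special: new URL("http:rootless", "http://host/path").href, // "http://host/rootless"
  };
}

console.log(resolveExamples());
```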

TimothyGu commented 3 years ago

A fix appears to be changing the definition of ~goto from:

The non-strict goto operation (url1 ~goto url2) is defined to be (url1 goto url2') where url2' is url2 with its scheme token removed if it case-insensitively compares equal to the scheme token of url1, or url2 otherwise.

to:

The non-strict goto operation (url1 ~goto url2) is defined to be (url1 goto url2') where url2' is url2 with its scheme token removed if it case-insensitively compares equal to the scheme token of url1 and if url2 is a web-URL or file-URL, or url2 otherwise.
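A minimal sketch of the proposed rule, covering only the step that computes url2' (the goto operation itself is not modelled). The `UrlRecord` shape and the function names are hypothetical and are not spec-url's actual API; the sketch also assumes that "web-URL or file-URL" coincides with the WHATWG special schemes.

```typescript
// Hypothetical, simplified URL record; spec-url's real representation may differ.
interface UrlRecord {
  scheme?: string;
  host?: string;
  path?: string;
}

// Assumption: web-URLs and file-URLs are exactly the WHATWG special schemes.
const SPECIAL_SCHEMES = new Set(["http", "https", "ws", "wss", "ftp", "file"]);

function isWebOrFileUrl(url: UrlRecord): boolean {
  return url.scheme !== undefined && SPECIAL_SCHEMES.has(url.scheme.toLowerCase());
}

// Computes url2' for (url1 ~goto url2): the scheme is dropped only when it
// matches url1's scheme case-insensitively AND url2 is a web-URL or file-URL.
function dropMatchingScheme(url1: UrlRecord, url2: UrlRecord): UrlRecord {
  const sameScheme =
    url1.scheme !== undefined &&
    url2.scheme !== undefined &&
    url1.scheme.toLowerCase() === url2.scheme.toLowerCase();
  if (sameScheme && isWebOrFileUrl(url2)) {
    const stripped: UrlRecord = { ...url2 };
    delete stripped.scheme;
    return stripped;
  }
  return url2;
}
```

With this guard, `abc:rootless` against `abc://host/path` keeps its scheme, while `http:rootless` against `http://host/path` loses it and is then resolved as a relative reference.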

alwinb commented 3 years ago

Thank you! Yes, you are right. Apparently there are no tests for that case in the WPT test suite. I will make a change. Fortunately no large changes are needed, it seems, but I'll investigate.

alwinb commented 3 years ago

Hmm I wonder if there is a bit more to say about this. Chrome has the same behaviour, thus disagrees with the WHATWG Standard, whilst Firefox and Safari agree with it. Related issue: https://github.com/whatwg/url/issues/385.

Maybe it is cleaner to use the strict goto for generic URLs rather than modify the non-strict goto. I'll need to have a better look at it.

alwinb commented 3 years ago

Alright. It seems that indeed, using the strict goto for generic URLs solves the problem. I've exposed the strictness as an option for the resolve operations and updated the WHATWGParseResolve function to agree with the spec.

Let me know if this works for you.

I am thinking about removing forceResolve (because the forcing itself is non-strict behaviour, so passing a 'strict' argument may be confusing) and replacing it with a WHATWGResolve method that picks the right strictness option (for the goto) depending on its arguments.
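One way such a method might pick the strictness is sketched below. This is an illustration only: `strictnessFor` and `schemeOf` are hypothetical names, not part of spec-url, and the sketch assumes the choice depends solely on whether the input carries a special (web or file) scheme.

```typescript
type Strictness = "strict" | "non-strict";

// Assumption: the schemes treated as special are the WHATWG special schemes.
const SPECIAL = new Set(["http", "https", "ws", "wss", "ftp", "file"]);

// Returns the leading scheme of the input (lowercased, without the colon),
// or null when the input does not start with a scheme.
function schemeOf(input: string): string | null {
  const match = /^([A-Za-z][A-Za-z0-9+.-]*):/.exec(input);
  const scheme = match?.[1];
  return scheme === undefined ? null : scheme.toLowerCase();
}

// Special schemes get the non-strict goto; generic schemes get the strict one.
// For scheme-less input the choice makes no difference, so "strict" is returned.
function strictnessFor(input: string): Strictness {
  const scheme = schemeOf(input);
  return scheme !== null && SPECIAL.has(scheme) ? "non-strict" : "strict";
}
```

A wrapper along these lines would keep the strictness decision in one place instead of exposing it as a caller-supplied argument.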

TimothyGu commented 3 years ago

I agree it's a little confusing to have a "strict" argument in forceResolve, when it only affects goto.

Why can't we make the ~goto operation a bit more intelligent, so that it does nothing for generic-URLs, as I described in https://github.com/alwinb/spec-url/issues/6#issuecomment-855369787? In other words, what are the considerations for threading a parsing mode through all the operations (forced resolution, resolution, pre-resolution) versus just detecting the parsing mode from url2 in non-strict goto?

alwinb commented 3 years ago

Hm I can see that threading through a strictness argument isn't necessarily so elegant. But since you asked, I started thinking about it and then I found out how many implicit guidelines I have been trying to follow.

But there's a general design strategy behind it. There is so much to learn from mathematics when it comes to API design. It's not even much of a stretch to say that mathematics is all about API design. Now, URLs aren't so interesting mathematically, but they're not too bad either. For example,

Trying to stay aware of, or uncover, the maths usually leads to APIs with properties that are very good for software. And in general, users don't have to know the math to still benefit from that.

Alright... that's what I'm trying to do anyway. :)

But concretely, yeah, I don't care much about two pre-resolve variants being exposed. And yeah, having separate strict and non-strict force-resolve operations is, I think, just too much. I just haven't figured out a more ideal way to do it yet. I have to change force-resolve anyway to collect validation errors, which is at the back of my mind as well. So there'll probably be more tweaks at some point.

alwinb commented 3 years ago

I think I can close this issue. If you run into things, don't hesitate to open a new issue (or reopen this, but I don't know if that option is available).

Thank you!

alwinb commented 3 years ago

As an aside, I'm playing with the idea of setting up a Wiki to collect questions/discussions and closed issues that contain useful information. I don't know if that is a good idea or not, or if it will go anywhere, but we'll see.

TimothyGu commented 3 years ago

It might be useful to provide some information on general design, such as the rationale of the force operation, etc. The wiki could be a good way of conveying that information.

alwinb commented 3 years ago

There is a little bit of text about that in my specification. It is in the Parsing section, under the subtitle "A note about repeated slashes". There's also an old comment of mine about it here (which however isn't entirely up to date with how things are handled now).

alwinb commented 3 years ago

Meh. I don't like my solution, yours is clearly nicer. I will look at it again.

alwinb commented 3 years ago

I did an update to my characterisation; in 0.7.0 I've removed the non-strict goto, and changed the definition of a base-URL.

The force operation can now be said to coerce file- and web-URLs to base URLs.

Then I specified three resolution operations: strict and non-strict (I think they match RFC3986) and forced-resolution, which characterises WHATWG resolution (modulo normalisation).

Still not ideal, but better, I think.

alwinb commented 3 years ago

@TimothyGu, if you have some time, would you have ideas or suggestions as to how I can explain the design in a better and more accessible way? I know this is important, but so far I have a hard time with it. (I know, it's not a very concrete question.)