getify / LABjs

Loading And Blocking JavaScript: On-demand parallel loader for JavaScript with execution order dependencies
http://labjs.com
2.28k stars 313 forks source link

root_domain fails on "unusual" URI-schemes #98

Open derflocki opened 10 years ago

derflocki commented 10 years ago

Want to use $LAB inside a chrome-extension, however the regex that extracts the root_domain does not match the href chrome-extension:mmjeglkkcnglojkjoomifokhekiiiioe/www/index.html. According to RFC3986, Section 3.1:

Scheme names consist of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus ("+"), period ("."), or hyphen ("-").

so the correct regex would be /^[a-z][\w\+-\/]+\:\/\/\/?[^\/]+/.

getify commented 10 years ago

First, this is somewhat related to #93.

Also, your proposed regex doesn't actually work:

/^[a-z][\w\+-\/]+\:\/\/\/?[^\/]+/.test("chrome-extension:mmjeglkkcnglojkjoomifokhekiiiioe/www/index.html"); // false

We'd have to tweak the regex to make the // part optional, or something like that.

Moreover, the question is: could we actually assume to detect against URLs that don't look like normal web URL's and not break other important behavior (namely absolute/relative URL handling)?

In addition to the root_domain matching with /^\w+\:\/\/\/?[^\/]+/, loaded script URLs are canonicalized to the page by testing against absolute_regex: /^\w+\:\/\//. So, if we're going to make the change to one regex, we have to change both regexes.

But, imagine if we did that, and then someone had this URL: foo:bar/baz.js to load some script, while inside a chrome extension.

Any generic regex that matches your chrome-extension URL would have to also match that foo:bar.. URL. However, that URL might in fact be intended as a relative URL, where they want it to be canonicalized to the root chrome-extension URI, so that it would be loaded as chrome-extension:aosnsldk24rl2ksada/foo:bar/baz.js.

How could we know that this URL should have been taken to be as absolute instead: foo:bar/baz.js, rather than relative?

We couldn't. Right now, "absolute" requires something like http:// at the beginning, which is an easy assumption. But that assumption would totally go away with the proposed change.

Unless, and this is the part I'm dubious on, the regex only allowed a specific safe-list of scheme names, not just a w+ type regex match... like chrome-extension:, about:, etc.

How would we decide which scheme names should be explicitly listed? What happens if someone later comes along and says, "hey, my page is running under this wxyz:.. scheme, can you whitelist it?" And what happens if that accidentally collides with someone else's relative URL design?

Seems like a future maintenance nightmare, that ensures we have to keep updating LABjs rather than keeping it stable and unchanging as much as possible, and that we're asking for breakages.

Not sure what a good general solution to this would be.

My initial reaction is similar to what I said in #93, which is that this is a case where you should tweak LABjs for your own needs, since it's a pretty niche use-case, and the pain (of this one specifically) is disproportionately high.

getify commented 10 years ago

OTOH, perhaps the more appropriate logic is, if the current page is a scheme type, you couldn't do an absolute regex that was also a scheme (unless it was an exact match). You'd have to do a normal web URL like "http://", or otherwise the URL could be reasonably assumed to definitely be relative to the scheme URI.

Even if we could construct such special case logic, I'm not sure if the niche'ness of this deserves to be elevated to the core code.

Undecided for now.

derflocki commented 10 years ago

Thanks for the extremely long explanation! The URL is actually properly formatted chrome-extension://mmjeglkkcnglojkjoomifokhekiiiioe/www/index.html. The slashes got lost in copy and paste, sorry for wasting your time. So i guess the propsed fix should work...

getify commented 10 years ago

Interesting. So the only thing missing is the allowance for - in it?

derflocki commented 10 years ago

Correct. The scheme was the problem. Additionally, as mentioned im my initial post, + and / would also be legal characters according to the RFC.

And since I tend to take these things seriously, the RFC also states that the first char is an ALPHA, thus the [a-z].

What I just now realize, is that the check should actually be case insensitive.

getify commented 10 years ago

OK, this will be queued up to be fixed. But, just to set expectations, we're collecting several small niche things like this together before releasing again. As you can tell, I release quite rarely, as stability is one of the main features of this library.

I would recommend for now that you patch your own copy of LABjs. I'll leave this issue open until such a time as it's appropriate to fix for that next release (probably 2.1).

Thanks!

mttcr commented 8 years ago

Hello, To add some interests to this issue, LAB does not correctly handle the blob: URL scheme used here: https://developer.mozilla.org/en-US/docs/Web/API/URL/createObjectURL This issue corrects that behavior I guess. While waiting for 3.0, I tweaked the regex like that : var absolute_regex = /^(?:\w+\:\/\/|blob\:)/

getify commented 8 years ago

@mttcr thanks! to clarify, you're talking about?

  1. using LABjs to load a blob URL? --OR--
  2. using LABjs inside a page that came from a blob URL?
mttcr commented 8 years ago

The first one. I have some raw source in a string that I convert into a blob, then I load it with LABjs.

let blob = new Blob( [plainSource], [{ type: "text/javascript" }] );
let objectURL = window.URL.createObjectURL( blob );
$LAB
    .script( objectURL )
    .wait(() =>
       {
            window.URL.revokeObjectURL( objectURL );
       });
getify commented 8 years ago

thanks!