Closed jgonggrijp closed 2 years ago
Hey there! Sorry you are having a problem here.
This issue is related to the leading **
. I'm not sure why it happens, and I can't easily fix it right now (I don't have a lot of bandwidth for this project any more).
Luckily, the solution / workaround is easy: don't start with **
. Either use /**/...
or /node_modules/...
for your matchers, and you'll have consistent output again. So simply change your file matchers to be:
files: {
file: '/**/jquery/**',
package: 'jquery',
cdn: 'https://cdnjs.cloudflare.com/ajax/libs/${package}/${version}/${filenameMin}'
}, {
file: '/**/lodash/**',
package: 'lodash',
cdn: 'https://cdn.jsdelivr.net/npm/${package}@${version}/${filenameMin}'
}, {
file: '/**/backbone/**',
package: 'backbone',
cdn: 'https://cdnjs.cloudflare.com/ajax/libs/${package}/${version}/${filenameMin}'
}, {
file: '/**/handlebars/**',
package: 'handlebars',
cdn: 'https://cdn.jsdelivr.net/npm/${package}@${version}/dist/handlebars.runtime.min.js'
}
Though I recommend you use a more specific matcher like /node_modules/...
to prevent accidentally matching something you don't want. In fact, you probably should use /node_modules/jquery/**/*.js
, which ensures you are only matching JavaScript files. This also reduces the burden on the regular expression.
Thanks for the report! I'll leave this open for others to find until I have a chance to fix the issue, but it could be a long time.
Note to self: I think this is directly related to issue #20, which also featured a leading **
pattern.
Thanks for you quick response and suggestions.
At first I didn't want to use the more specific prefixes you're suggesting, because the input html is generated from a template in which the node_modules prefix is taken from a configuration file. The prefix could be anything, for example /node_modules
but also ../node_modules
or vendor
. '**/jquery/**'
is the only glob pattern that matches all scenarios.
Still, I reached a compromise that kind of works. I changed the glob pattern to **/node_modules/jquery/**
. This rules out the vendor
prefix but not /node_modules
or ../node_modules
. This yields correctly quoted replacements with /node_modules
while still giving me incorrect quotes with ../node_modules
.
At this point, I discovered something. If I comment out the second pattern in lib/utils.js
(script tag with src attribute without quotes) and use ../node_modules
as prefix, cdnizer does not replace the links anymore. After having read #20, I understand that both script patterns in utils.js
probably match my script tags (the second yielding a quoted url), but minimatch only matches my glob pattern to the relative prefix if it is quoted (so it matches "../node_modules/jquery/jquery.js"
but not ../node_modules/jquery/jquery.js
. The first quote is lost, so it doesn't reappear in the cdnized html, but the second quote is treated as the last character of the file name. Hence, a src attribute that starts without a quote but ends with a quote.
As it stands, I'm better off using /**/jquery/**
as you suggested. Relative urls don't get replaced, but at least they remain properly quoted and I can work with absolute urls that don't contain node_modules
.
I can think of a workaround for this scenario: after passing the url to minimatch, strip quotes from the ${filename}
and the ${filenameMin}
(they shouldn't be there anyway). You end with an unquoted src attribute while the input was quoted, but at least the html isn't broken. If you want, I can send a pull request that implements this.
I'm not sure I'd be OK always outputting a modification without quotes. The regexes a supposed to preserve the existing quotes as a best as they can. There's clearly some quirks with this library, it's definitely not a "smart" library (as it doesn't parse the HTML at all).
Honestly, I'd hate to see you put in the time, it could be weeks before I have any free time to dig into this further. So I don't want to promise I'll accept or even get time to review any PRs in the near future.
If it puts your mind at ease: the workaround that I'm proposing doesn't always cause cdnizer to strip quotes in the replacement. Rather, it only does so in the pathological case that this issue ticket is about. I think that roughly the following is currently happening.
utils.js
matches my script tag: <script src="../node_modules/jquery/dist/jquery.js"></script>
--> pre = '<script src="', url = '../node_modules/jquery/dist/jquery.js', post = '"></script>'
.**
to relative urls, so no replacement is done.utils.js
matches my script tag, too: pre = '<script src=', url = '"../node_modules/jquery/dist/jquery.js"', post = '></script>'
.**
to paths starting with a quote.url
on the last slash, obtaining jquery.js"
as the ${filename}
and jquery.min.js"
as the ${filenameMin}
.pre + 'https://cdnjs.cloudflare.com/ajax/libs/${package}/${version}/${filenameMin}' + post
, obtaining <script src=https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
.Now, html attribute values logically shouldn't ever contain quotes, since quotes are reserved for delimiting them. If you find a trailing quote in step 5, then the previous steps must have gone wrong as I described. So you can safely strip that trailing quote.
Another option would be to prevent step 3 from happening. You can stop the pattern from matching quoted urls by adjusting the middle group. Right now, that group is ([\s\S]+?)
, essentially consuming anything until the last group is matched. You could change that into ([^\s"'>]+)
, which I'd argue is more correct. This solution fixes the problem in a strictly more correct way, but prevents replacements in my use case with relative paths and might break some other use cases.
Either way, my pull request would likely affect only one or two lines of code. Reviewing that would take less of your time than this discussion has already taken. I'll wait for your green light before doing anything, though.
I like your solution to fixing the regex, which is definitely a bit too aggressive. Thanks for taking the time to look into this, I'd definitely accept the PR. The only thing left is I'm internally debating if this is a breaking change or not. Sadly, I think it is, even though it's really a bug fix, but someone could, theoretically, be relying on that aggressive regex.
I understand that the fix is simple, but I have to decide whether this would be a breaking change or not, and ensure that there's tests in place, etc. Then I have to deploy it. (It's usually rather quick, but I have a full time job, a family, etc. So time is always at a premium.)
Honestly, I'd rather it take longer to discuss a fix than review it! It means that the problem was better understood, and the solution wasn't rushed. Especially with a library like this, where I only touch it once in a while, so it always takes me some time to remember how everything works.
I'll patiently wait for your internal debate to come to a conclusion.
In the meanwhile: do I understand correctly that you prefer the regex adjustment over the quote stripping?
Oh, sorry, I meant that I'll totally accept a PR for fixing the regex.
The only decision then is whether to deploy it as a breaking (major) change or bugfix (minor change), but I'll probably just roll it out as a breaking change on both repos (cdnizer/gulp-cdnizer) to avoid any headaches.
I think some of the other regexes could be a bit stricter, too. Should I submit fixes for all regexes, given that you'll bring out a major release anyway?
Sure, that would be great, thanks!
Hi there,
has there been a resolution to this issue?
I'm currently using cdnizer
via webpack-s3-plugin
, and this issue is popping up in the generated HTML script/style tags.
Can't apply the workaround, since this is internal to the plug-in.
Will try to workaround by fiddling Webpack's output.publicPath
.
I think I forgot about this. I'll put it back on my to-do list, but I can't currently give any prediction on when I'll be able to submit the PR as agreed.
I just debugged the issue in detail and found that the cause is a bit different from what I thought. In the index.js
, an outer loop goes over all patterns. In the inner loop (contents.replace
), all matches of the pattern are replaced. Since the patterns were not mutually exclusive (at least not until the fix I'm about to submit a PR for), sometimes the same URL is replaced twice with alternative patterns (first with the quoted version and then with the unquoted version). The second iteration, which shouldn't happen, causes the juggling with the quotes.
Closed by #40
Input:
cdnizer call:
Output (note the incoherent presence and absence of quotes):
Output after a second run:
My browser (Safari 11) attempts to load https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js%22, which fails with a 403 status, so the uneven quoting really breaks things.