OverZealous / cdnizer

Node module for replacing local links with CDN links, includes fallbacks and customization
MIT License
52 stars 24 forks source link

cdnizer 2.0.0 appears to apply quotes randomly #29

Closed jgonggrijp closed 2 years ago

jgonggrijp commented 6 years ago

Input:

<script src="/node_modules/jquery/dist/jquery.js"></script>
<script src="/node_modules/lodash/lodash.js"></script>
<script src="/node_modules/backbone/backbone.js"></script>
<script src="/node_modules/handlebars/dist/handlebars.runtime.js"></script>

cdnizer call:

var cdnizer = require('gulp-cdnizer');

gulp.src('index.html')
    .pipe(cdnizer({ files: [{
        file: '**/jquery/**',
        package: 'jquery',
        cdn: 'https://cdnjs.cloudflare.com/ajax/libs/${package}/${version}/${filenameMin}' 
    }, {
        file: '**/lodash/**',
        package: 'lodash',
        cdn: 'https://cdn.jsdelivr.net/npm/${package}@${version}/${filenameMin}'
    }, {
        file: '**/backbone/**',
        package: 'backbone',
        cdn: 'https://cdnjs.cloudflare.com/ajax/libs/${package}/${version}/${filenameMin}'
    }, {
        file: '**/handlebars/**',
        package: 'handlebars',
        cdn: 'https://cdn.jsdelivr.net/npm/${package}@${version}/dist/handlebars.runtime.min.js'
    }]}))
    .pipe(gulp.dest('dist'));

Output (note the incoherent presence and absence of quotes):

<script src=https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/lodash@4.17.10/lodash.min.js"></script>
<script src=https://cdnjs.cloudflare.com/ajax/libs/backbone/1.3.3/backbone.min.js"></script>
<script src=https://cdn.jsdelivr.net/npm/handlebars@4.0.11/dist/handlebars.runtime.min.js></script>

Output after a second run:

<script src=https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script src=https://cdn.jsdelivr.net/npm/lodash@4.17.10/lodash.min.js"></script>
<script src=https://cdnjs.cloudflare.com/ajax/libs/backbone/1.3.3/backbone.min.js"></script>
<script src=https://cdn.jsdelivr.net/npm/handlebars@4.0.11/dist/handlebars.runtime.min.js></script>

My browser (Safari 11) attempts to load https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js%22, which fails with a 403 status, so the uneven quoting really breaks things.

OverZealous commented 6 years ago

Hey there! Sorry you are having a problem here.

This issue is related to the leading **. I'm not sure why it happens, and I can't easily fix it right now (I don't have a lot of bandwidth for this project any more).

Luckily, the solution / workaround is easy: don't start with **. Either use /**/... or /node_modules/... for your matchers, and you'll have consistent output again. So simply change your file matchers to be:

files: {
        file: '/**/jquery/**',
        package: 'jquery',
        cdn: 'https://cdnjs.cloudflare.com/ajax/libs/${package}/${version}/${filenameMin}' 
    }, {
        file: '/**/lodash/**',
        package: 'lodash',
        cdn: 'https://cdn.jsdelivr.net/npm/${package}@${version}/${filenameMin}'
    }, {
        file: '/**/backbone/**',
        package: 'backbone',
        cdn: 'https://cdnjs.cloudflare.com/ajax/libs/${package}/${version}/${filenameMin}'
    }, {
        file: '/**/handlebars/**',
        package: 'handlebars',
        cdn: 'https://cdn.jsdelivr.net/npm/${package}@${version}/dist/handlebars.runtime.min.js'
    }

Though I recommend you use a more specific matcher like /node_modules/... to prevent accidentally matching something you don't want. In fact, you probably should use /node_modules/jquery/**/*.js, which ensures you are only matching JavaScript files. This also reduces the burden on the regular expression.

Thanks for the report! I'll leave this open for others to find until I have a chance to fix the issue, but it could be a long time.

OverZealous commented 6 years ago

Note to self: I think this is directly related to issue #20, which also featured a leading ** pattern.

jgonggrijp commented 6 years ago

Thanks for you quick response and suggestions.

At first I didn't want to use the more specific prefixes you're suggesting, because the input html is generated from a template in which the node_modules prefix is taken from a configuration file. The prefix could be anything, for example /node_modules but also ../node_modules or vendor. '**/jquery/**' is the only glob pattern that matches all scenarios.

Still, I reached a compromise that kind of works. I changed the glob pattern to **/node_modules/jquery/**. This rules out the vendor prefix but not /node_modules or ../node_modules. This yields correctly quoted replacements with /node_modules while still giving me incorrect quotes with ../node_modules.

At this point, I discovered something. If I comment out the second pattern in lib/utils.js (script tag with src attribute without quotes) and use ../node_modules as prefix, cdnizer does not replace the links anymore. After having read #20, I understand that both script patterns in utils.js probably match my script tags (the second yielding a quoted url), but minimatch only matches my glob pattern to the relative prefix if it is quoted (so it matches "../node_modules/jquery/jquery.js" but not ../node_modules/jquery/jquery.js. The first quote is lost, so it doesn't reappear in the cdnized html, but the second quote is treated as the last character of the file name. Hence, a src attribute that starts without a quote but ends with a quote.

As it stands, I'm better off using /**/jquery/** as you suggested. Relative urls don't get replaced, but at least they remain properly quoted and I can work with absolute urls that don't contain node_modules.

I can think of a workaround for this scenario: after passing the url to minimatch, strip quotes from the ${filename} and the ${filenameMin} (they shouldn't be there anyway). You end with an unquoted src attribute while the input was quoted, but at least the html isn't broken. If you want, I can send a pull request that implements this.

OverZealous commented 6 years ago

I'm not sure I'd be OK always outputting a modification without quotes. The regexes a supposed to preserve the existing quotes as a best as they can. There's clearly some quirks with this library, it's definitely not a "smart" library (as it doesn't parse the HTML at all).

Honestly, I'd hate to see you put in the time, it could be weeks before I have any free time to dig into this further. So I don't want to promise I'll accept or even get time to review any PRs in the near future.

jgonggrijp commented 6 years ago

If it puts your mind at ease: the workaround that I'm proposing doesn't always cause cdnizer to strip quotes in the replacement. Rather, it only does so in the pathological case that this issue ticket is about. I think that roughly the following is currently happening.

  1. The first pattern in utils.js matches my script tag: <script src="../node_modules/jquery/dist/jquery.js"></script> --> pre = '<script src="', url = '../node_modules/jquery/dist/jquery.js', post = '"></script>'.
  2. Minimatch doesn't match globs starting with ** to relative urls, so no replacement is done.
  3. The second pattern in utils.js matches my script tag, too: pre = '<script src=', url = '"../node_modules/jquery/dist/jquery.js"', post = '></script>'.
  4. For some mysterious reason, minimatch does match globs starting with ** to paths starting with a quote.
  5. Cdnizer splits the url on the last slash, obtaining jquery.js" as the ${filename} and jquery.min.js" as the ${filenameMin}.
  6. Cdnizer replaces pre + 'https://cdnjs.cloudflare.com/ajax/libs/${package}/${version}/${filenameMin}' + post, obtaining <script src=https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>.

Now, html attribute values logically shouldn't ever contain quotes, since quotes are reserved for delimiting them. If you find a trailing quote in step 5, then the previous steps must have gone wrong as I described. So you can safely strip that trailing quote.

Another option would be to prevent step 3 from happening. You can stop the pattern from matching quoted urls by adjusting the middle group. Right now, that group is ([\s\S]+?), essentially consuming anything until the last group is matched. You could change that into ([^\s"'>]+), which I'd argue is more correct. This solution fixes the problem in a strictly more correct way, but prevents replacements in my use case with relative paths and might break some other use cases.

Either way, my pull request would likely affect only one or two lines of code. Reviewing that would take less of your time than this discussion has already taken. I'll wait for your green light before doing anything, though.

OverZealous commented 6 years ago

I like your solution to fixing the regex, which is definitely a bit too aggressive. Thanks for taking the time to look into this, I'd definitely accept the PR. The only thing left is I'm internally debating if this is a breaking change or not. Sadly, I think it is, even though it's really a bug fix, but someone could, theoretically, be relying on that aggressive regex.

I understand that the fix is simple, but I have to decide whether this would be a breaking change or not, and ensure that there's tests in place, etc. Then I have to deploy it. (It's usually rather quick, but I have a full time job, a family, etc. So time is always at a premium.)

Honestly, I'd rather it take longer to discuss a fix than review it! It means that the problem was better understood, and the solution wasn't rushed. Especially with a library like this, where I only touch it once in a while, so it always takes me some time to remember how everything works.

jgonggrijp commented 6 years ago

I'll patiently wait for your internal debate to come to a conclusion.

In the meanwhile: do I understand correctly that you prefer the regex adjustment over the quote stripping?

OverZealous commented 6 years ago

Oh, sorry, I meant that I'll totally accept a PR for fixing the regex.

The only decision then is whether to deploy it as a breaking (major) change or bugfix (minor change), but I'll probably just roll it out as a breaking change on both repos (cdnizer/gulp-cdnizer) to avoid any headaches.

jgonggrijp commented 6 years ago

I think some of the other regexes could be a bit stricter, too. Should I submit fixes for all regexes, given that you'll bring out a major release anyway?

OverZealous commented 6 years ago

Sure, that would be great, thanks!

nemoDreamer commented 2 years ago

Hi there, has there been a resolution to this issue? I'm currently using cdnizer via webpack-s3-plugin, and this issue is popping up in the generated HTML script/style tags. Can't apply the workaround, since this is internal to the plug-in. Will try to workaround by fiddling Webpack's output.publicPath.

jgonggrijp commented 2 years ago

I think I forgot about this. I'll put it back on my to-do list, but I can't currently give any prediction on when I'll be able to submit the PR as agreed.

jgonggrijp commented 2 years ago

I just debugged the issue in detail and found that the cause is a bit different from what I thought. In the index.js, an outer loop goes over all patterns. In the inner loop (contents.replace), all matches of the pattern are replaced. Since the patterns were not mutually exclusive (at least not until the fix I'm about to submit a PR for), sometimes the same URL is replaced twice with alternative patterns (first with the quoted version and then with the unquoted version). The second iteration, which shouldn't happen, causes the juggling with the quotes.

OverZealous commented 2 years ago

Closed by #40