browserify / watchify

watch mode for browserify builds
Other
1.79k stars 203 forks source link

Multiple file change events #239

Closed caspervonb closed 8 years ago

caspervonb commented 9 years ago

Watchify seems to be doing something different internally when writing its output stream than other compilers, for example

var chokidar = require('chokidar');
var child = require('child_process');
var fs = require('fs');
var util = require('util');

var watcher = chokidar.watch('{output.js}', {
  alwaysStat: true
});

watcher.on('change', function (filename, stat) {
  console.log('change %s (size: %s)', filename, util.inspect(stat.size));
});

var ps = child.spawn('watchify', ['--outfile', 'output.js', 'index.js', '--verbose']);
ps.stdout.pipe(process.stdout);
ps.stderr.pipe(process.stderr);

setInterval(function() {
  console.log('touch');

  var input = fs.readFileSync('index.js', 'utf-8') + ' ';
  fs.writeFileSync('index.js', input, 'utf-8');
}, 1000);

setTimeout(function() {
  process.exit(0);
}, 10000);
$ node watchify.js 
touch
change output.js (size: 0)
124676 bytes written to output.js (0.04 seconds)
change output.js (size: 124676)

Where-as webpack in with the same use case results in a single change event

var chokidar = require('chokidar');
var child = require('child_process');
var fs = require('fs');
var util = require('util');

var watcher = chokidar.watch('{output.js}', {
  alwaysStat: true
});

watcher.on('change', function (filename, stat) {
  console.log('change %s (size: %s)', filename, util.inspect(stat.size));
});

var ps = child.spawn('webpack', ['--watch', '--output-file', 'output.js', 'index.js']);
ps.stdout.pipe(process.stdout);
ps.stderr.pipe(process.stderr);

setInterval(function() {
  console.log('touch');

  var input = fs.readFileSync('index.js', 'utf-8') + ' ';
  fs.writeFileSync('index.js', input, 'utf-8');
}, 1000);

setTimeout(function() {
  process.exit(0);
}, 10000);
$ node webpack.js

touch
Hash: 0bd844fa25ecb086e1d9
Version: webpack 1.10.0
Time: 21ms
    Asset    Size  Chunks             Chunk Names
output.js  130 kB       0  [emitted]  main
   [0] ./index.js 132 bytes {0} [built]
    + 23 hidden modules
change output.js (size: 129664)

Treating size 0 as an edge case is somewhat trivial, but have also run into cases where change events are emitted 3-4 times when the file is in a fragmented state, babel, coffee and tsc don't seem to exhibit this behaviour either. Have not been able to reproduce this directly with fs.writeFile or fs.createWriteStream either.

Any idea why is watchify behaving so differently from the rest?

zertosh commented 9 years ago

it's because of https://github.com/substack/watchify/blob/3342084/index.js#L132-L137. that should probably act more like a debounce than a simple throttle.

zertosh commented 9 years ago

I haven't tried this, but this should work. Can you let me know?

diff --git a/index.js b/index.js
index 921df02d..a6f789e6 100644
--- a/index.js
+++ b/index.js
@@ -129,12 +129,12 @@ function watchify (b, opts) {
         changingDeps[id] = true

         // wait for the disk/editor to quiet down first:
-        if (!pending) setTimeout(function () {
+        if (pending) clearTimeout(pending);
+        pending = setTimeout(function () {
             pending = false;
             b.emit('update', Object.keys(changingDeps));
             changingDeps = {};
         }, delay);
-        pending = true;
     }

     b.close = function () {
caspervonb commented 9 years ago

Results are somewhat random, sometimes one event sometimes multiple.

change output.js (size: 0)
125113 bytes written to output.js (0.56 seconds)
change output.js (size: 125113)

touch
change output.js (size: 0)
125133 bytes written to output.js (0.04 seconds)
change output.js (size: 125133)

touch
125153 bytes written to output.js (0.03 seconds)
change output.js (size: 125153)

touch
125173 bytes written to output.js (0.03 seconds)
change output.js (size: 125173)

touch
125193 bytes written to output.js (0.03 seconds)
change output.js (size: 125193)

touch
125213 bytes written to output.js (0.05 seconds)
change output.js (size: 125213)

touch
125233 bytes written to output.js (0.04 seconds)
change output.js (size: 125233)

touch
125253 bytes written to output.js (0.06 seconds)
change output.js (size: 125253)

touch
125273 bytes written to output.js (0.04 seconds)
change output.js (size: 125273)

touch
change output.js (size: 0)
125293 bytes written to output.js (0.03 seconds)
change output.js (size: 125293)
caspervonb commented 9 years ago

Removing the delay does work tho so you're right on the culprit.

zertosh commented 9 years ago

cool. when you say "Removing the delay", you mean the diff above or changing the delay value?

caspervonb commented 9 years ago

Neither, the patch had the issues and output posted above (somewhat erratic results). Just ripping it out entirely.

index 921df02..c33cdc7 100644
--- a/index.js
+++ b/index.js
@@ -127,14 +127,10 @@ function watchify (b, opts) {
             delete fwatcherFiles[id];
         }
         changingDeps[id] = true
        // wait for the disk/editor to quiet down first:
-        if (!pending) setTimeout(function () {
-            pending = false;
-            b.emit('update', Object.keys(changingDeps));
-            changingDeps = {};
-        }, delay);
-        pending = true;
+        b.emit('update', Object.keys(changingDeps));
+        changingDeps = {};
     }

     b.close = function () {
@@ -148,4 +144,4 @@ function watchify (b, opts) {
     };

The initial bundle will still emit an empty file, however.

zertosh commented 9 years ago

Can't do that. If more than one file changes at a time (e.g. changing branches), it'll trigger a ton of change events.

caspervonb commented 9 years ago

Not suggesting doing that, just trying to grok what's going on here.

caspervonb commented 9 years ago

Been digging a bit and I can't quite figure out why it's flushing multiple times, a single change should just lead to one (delayed) write?

zertosh commented 9 years ago

I tried your test and I only see the multiple writes on the first "touch" when output.js doesn't exist. I think this is because the file is created with fs. createWriteStream. I think it creates the file first, then it later starts writing to it.

But, your sample output shows that it happens randomly. Very strange.

caspervonb commented 9 years ago

Those results were from Yosemite OS X by the way with node 0.12.x.

Is the content delivered by the .bundle stream buffered? or is it actually generating and pushing out chunks? Seems like it's pushing nothing, followed by chunks of source. Just calling bundle in cmd.js will result in multiple writes

zertosh commented 9 years ago

Ah yes it is. browserify collects everything in the sort step before passing the rows along. All of the expensive bits (like module-deps) happen before that, so there is a significant delay between when you call .bundle and when the first bits are pushed out.

caspervonb commented 9 years ago

Seems like, at-least from the perspective of external processes, the output should be processed and buffered before starting to stream it? (in watchify/cmd.js that is, not library or upstream in browserify).

caspervonb commented 9 years ago

The proper solution I suppose would be to lock, but it's not in node core so I don't see that being widely adopted anytime soon, anyhow what I'm observing from watchify seems to be the following flow.

The time I've observed so far between writes is anywhere between 25ms and 150ms.

For example, in amok's hot patching tests, which runs over babel, coffee, tsc, webpack and watchify, watchify is, so far the only one that has been causing issues and acting differently in terms of the writes (altho likely to have different results with Windows or Linux, as file watching is quite different per OS, I'll test that more in-depth later today)

The first issue, which we've been already discussing here is that there's an empty write, now this one is fairly easy to work around, you can't do anything with 0 bytes anyways and it should probably be treated as a file create anyhow instead of a file change. Working around it by just adding a one second debounce delay, the next event will arrive and it will be like the first one did not occur at all.

The second manifestation is incomplete writes, it's all very inconsistent but sometimes there will only be 444 bytes available when reading, it's rare but it happens.

Those 444 bytes contain the prelude

(function e(t,n,r){function s(o,u){if(!n[o]){if(!t[o]){var a=typeof require=="function"&&require;if(!u&&a)return a(o,!0);if(i)return i(o,!0);var f=new Error("Cannot find module '"+o+"'");throw f.code="MODULE_NOT_FOUND",f}var l=n[o]={exports:{}};t[o][0].call(l.exports,function(e){var n=t[o][1][e];return s(n?n:e)},l,l.exports,e,t,n,r)}return n[o].exports}var i=typeof require=="function"&&require;for(var o=0;o<r.length;o++)s(r[o]);return s})({

Atomic writes were so much easier to reason about..

Looking at coffee for example, seems the output is already buffered before starting to write (which works great) https://github.com/jashkenas/coffeescript/blob/master/lib/coffee-script/command.js#L495

zertosh commented 9 years ago

The second manifestation is incomplete writes, it's all very inconsistent but sometimes there will only be 444 bytes available when reading, it's rare but it happens.

I can see why that would happen. browser-pack pushes that out first. Then each processed module individually.

Since this issue only seems to be breaking your tests (right?), you can try what watchify's own tests do. Pass in --verbose, and then monitor for log lines. They're printed to stderr, you get one line output per rebuild, and you know the file is completely written out at that point.

EDIT: I know that's a hack - sorry

caspervonb commented 9 years ago

Since this issue only seems to be breaking your tests (right?), you can try what watchify's own tests do. Pass in --verbose, and then monitor for log lines. They're printed to stderr, you get one line output per rebuild, and you know the file is completely written out at that point.

Just of the top of my head, this behaviour also breaks beefy, and it's a bit more 'severe' than that it keeps watchify from being used in a predictable manner with amok at all, my advise has been to use watchify@2.6, I have been for some time since #181.


Really think buffering and only start to write when the entire bundle has been processed is available in memory is the solution, there's no gain to get from appending to a file multiple times, is there?

If there's no objections I'll get around to submitting a pull request soon.

zertosh commented 9 years ago

my advise has been to use watchify@2.6, I have been for some time since #181.

sigh. back in 2.6, a temp file was used, but creating an intermediate file made some ppl unhappy. after watchify started writing directly to the output file, it became possible to use outpipe to save to a shell commands instead. but can't go back now.

there's no gain to get from appending to a file multiple times, is there?

I haven't benchmarked this, but I think there's a perf benefit to writing out as soon as you have any data.

If there's no objections I'll get around to submitting a pull request soon.

I'm actually open to a flag that switches back to the intermediate file approach. something like --use-temp... as I wrote this, I realized you could also use > .temp.js && mv .temp.js output.js as the output "file".

caspervonb commented 9 years ago

Not in favour of atomic writes, fs.createFileStream is fine but not when node's internal buffering is doing several fs.appendFile calls.

My proposal is just wait for wb.bundle() stream to end, then write it to outStream so that there'll be a single call to open and write. Perf loss is somewhat irrelevant as you can't do anything with the written file right now? The time taken to write the file will be about the same anyways, give or take a few microseconds. I'll bench this is needbe.

vvo commented 9 years ago

running into the same issue, browserify + browser sync does not play well because of multiple change events triggered.

Is there a workaround?

caspervonb commented 9 years ago

As previously mentioned, downgrading to 2.6 would work since it used atomic writes (writing to temp and renaming the file).

For 3.x I haven't been able to come up with a viable workaround, first there's a truncate event, then multiple writes at intervals too far apart to be debounced. You could still debounce the writes at a really big interval like 1000ms tho. (Note that adding transforms adds even time between write calls)

vvo commented 9 years ago

Switched to webpack and webpack dev server instead of browserify watchify and browser sync

mattdesl commented 9 years ago

I haven't read this whole issue, but @caspervonb why not use watchify programmatically? It solved a lot of issues for me after switching budo to use the API rather than a child process.

caspervonb commented 9 years ago

@mattdesl Has to work with a range of external preprocessors, as-well as no preprocessor. Server/compiler feature in amok is just convenience for the common cases, should still be able to use your current gulp file and whatnot to setup incremental rebuilding.

zertosh commented 9 years ago

The CLI behavior is always a problem to someone. I'm writing up a "--delay-write" flag to opt into the behavior described in https://github.com/substack/watchify/issues/239#issuecomment-118594264

caspervonb commented 9 years ago

The CLI behavior is always a problem to someone. I'm writing up a "--delay-write" flag to opt into the behavior described in #239 (comment)

While that's something, why not make it default? Watchify is the oddball here, other compilers are buffering before starting to write. Do partial writes have any value?

zertosh commented 9 years ago

Do partial writes have any value?

I think they do. If the output device is slow (e.g. NFS), you're going to want to start writing out as soon as possible, with whatever you have available. This is especially important if you're using sourcemaps since sourcemap creation happens after the modules have been written out. BUT, I could be wrong, and nobody actually cares about this.

caspervonb commented 9 years ago

Implementation detail, With streams3, we could just cork the bundle stream I think?

I think they do. If the output device is slow (e.g. NFS), you're going to want to start writing out as soon as possible, with whatever you have available. This is especially important if you're using sourcemaps since sourcemap creation happens after the modules have been written out. BUT, I could be wrong, and nobody actually cares about this.

Doubt that will even show up in a benchmark, and correctness (the many outweigh the one) should come before nano-optimizations shouldn't it?

Even if thats the case, the amount of time it takes to programatically detect that a new bundle has been written requires a large debounce many factors larger than the intervals we are talking about here. So the user seen performance still seems worse.

zertosh commented 9 years ago

Implementation detail, With streams3, we could just cork the bundle stream I think?

cork would be equivalent to just passing a callback to bundle(). So to be clear, the problem isn't the delay between writes - it's the delay between the creation of the write stream and the first output write.

Using your test, waiting for the bundle stream to emit readable and then creating the output stream seems to work. I'm testing a few more ideas.

zertosh commented 9 years ago

@caspervonb and @vvo can you please try https://github.com/substack/watchify/pull/245 ? it works for me using the test in the description

caspervonb commented 9 years ago

cork would be equivalent to just passing a callback to bundle().

Phrased that as a question because I wasn't not sure about it's semantics as it's "fairly new", yet to use it myself, thought it was 'corking' it until the 'end' event was emitted.

So to be clear, the problem isn't the delay between writes - it's the delay between the creation of the write stream and the first output write.

Both are essentially the same problem, if there's a significant delay between file events it can't reasonably be filtered as a single file change. I'd assume adding transforms, plugins etc adds more delays in the bundle stream at different stages?

@caspervonb and @vvo can you please try #245 ? it works for me using the test in the description

Think we'll need a stricter test case , I'll get around to doing a tape test we can stick into the test suite later today as soon as I get the chance.