dojo / util

Dojo 1 - build utilities. Please submit bugs to https://bugs.dojotoolkit.org/
https://dojotoolkit.org/
Other
60 stars 105 forks source link

Use util/build/fs in copy transform #28

Closed gitgrimbo closed 9 years ago

gitgrimbo commented 9 years ago

@wkeese - I've rebased the branch that we commented on (https://github.com/gitgrimbo/util/commit/5a808a42781ba1b4c0a9ae5256b1120a4d313e05) so any subsequent merge is simpler.

Some performance measurements are listed here - https://gist.github.com/gitgrimbo/950f3301ce7b9d8ce18f. From my own testing, using either Node or Java to perform the file copying was significantly faster than using a command-spawning approach (https://github.com/dojo/util/blob/1.10.4/build/transforms/copy.js#L14-L17). The improvement seemed even more marked when there was involvement from anti-virus software.

Ideally there should be some more testing for this, but it seems ok from my own experiments. As such, I have left this feature as optional and activated by a boolean build config property, useFsCopy.

wkeese commented 9 years ago

Hmm, the code looks reasonable, but when I tried a cursory test on my mac (which has node installed), I get errors like below, and then it hangs:

error(324) Error while transforming resource. resource: /ws/trunk/dojo/store/README; transform: 0; error: Error: EMFILE, open '/ws/trunk/dojo/store/README'

Admittedly, I modified the code a little bit to remove the flag and always use your new codepath:

define([
    "../buildControl",
    "../fileUtils",
    "../fs",
    "dojo/has"
], function(bc, fileUtils, fs, has) {

    function copyFileWithFs(src, dest, cb) {
        if (has("is-windows")) {
            src = fileUtils.normalize(src);
            dest = fileUtils.normalize(dest);
        }
        fs.copyFile(src, dest, cb);
    }

    return function(resource, callback) {
        fileUtils.ensureDirectoryByFilename(resource.dest);
        copyFileWithFs(resource.src, resource.dest, function(code, text){
            callback(resource, code);
        });
        return callback;
    };
});

The build command I used (from the util/buildscripts directory) was:

$ ./build.sh action=release profile=profiles/standard.profile.js releaseDir=/ws/trunk/release
gitgrimbo commented 9 years ago

Hmm,

Wonder if it's OSX specific?

I was worried at first that maybe the code needed to close file descriptors and it wasn't doing so, but I don't think explicit closing of file descriptors is required for the readFile/writeFile APIs (which is used for the copy.

This module seems to address some of the relevant issues:

Maybe we need to limit the number of in-flight copying going on for (at least) OSX?

wkeese commented 9 years ago

Interesting. Yes, looks like that's the problem. As we've discussed before, the builder code really goes crazy trying to parallelize things. I had assumed it was copying one file at a time but apparently not. I bet one file at a time would be just as fast.

https://github.com/isaacs/node-graceful-fs looks interesting but I don't think you need it. The builder already has throttling code, and it's used by node/fs.js ((fht.enqueue() and release())), but not by your new function. When I updated your copyFile() method to use the throttling , things started working:

copyFile:function(src, dest, cb){
    // Use no encoding, as the file may be text or binary.
    fht.enqueue(function() {
        fs.readFile(src, undefined, function (err, contents) {
            if (err) {
                fht.release();
                cb(err);
            } else {
                fs.writeFile(dest, contents, undefined, cb);
                fht.release();
            }
        })
    });
},

Unfortunately there's no significant performance increase for me, it only goes from 11s to 10s, but still seems better than before, so I guess I'll push that.

PS: Indeed, for fileHandleThrottle.js, if I set max to 1 or 10, it still takes 10 seconds. Setting max to 100 takes 11 seconds.

wkeese commented 9 years ago

Actually, looking at this more, why does your copyFile() method in node/fs.js use readFile() and writeFile() rather than readFileSync() and writeFileSync()? It seems like copyFile() will return before the copy actually takes place, thus leading to the parallelism problem.

PS: Oh nevermind I guess that's synonymous with the old code, which called process.exec(), which also seems to be asynchronous. (And tellingly also uses fht.throttle().)

wkeese commented 9 years ago

OK, I pushed your code as 2696ddd831096f753ee349b18bca08d109d6dcae, with my changes as 33f18643fc313689e3163ad51e6f6b2346f0e23f. Thanks for working on this! If you want to add the timestamp-checking conditional copy code later we could do that too.