Open amiller-gh opened 1 year ago
Happy to refactor (may be a day or so), but if I remember correctly, isn't libuv's default thread pool size only 4? And, even with more libuv threads, wouldn't everything buffer while waiting for the event loop to pull the data anyway?
My understanding was that, because of how Node handles IO, the "parallelism" you get from creating hundreds of IO-executing promises has quickly diminishing returns to the point where serializing the work is often comparable perf-wise, and in many cases – where the flood of promise objects quickly inundates the GC, sapping CPU time from the event loop, disk reads compete for system IO access, and libuv buffers begin to grow – serialized reads can frequently result in improved performance.
Genuinely curious if this has improved with recent versions of Node! Happy to update the PR regardless, but would enjoy learning something here if the prevailing wisdom has changed :)
Hi, sorry for cutting in.
We also encountered an EMFILE error like https://github.com/electron/osx-sign/issues/248 and confirmed that the problem got resolved by this PR. As far as I checked, this PR does not likely affect the performance. Although I'm not an expert in nodejs, it would be appreciated if osx-sign project would consider merging this PR.
This will be a performance regression, serialized fs access is Bad Practice ™️
I'd accept a PR that modifies this code to use the
queue
module or something similar that limits our old parallel behavior to a fixed concurrency (E.g. 100)
There is no way an error that causes the osx-sign fail to do the job is better than a performance decrease. If you try to run the script below on an app that have many files, which I copied from src/util.ts
. The EMFILE error will happen, no matter how high you increase ulimit. Beside, performance decrease is not much.
const fs = require('fs-extra');
const path = require('path');
const a= require('isbinaryfile');
async function getFilePathIfBinary (filePath) {
if (await a.isBinaryFile(filePath)) {
return filePath;
}
return null;
}
async function walkAsync(dirPath) {
async function _walkAsync(dirPath) {
const children = await fs.readdir(dirPath);
return await Promise.all(
children.map(async (child) => {
const filePath = path.resolve(dirPath, child);
const stat = await fs.stat(filePath);
if (stat.isFile()) {
switch (path.extname(filePath)) {
case '.cstemp': // Temporary file generated from past codesign
await fs.remove(filePath);
return null;
default:
return await getFilePathIfBinary(filePath);
}
} else if (stat.isDirectory() && !stat.isSymbolicLink()) {
const walkResult = await _walkAsync(filePath);
switch (path.extname(filePath)) {
case '.app': // Application
case '.framework': // Framework
walkResult.push(filePath);
}
return walkResult;
}
return null;
})
);
}
const allPaths = await _walkAsync(dirPath);
return compactFlattenedList(allPaths);
}
walkAsync('./AnExample.app').then((result) => {
console.log(result.length);
}).catch((error) => {
console.log('error', error);
})
Hey! Thanks for the great library – ran into an issue and have a small fix proposal.
I know this has already been raised here: https://github.com/electron/osx-sign/issues/248
However, for #reasons we need to disable ASAR packaging for our product. This means that
walkAsync
has to crawl all ofnode_modules
looking for binary files. BecausegetFilePathIfBinary
actually opens and reads the first few thousand bytes of every file, the recursivePromise.all
quickly throws withEMFILE: too many open files
, no matter how high I set the open file limit viaulimit
.Conveniently, just putting things into a simple for loop fixes things by serializing the file reads, giving Node a chance to release each file before reading the next one. IIRC, this shouldn't impact performance much because of how Node's FS access works. Anecdotally, we also have this change deployed to our CI and it's performing very quickly without any errors!