electron / osx-sign

Codesign Electron macOS apps
BSD 2-Clause "Simplified" License
558 stars 96 forks source link

fix: Serialize walkAsync so isBinaryFile doesn't throw EMFILE: too many open files. #286

Open amiller-gh opened 1 year ago

amiller-gh commented 1 year ago

Hey! Thanks for the great library – ran into an issue and have a small fix proposal.

I know this has already been raised here: https://github.com/electron/osx-sign/issues/248

However, for #reasons we need to disable ASAR packaging for our product. This means that walkAsync has to crawl all of node_modules looking for binary files. Because getFilePathIfBinary actually opens and reads the first few thousand bytes of every file, the recursive Promise.all quickly throws with EMFILE: too many open files, no matter how high I set the open file limit via ulimit.

Conveniently, just putting things into a simple for loop fixes things by serializing the file reads, giving Node a chance to release each file before reading the next one. IIRC, this shouldn't impact performance much because of how Node's FS access works. Anecdotally, we also have this change deployed to our CI and it's performing very quickly without any errors!

amiller-gh commented 1 year ago

Happy to refactor (may be a day or so), but if I remember correctly, isn't libuv's default thread pool size only 4? And, even with more libuv threads, wouldn't everything buffer while waiting for the event loop to pull the data anyway?

My understanding was that, because of how Node handles IO, the "parallelism" you get from creating hundreds of IO-executing promises has quickly diminishing returns to the point where serializing the work is often comparable perf-wise, and in many cases – where the flood of promise objects quickly inundates the GC, sapping CPU time from the event loop, disk reads compete for system IO access, and libuv buffers begin to grow – serialized reads can frequently result in improved performance.

Genuinely curious if this has improved with recent versions of Node! Happy to update the PR regardless, but would enjoy learning something here if the prevailing wisdom has changed :)

mwakizaka commented 11 months ago

Hi, sorry for cutting in.

We also encountered an EMFILE error like https://github.com/electron/osx-sign/issues/248 and confirmed that the problem got resolved by this PR. As far as I checked, this PR does not likely affect the performance. Although I'm not an expert in nodejs, it would be appreciated if osx-sign project would consider merging this PR.

new-carrot commented 9 months ago

This will be a performance regression, serialized fs access is Bad Practice ™️

I'd accept a PR that modifies this code to use the queue module or something similar that limits our old parallel behavior to a fixed concurrency (E.g. 100)

There is no way an error that causes the osx-sign fail to do the job is better than a performance decrease. If you try to run the script below on an app that have many files, which I copied from src/util.ts . The EMFILE error will happen, no matter how high you increase ulimit. Beside, performance decrease is not much.

const fs = require('fs-extra');
const path = require('path');
const a= require('isbinaryfile');

async function getFilePathIfBinary (filePath) {
  if (await a.isBinaryFile(filePath)) {
    return filePath;
  }
  return null;
}

async function walkAsync(dirPath) {

  async function _walkAsync(dirPath) {
    const children = await fs.readdir(dirPath);
    return await Promise.all(
      children.map(async (child) => {
        const filePath = path.resolve(dirPath, child);

        const stat = await fs.stat(filePath);
        if (stat.isFile()) {
          switch (path.extname(filePath)) {
            case '.cstemp': // Temporary file generated from past codesign
              await fs.remove(filePath);
              return null;
            default:
              return await getFilePathIfBinary(filePath);
          }
        } else if (stat.isDirectory() && !stat.isSymbolicLink()) {
          const walkResult = await _walkAsync(filePath);
          switch (path.extname(filePath)) {
            case '.app': // Application
            case '.framework': // Framework
              walkResult.push(filePath);
          }
          return walkResult;
        }
        return null;
      })
    );
  }

  const allPaths = await _walkAsync(dirPath);
  return compactFlattenedList(allPaths);
}
walkAsync('./AnExample.app').then((result) => {
  console.log(result.length);
}).catch((error) => {
  console.log('error', error);
})