PortBlueSky / thread-puddle

A library to pool Node.js worker threads, automatically exposing exported module methods using Proxy Objects. :rocket:

[enhancement] Managed pool workers #32

Open Pandapip1 opened 1 year ago

Pandapip1 commented 1 year ago

Summary

I have a use case that requires many workers continually running - which this package solves quite nicely.

However, my use case additionally requires the dynamic initialization of new workers (including with per-worker constructor arguments), and the ability to terminate the worker thread both from the worker thread and the main thread.

Proposed API

// Create a worker with the default arguments
const worker: Worker = await puddle.createWorker();
// Create a worker with custom per-worker arguments
const workerWithCustomArgs: Worker = await puddle.createWorker([1, 2, 3]);
// Invoke a function on one specific worker
const xyz = await worker.calculateXYZ();
// Terminate a specific worker
await puddle.terminate(workerWithCustomArgs);

kommander commented 1 year ago

Spinning up workers on demand largely defeats the purpose of a pool, because worker startup is quite expensive. I have thought about making it possible to call methods on individual workers directly; that is still in progress. If you really need to start workers with different initial data, just start a single worker with createThreadPool(workerPath, { workerOptions: { workerData } }) whenever you need one; you can then call methods directly on that single worker.

If you have to start new workers frequently, I would use a different pattern: have one worker expose a method that initializes an instance on demand. For example:

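// worker.ts
class Processor {
  constructor(private customArgs: any) {}

  calculateXYZ(data: any) {
    // use this.customArgs
  }
}

// Registry of live processors, keyed by the id handed back to the caller
const processors = new Map<number, Processor>()
let processorId = 0

export default {
  // Create a processor with its own arguments and return its id
  init(customArgs: any) {
    processorId += 1
    const processor = new Processor(customArgs)
    processors.set(processorId, processor)
    return processorId
  },

  // Run a calculation on the processor identified by id
  process(id: number, data: any) {
    const processor = processors.get(id)
    if (!processor) {
      throw new Error(`Unknown processor id: ${id}`)
    }
    return processor.calculateXYZ(data)
  }
}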

Being able to call methods directly on specific workers in the pool would then be enough to scale this up.
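
On the main thread, the id returned by init can be hidden behind a small handle so that a remote processor reads like a local class instance. The following is only a sketch: ProcessorApi, ProcessorHandle and createFakeApi are hypothetical names, not part of thread-puddle, and the fake API is an in-memory stand-in for whatever proxy the pool would expose for the init/process module above.

```typescript
// Shape of the worker module from the example above (init/process).
// Calls return promises because they would cross a thread boundary.
interface ProcessorApi {
  init(customArgs: unknown): Promise<number>;
  process(id: number, data: unknown): Promise<unknown>;
}

// Hypothetical main-thread handle: remembers the processor id so callers
// never have to pass it around themselves.
class ProcessorHandle {
  private constructor(
    private readonly api: ProcessorApi,
    private readonly id: number
  ) {}

  // Asks the worker to create a processor and wraps the returned id.
  static async create(api: ProcessorApi, customArgs: unknown): Promise<ProcessorHandle> {
    const id = await api.init(customArgs);
    return new ProcessorHandle(api, id);
  }

  // Forwards the call to the processor this handle belongs to.
  calculateXYZ(data: unknown): Promise<unknown> {
    return this.api.process(this.id, data);
  }
}

// In-memory stand-in for a worker proxy (assumption, for illustration only).
// It echoes back the stored arguments together with the data it was given.
function createFakeApi(): ProcessorApi {
  const store = new Map<number, unknown>();
  let nextId = 0;
  return {
    async init(customArgs) {
      nextId += 1;
      store.set(nextId, customArgs);
      return nextId;
    },
    async process(id, data) {
      if (!store.has(id)) throw new Error(`Unknown processor id: ${id}`);
      return { args: store.get(id), data };
    },
  };
}
```

With this in place, `const p = await ProcessorHandle.create(api, [1, 2, 3])` followed by `await p.calculateXYZ(data)` comes close to the "worker as a class" usage, as long as every call for a given handle reaches the worker that created it.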

Edit: Maybe I need more details about the use case to see what you mean. Is it something like a worker behaving like a class that you can instantiate?

Pandapip1 commented 1 year ago

> Is it something like a worker behaving like a class that you can instantiate?

Couldn't have phrased it better myself. That workaround will work for now, though.

The reason why I need multiple threads is that each instance of the class has a bunch of real-time stuff running in the background. Each thread can only handle 2-3 instances before it falls too far behind, and up to 20 instances might be created.
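
To put rough numbers on that: with up to 20 instances and 2-3 instances per thread, somewhere between 7 and 10 threads would be needed. Below is a minimal sketch of how instances could be spread across workers under such a limit. The names workersNeeded and InstancePlacer are hypothetical, and the sketch only does the bookkeeping; it does not start or talk to any threads.

```typescript
// Number of workers needed so that no worker exceeds its instance limit.
function workersNeeded(instances: number, perWorkerLimit: number): number {
  if (perWorkerLimit < 1) throw new RangeError("perWorkerLimit must be >= 1");
  return Math.ceil(instances / perWorkerLimit);
}

// Tracks how many instances each worker hosts and picks the least-loaded one.
class InstancePlacer {
  private readonly load: number[];

  constructor(workerCount: number, private readonly perWorkerLimit: number) {
    this.load = new Array<number>(workerCount).fill(0);
  }

  // Returns the index of the chosen worker, or -1 if every worker is full.
  place(): number {
    let best = -1;
    for (let i = 0; i < this.load.length; i++) {
      const hasRoom = this.load[i] < this.perWorkerLimit;
      if (hasRoom && (best === -1 || this.load[i] < this.load[best])) {
        best = i;
      }
    }
    if (best !== -1) this.load[best] += 1;
    return best;
  }

  // Frees one slot on the given worker when an instance is shut down.
  release(workerIndex: number): void {
    if (this.load[workerIndex] > 0) this.load[workerIndex] -= 1;
  }
}

console.log(workersNeeded(20, 3)); // 7
console.log(workersNeeded(20, 2)); // 10
```

The index returned by place() would then be used to decide which worker receives the init call for the new instance, which is where calling methods on specific workers comes in.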