PortBlueSky / thread-puddle

A library to pool Node.js worker threads, automatically exposing exported module methods using Proxy Objects. :rocket:

Execute one file per thread #3

Closed markg85 closed 4 years ago

markg85 commented 4 years ago

Hi,

I kinda like your approach to thread pools! It makes it so simple to use existing code, big pros for you there!

What I want to do is let one thread handle "a" file, let another thread handle another file, and so on.

I might also be missing something completely and the feature I'm looking for is already implemented? :) In that case, please write a little example of how to use it.

Cheers, Mark

kommander commented 4 years ago

Hey Mark, thanks for your feedback. If I understood you correctly, you can just create two pools with different worker files. Handling multiple files in one pool is not directly implemented. I use it as described: for different worker files, create different pools and call whatever you want in those files.

For cross worker communication I let one worker do a part of a job and hand the result to a different worker pool to be further processed. I don't use any message channels between workers currently.

markg85 commented 4 years ago

Hi,

Thank you for your reply, but your proposed solution is definitely not going to work for me. I'm assuming you've never had the use case I have, which is why you propose that.

My case is that different js files need to be executed; there could be thousands of them. With your suggestion I'd have as many pools as there are different files, which is an absolute no-go. I'd basically be using no pooling functionality at all, making the point of using a pool moot.

So I suppose this library isn't for me then. That's fine, there are others out there that will work. I just liked the approach of this one :)

Cheers, Mark

kommander commented 4 years ago

Hey Mark, then I probably misunderstood. What do you mean by "js files need to be executed"? Do you have features spread over different js files? You can create one worker.js, import your files there, and expose the features from the js files via the worker.js. Then you can adjust the pool size to your liking and call whatever method you like, which can come from any js file.

Example:

// js-file-1.js
module.exports = () => console.log('feature 1')
// js-file-2.js
module.exports = () => console.log('feature 2')
// worker.js
const feature1 = require('./js-file-1')
const feature2 = require('./js-file-2')

module.exports = {
  feature1,
  feature2
}
// main.js
const { createThreadPool } = require('thread-puddle')
const path = require('path')

async function start () {
  const worker = await createThreadPool(path.resolve(__dirname, 'worker.js'), {
    size: 8
  })

  await worker.feature1()
  await worker.feature2()

  worker.pool.terminate()
}

start()

Like this? Otherwise I really don't get what you mean by "js files being executed".

markg85 commented 4 years ago

Hi,

That is again totally not a solution for my case. Picture this: I receive .js files that need to be handled as they flow in. I don't want to handle more than X files at any given moment, so I need a pool to limit concurrency.

I cannot put multiple files in one worker as:

  1. I don't know beforehand which files will be sent, for obvious reasons.
  2. Each worker needs to execute one file.

What I need would look somewhat like this in code:

  const worker = await createThreadPool(null, {
    size: 8
  })

  await worker.load('file.js').feature()

Something along those lines. Note the first argument to createThreadPool that I removed (null in its place), and note that I "load a file" via .load('file.js').

kommander commented 4 years ago

Ok, thanks, I understand better now. Very interesting use case, thanks for sharing.

While your suggested .load(...) method is interesting, it does not make sense to load different files with different interfaces into the same pool, except by using namespaces. And if you load a worker, call a method on it once, then throw it away again, it does not make sense to create a pool with a size larger than 1 anyway.

Do the files you receive just get loaded, have a method called on them once, and then get thrown away after that one call? That would be possible as-is right now.

markg85 commented 4 years ago

As I said, I like your approach, but it might not be suitable for my case after all. The files I receive must have a few fixed functions that are always called; the rest of the file is up to whoever wrote it to add more functionality. A sort of interface, if you will.

It is "call-and-throw-away", yes, but it's not possible in one pool.

kommander commented 4 years ago

It is... check out https://github.com/kommander/thread-puddle/compare/fire-and-forget-js-file-example

In the fire-and-forget example above you can call your "interface methods" (which always have to be called) in the worker that loads and invokes the dynamic module/js file.

EDIT: In this scenario the worker pool and queue can handle back pressure for you, but there is currently no queue limit; that's something to be aware of.
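Until a queue limit exists, it can be approximated on the caller side with a small promise-based limiter. A sketch (createLimiter is not part of thread-puddle's API, just plain JS):

```javascript
// A minimal promise-based concurrency limiter (caller-side sketch).
// Tasks beyond `limit` wait in a FIFO queue until a slot frees up.
function createLimiter (limit) {
  let active = 0
  const queue = []

  const next = () => {
    if (active >= limit || queue.length === 0) return
    active++
    const { task, resolve, reject } = queue.shift()
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active--
        next()
      })
  }

  return (task) => new Promise((resolve, reject) => {
    queue.push({ task, resolve, reject })
    next()
  })
}
```

Each incoming file would then be scheduled via something like `run(() => worker.fireAndForget(filePath, method))`, keeping at most `limit` calls in flight.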

EDIT2: And as you do not seem to control the modules you load dynamically and execute, you might want to run them inside a sandboxed context within the worker: in the best case an incoming file merely pollutes your global space, in the worst case it exposes critical information to other files running afterwards in the worker.

kommander commented 4 years ago

@markg85 You can also use a nested worker to sandbox the user module.

// incoming-file.js
module.exports = {
  interface1: () => console.log('Interface method one'),
  interface2: () => console.log('Interface method two'),
  userMethod: (arg1, arg2) => {
    return new Promise((resolve) => {
      setTimeout(() => {
        console.log('Called user module method with', arg1, 'and', arg2)
        resolve()
      }, 100)
    })
  }
}
// worker.js
const { createThreadPool } = require('../../src')

module.exports = {
  fireAndForget: async (filePath, methodName, ...args) => {
    const userModule = await createThreadPool(filePath)

    await userModule.interface1()
    await userModule[methodName](...args)
    await userModule.interface2()

    userModule.pool.terminate()
  }
}

main.js stays the same as above; it just calls another methodName.

markg85 commented 4 years ago

I am aware of the risks I'm taking :) Mind you that a worker thread is in itself already a completely separate JavaScript instance, so in that sense it's "sandboxed". However, it still has access to everything the parent has (the modules, files, ...), so yeah, sandboxing it further might be better. But again, that's for me to solve and entirely outside the scope of this request.

I'm sorry, but the level of complexity you're proposing merely to get a thread pool working is just not worth it for me anymore. I'll instead go for a pool that's easier to get working but a little less friendly on the user side: https://www.npmjs.com/package/node-threads-pool

Thank you for your help and suggestions :)

kommander commented 4 years ago

node-threads-pool does seem like a better solution for your use case. Thanks for the interesting use-case excursion.