andywer / threads.js

🧵 Make web workers & worker threads as simple as a function call.
https://threads.js.org/
MIT License
3.03k stars 161 forks

Consider experimental support for shared memory #272

Open ivan-aksamentov opened 4 years ago

ivan-aksamentov commented 4 years ago

Hi Andy @andywer ,

We are having great success using your library. It unlocks all kinds of new possibilities for client-side compute and will hopefully serve some COVID-19 researchers soon: https://github.com/neherlab/webclades

One problem we faced is that the queue lives on the main thread, so the pool cannot retrieve new tasks while the main thread is busy (in our case, with some heavy rendering). This leads to underutilization of resources.
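To make the effect concrete, here is a minimal sketch that does not use threads.js at all. It only assumes that task dispatch is a callback that must run on the same thread as the blocking work. The names `busyWait` and `measureDispatchDelay` are made up for this illustration.

```javascript
// Blocks the current thread for roughly `ms` milliseconds
// (a stand-in for heavy rendering on the main thread).
function busyWait(ms) {
  const start = Date.now()
  while (Date.now() - start < ms) {
    // spin
  }
}

// Schedules a "dispatch the next task" callback, then blocks the thread.
// Resolves with the number of milliseconds the callback had to wait.
function measureDispatchDelay(blockMs) {
  return new Promise(resolve => {
    const scheduledAt = Date.now()
    // The callback is ready immediately, but it cannot run until the thread is free
    setTimeout(() => resolve(Date.now() - scheduledAt), 0)
    busyWait(blockMs)
  })
}
```

Calling `measureDispatchDelay(50)` resolves with a delay of at least 50 ms: the dispatch callback is ready right away but has to wait for the blocking work to finish, and during that time an idle worker would receive nothing to do.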

I touched this problem a bit in this issue: https://github.com/neherlab/webclades/issues/38

There is experimental work on shared memory, documented on MDN:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer/Planned_changes

Do you think it could be used to put the queue in a shared segment of memory, share it between workers, and execute dequeuing in the context of the workers themselves, so that they don't have to tap the main thread?
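As a rough illustration of the idea (not something threads.js provides), the shared part of the queue could be as small as a single counter that every worker advances atomically to claim the next task. The sketch below assumes the task list itself is known to all workers up front and that only the cursor lives in a `SharedArrayBuffer`. The names `claimNextTask` and `drain` are hypothetical.

```javascript
const TASK_COUNT = 5

// One Int32 slot holding the index of the next unclaimed task.
// In a real setup this buffer would be sent to each worker via postMessage.
const cursorBuffer = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT)
const cursor = new Int32Array(cursorBuffer)

// Atomically claims the next task index, or returns -1 once all tasks are taken.
function claimNextTask(sharedCursor, taskCount) {
  // Atomics.add returns the value that was stored before the increment,
  // so no two callers can ever receive the same index
  const index = Atomics.add(sharedCursor, 0, 1)
  return index < taskCount ? index : -1
}

// The loop each worker would run on its own, without asking the main thread.
// It runs on a single thread here only to show the protocol.
function drain(sharedCursor, taskCount) {
  const claimed = []
  let index = claimNextTask(sharedCursor, taskCount)
  while (index !== -1) {
    claimed.push(index)
    index = claimNextTask(sharedCursor, taskCount)
  }
  return claimed
}
```

A dynamic queue, where tasks are added while workers are running, would need more than a cursor, for example a ring buffer plus `Atomics.wait` and `Atomics.notify` to park idle workers. The counter is only meant to show that dequeuing can happen without involving the main thread.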

I don't expect this to happen right away, of course, especially since this particular API is not even released yet, but it would be great to have this implemented one day.

Thanks again for the great library!

andywer commented 4 years ago

Hey @ivan-aksamentov, that's so nice to hear!

Always happy to help put technology to good use 🙂

Regarding the feature request… Good point. I didn't think that the pool on the main thread would become a bottleneck easily, but now that I think about it, it makes sense.

I think we need to split the discussion into two here:

a) Moving the pool off the main thread
b) Shared memory blobs

I think (a) could be quite straightforward if we go for the "cheap solution", esp. now that #273 is about to land: One could manage the pool in a worker thread.

Not as elegant and maybe not 100% as efficient as having a "decentralized pool" managed by the pool workers themselves, but it might suffice to end the resource fight between the UI and the pool scheduler. Running a pool on a worker might already be possible, in fact.

Now there's the question of (b) shared memory. The biggest issue with it is cross-platform support, though. Browser support for shared memory buffers looks bad, and for Node.js we would need to get pretty creative… So that seems to be pretty much a deal-breaker at the moment.

With the callback support (#273) it might be worthwhile to just give the pool on a worker a shot and see how it performs.

ivan-aksamentov commented 4 years ago

@andywer

a) Moving the pool off the main thread

Oh, I hadn't thought about that. A dedicated worker whose only job is to distribute tasks and then idle might work indeed. I will be waiting for news in this area.

Thanks again for the great library!

andywer commented 4 years ago

@ivan-aksamentov I published the current state of the callback PR as threads@1.6.3-callbacks in case you want to try it. You can find some very basic documentation on how to use it in this comment.

ivan-aksamentov commented 4 years ago

@andywer Thanks. I don't yet fully understand how exactly callbacks help with moving the pool off the main thread, but I may poke around sometime.

I assumed it would be an implementation detail of the pool itself. Or can we hack something together in userland?

Can you give me some pointers?

andywer commented 4 years ago

Sure. I think it should be feasible in userland. Haven't tried it yet, but in my head it looks something like this:

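// worker.js (the "pool manager" worker; untested sketch)
import { spawn, Pool, Worker } from "threads"
import { expose } from "threads/worker"

// Assumption: "./foo-worker" is a placeholder for a worker module that exposes `doFooTask()`.
// Spawning workers from inside a worker requires nested-worker support on the platform.
const pool = Pool(() => spawn(new Worker("./foo-worker")))

// Keep queued tasks around so the calling thread can refer to them by ID
const tasks = new Map()

expose({
  completed() {
    return pool.completed()
  },
  queueFooTask(workDescription) {
    const task = pool.queue(async worker => {
      return worker.doFooTask(workDescription)
    })
    tasks.set(task.id, task)
    // Only the plain ID crosses the thread boundary; the task object itself stays here
    return {
      id: task.id
    }
  },
  awaitTaskCompletion(exposedTask) {
    // Pool tasks are `.then()`-able (resemble promises), so we might just be able to simply return the task
    // in order to return a completion promise to the calling thread
    return tasks.get(exposedTask.id)
  },
  cancelTask(exposedTask) {
    tasks.get(exposedTask.id).cancel()
  }
})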

Just realized that you might not even need callbacks, but callbacks themselves are also not enough to be able to provide a really nice API – you would need to be able to return objects with callable methods.

Might make sense to extend the PR with such a feature, as right now you can only pass callbacks directly, but you cannot pass or return an object with callback methods.
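Until something like that exists, the object with methods could be rebuilt on the calling side. The following is only a sketch under assumptions: `manager` stands for whatever proxy the calling thread gets for the pool worker sketched above (presumably via `spawn(new Worker("./worker"))`), and `createTaskHandle` and `queueFoo` are hypothetical helper names, not part of threads.js.

```javascript
// Wraps the plain `{ id }` returned by `queueFooTask()` in an object with methods.
// Every method just forwards the ID back to the pool worker.
function createTaskHandle(manager, exposedTask) {
  return {
    id: exposedTask.id,
    // Resolves with the task's result once the pool has run it
    completion: () => manager.awaitTaskCompletion(exposedTask),
    // Asks the pool worker to cancel the task
    cancel: () => manager.cancelTask(exposedTask)
  }
}

// Queues a task on the pool worker and returns a handle for it.
async function queueFoo(manager, workDescription) {
  const exposedTask = await manager.queueFooTask(workDescription)
  return createTaskHandle(manager, exposedTask)
}
```

The calling thread would then write `const task = await queueFoo(manager, workDescription)` followed by `await task.completion()` or `task.cancel()`. Each of those calls still costs one message round trip to the pool worker, but the main thread no longer does any scheduling itself.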