Blockstream / greenlight

Build apps using self-custodial lightning nodes in the cloud
https://blockstream.github.io/greenlight/getting-started/
MIT License
111 stars 28 forks source link

Synchronous call in JS client block events loop #26

Closed rafapaezbas closed 2 years ago

rafapaezbas commented 2 years ago

Hi guys,

during the integration of the Greenlight Javascript client into Holepunch we have observed that calls from the client are blocking the Node.js event loop. That is pretty problematic in the Holepunch platform context.

Non-blocking calls in Node.js look like:

const info = await node.get_info()

You can check if calls block the envent loop by executing:

const interval = setInterval(() => console.log('Running...'), 10) 
node.get_info()
node.list_funds()

If the interval output doesn't print while get_info and list_funds are running, it means the event loop is blocked.

cdecker commented 2 years ago

Not a JS expert, so take this with a grain of salt, but according to the neon documentation it should be possible to turn the calls into promises using promisify though I definitely lack the experience to judge how that works.

Since the rust core actually uses an async reactor anyway, mapping the calls to promises or similar may be the best option, do you have a suggestion on how we could best achieve that?

Fwiw, the calls map to a small JS side serialization to protobuf, passing it to the rust side, from where we spawn a task to send the call and receive the response, so no real thread involved on the rust side at all.

mafintosh commented 2 years ago

If you mean the Node.js core module, util.promisify then that won't help here unfortunately. This is something you have to fix in the bindings.

The problem is that Node.js or any JS runtime run in a single threaded event loop. This means all I/O has be be asynchronous, or you'll block the event loop, meaning the entire app halts to grind (being, effectively unresponsive and broken).

The I/O that the Greenlight bindings is doing in native land is blocking ie. fully synchronous. That's what @rafapaezbas is showing that the interval stops printing cause, all execution grinds to a halt.

This effectively means that you cannot integrate the bindings as is, into any JS application. You'd have to spin up a fully separate JS thread or process just to run Greenlight, which kinda defeats the purpose of the binding in the first place.

The solution is the to use async io in the binding - we'd normally use libuv for that, as nodejs bundles that. I'm unsure how that works with neon, but happy to help guide you through how we do it elsewhere. Alternatively in the bindings you can use a napi_worker which allows you to run it on a thread, but again unfamiliar with how to do that in rust, only done it in c/c++.

Let me know if we can help, as this is pretty, forgive me, ... blocking 🙂

cdecker commented 2 years ago

Hehe, understood. I was hoping that promisify would essentially spawn a worker thread pool onto which it can defer blocking operations, but obviously I didn't dig deep enough. I'll look into how neon expects these things to go, and if that doesn't work I'll have to resort to a future based binding API, where we return a handle that we can poll on based on a timer. That may add a bit of a delay to the results (i.e., add the timer overhead and effectively align results on the poll interval), but it is better than being outright blocking.

cdecker commented 2 years ago

So from the neon docs it appears that we can indeed interact with the event loop with relative ease, by getting a handle on the event queue, it even allows us to perform some operations synchronously in the event loop after the async part is done, which I find a weird level of granularity, but ok :-)

I don't know what the call syntax on the JS side would look like (as it seems to require passing a callback function and there seem to be several syntactic sugar variants that JS implements to flatten the nesting callback hell), so I'll try to find an example and then try to map that to our use case.

FWIW it should be just a single method that we need to make async here (plus a bunch of signature adjustments), since we funnel all grpc calls through the call method :-)

mafintosh commented 2 years ago

Ya, I was looking through your bindings yesterday evening also, and noticed it was all async rust, so this is surely just a "last mile" problem, ie something is "awaiting" your async just into sync code when called from js. The call syntax in js, would be the same, but with an await, ie you just need to return a promise, which the bindings should support (at least part of the node native interface)

cdecker commented 2 years ago

Yep, found an example that I'll reconstruct with a function that sleeps and then we can try if that works. Sounds like I just got nerd-sniped :-)

https://github.com/neon-bindings/neon/issues/560#issuecomment-767620278

cdecker commented 2 years ago

It appears I was able to make sense of the sparse documentation. Seems neon changed syntax several times since that comment was written, and we need to ensure that we don't accidentally break this when upgrading neon in the future.

Still need to change signatures on a number of calls, and extend the async call semantics to the scheduler calls, but it seems to be working:

 RUNS  ./index.test.js
  console.log
    Calling get_info

      at Object.<anonymous> (index.test.js:106:13)

  console.log
    Running...

      at Timeout._onTimeout (index.test.js:103:48)

  console.log
    Running...

      at Timeout._onTimeout (index.test.js:103:48)

  console.log
    Running...

      at Timeout._onTimeout (index.test.js:103:48)

  console.log
    get_info returned

      at Object.<anonymous> (index.test.js:108:13)

  console.log
    GetInfoResponse {
      addresses: [],
      nodeId: <Buffer 02 a4 34 55 31 27 4a 15 11 9c 3d b7 e7 40 97 11 b9 bc 35 89 ae 0f 93 2a 38 9a 2e db b0 a4 96 c7 8a>,
      alias: 'LATENTBEAM-v0.10.1',
      color: <Buffer 02 a4 34>,
      version: 'v0.10.1',
      blockheight: 255093,
      network: 'regtest'
    }

      at Object.<anonymous> (index.test.js:109:13)

This is with setInterval logging every second, and the nodeCall sleeping for 3 seconds, it being interleaved seems like a success to me :-)

I'll publish the state as is now, and extend the coverage as soon as I have a couple minutes :-)

mafintosh commented 2 years ago

nice! is the syntax then just they they return promises?

cdecker commented 2 years ago

One more thing: the streaming calls may need need some more work, but I'm hoping to push this out as soon as possible to get you guys going.

nice! is the syntax then just they they return promises?

Yes, sadly that means that we need to give a heads up to existing users, and point them towards how they should fix up their code (in this case simple: "prepend await to the call"), in order to minimize surprises.

mafintosh commented 2 years ago

Ya, the norm is that you major bump the semver for that, but since it's "beta" you can prob also get away with a patch and a ping :)

cdecker commented 2 years ago

OK, PR is up, feel free to grab it and verify if it works. You guys are certainly in a better position to see if the JS interface and behavior is correct now ^^