aws / jsii

jsii allows code in any language to naturally interact with JavaScript classes. It is the technology that enables the AWS Cloud Development Kit to deliver polyglot libraries from a single codebase!
https://aws.github.io/jsii
Apache License 2.0
2.62k stars 244 forks source link

JSii blocks the node process when the JS code is not being called by the host #4133

Open TimothyJones opened 1 year ago

TimothyJones commented 1 year ago

Describe the bug

In (at least) Java, the node process is blocked while it is waiting for new method calls.

Specifically, promises that are running in the JSii-ed node process don't run at all when the host process has no active calls to the JSii-ed process.

I have a small repro of this bug in this repository. Here's a breakdown of what's in that repo:

Although this is an async example, the blocking of the node process happens regardless of whether or not the call in to it is async. I only made an async example so that I could make longRunningSomething() take a while so you can see that the ticks run while node is doing something that it is planning to return to java.

Expected Behavior

When the java code at that link is run, I expect the (1) ticks to continue after the call to longRunningSomething() while the Java thread is doing something else.

Current Behavior

When the java code at that link is run, the (1) ticks stop once the JS returns to Java.

The (1) ticks restart when

Reproduction Steps

With JDK 20.0.1 (this is just the version I know works, others probably do too):

git clone git@github.com:TimothyJones/jsii-async-issue.git
cd java-host/untitled
./gradlew run

Observe that the ticks stop printing when Java is waiting

Possible Solution

My guess is that the stdin/out protocol for communicating between JSii and the host uses blocking calls on the JS side. I suspect that non-blocking calls would solve this problem, and possibly several others.

Because JSii communicates across a stdin/out protocol with another process, I think there's quite a bit of flexibility in the way that async is treated - one side could see it as async, while the other sees it as sync (and vice versa).

Additional Information/Context

This is related to this issue in that I think this is the cause, rather than aync hairiness.

(that issue describes the exact situation that I have in my real code, though)

This is a blocking issue for me, I would be happy to look in to it (but definitely would need some guidance on where to look)

SDK version used

jsii@5.1.1

Environment details (OS name and version, etc.)

node 18.13.0, JVM 20.0.1, MacOS 13.3.1 (a)

RomainMuller commented 1 year ago

Alright yes, the current implementation of the host in @jsii/runtime is fully synchronous (bad decisions from a long, long time ago; coming back to haunt).

To a large extent this is because the host needs to maintain a specific order of operations, as the jsii protocol currently is sequential (request -> response to that request). If order of operations is breached, then the response sent could be for a different request than the last one sent by the client, who'd then be unable to match that response to a given request.

I venture that we either need to build some state machines so that we can ensure that a given response is only sent when it's supposed to, so that the client can make sense of it all... or we need to re-design the interop model so that requests have IDs that are passed back in the response, so they could be sent out-of-order.

Following this, we'd also need to make sure that multiple requests/responses don't get simultaneously sent in such a way that the JSON IPC stream would get corrupted (although maybe we're safe with that due to how there can only be 1 JS-executing thread at any given time)

TimothyJones commented 1 year ago

Right, yes. I think if you unblocked node, you could very quickly and naturally end up in a situation where the request responses would be out of order, because an unblocked async might want to call the host (this would definitely happen with my ContractCase project, which passes logs across the boundary so they can be idiomatically handled by the host's test framework).

or we need to re-design the interop model so that requests have IDs that are passed back in the response, so they could be sent out-of-order.

With the caveat that I only have the context from the runtime architecture diagrams (and related documentation), this feels like the more reasonable of the two to me.

we'd also need to make sure that multiple requests/responses don't get simultaneously sent in such a way that the JSON IPC stream would get corrupted (although maybe we're safe with that due to how there can only be 1 JS-executing thread at any given time)

I agree for calls from node -> host, but I think you might need to take care not to multi-thread calls from host -> node (but you probably have that problem regardless of how the protocol works). If it turns out to be a problem, there are a few possible solutions there, I think.

Sooo... this sounds like a substantial rework, and I might not be qualified to do it without more context of how JSii internals work. On the other hand, this issue kills my current approach for multi-platform dead -and JSii is otherwise such a perfect fit for my use-case. If a few hours work over a few PRs could help fix this, I'd like to help out. What do you think the best way to proceed is?

TimothyJones commented 1 year ago

I can take a look at spiking something over the weekend - could you point me at the communication code? I see this interface which looks like it's the place where node -> host happens. Is that right? How does that relate to this module?

Also, presumably the other side of that communication boundary is implemented in each host language? How is the protocol tested? Would a good approach be to modify the node side, then follow breaking tests until it works?

Any code pointers (or general advice) would be super helpful. Also sorry for so many questions!

TimothyJones commented 1 year ago

Actually, there might be a better way with less blast radius - If the node side of the communication boundary was controlled with a mutex, it could be possible to maintain the current contract between the hosts and the node side, but allow node to continue other unrelated promises.

This would stop the core blocking the node thread, but wouldn't allow multi-threaded access across the JSii boundary (which I think is the current state anyway).