Closed tegefaulkes closed 7 months ago
I think this stems from a design decision where data flowing downwards are normal function calls where data flowing upwards would be events. In this case it's making it inconvenient to bubble the send error where we need it.
Consider the following.
The QUICConnection
emits a send event, the QUICClient
OR QUICServer
takes this and passes it along to the QUICSocket
. This is already a little round-about but it breaks any reasonable flow back the the QUICConnection
if there was an error.
The recv path takes a much cleaner flow through normal function calls.
For our needs, it would be much simpler if the QUICConnection
makes the QUICSocket
calls directly, then it can catch the errors it needs. Something like the following.
I want to maintain the design decision for now.
To solve the #78 I think the proper solution is about distinguishing these kinds of errors.
This is still doable with event/push flow.
It's a matter of the the thing catching the exception and then deciding exactly what kind of event to dispatch.
So instead of changing our control flow and data flow design, keep it the same and figure out how to distinguish 2 different event types.
Remember push flow is inherently multi-consumer possible. So QUIC connections are all registered on the the socket events, but one can also register a specific kind of event type - if they all register to the same event type, they can further check the contents of the event to see if it is relevant to them.
Generally it's more efficient to register unique event IDs - but this may not always be possible.
Any design decisions on how events is going to work needs to take into the global consideration of https://github.com/MatrixAI/Polykey/issues/444, and to add commentary there.
We need to make sure our event abstractions and push flow abstraction makes sense in all our work.
If we want to go the event route, then there are two things to do.
QUICConnection
needs to have a reference to QUICSocket
to be aware of any send errors.QUICConnection
needs to add an event handler to QUICSocket
send errors to handle the errors.Since there are many connections to 1 socket, each connection will have to filter for the events only relevant to them. This means every active connection will have to do some processing for every error event filtering for the only one relevant to itself. I'm not sure I like that.
Say we have 100 connections and the network cuts out, Now we have 100 sends failing. Each failed send will have an event being handled by all 100 connections trying to filter for what it wants. So we have 10000 if-checks happening. It's not going to scale very well.
Firstly QUICConnection
already has reference to the QUICSocket
. And there's only 1 socket for all of them.
As for scaling the dispatch, there are ways to do this, lots of ways to achieve it, but the original dataflow/controlflow design should be maintained.
Like I said, it's possible for lots of connections to register specific event IDs, thus enabling O(1) dispatch.
The event names right now correspond to the to the name of the type. But I believe if you go to js-events
you can imagine how it can be architected to support parameterised names.
And this should lead you to understand how that might work seamlessly with observables.
Okay, I'll look into it.
And every importantly, the node specific warnings on event listener counts might need to be entirely eliminated, and instead what we do is manage the total counter of events in a global event bus system that keeps track of it, and can do a total warning if it detects resource leaks at a global level.
EventTarget
is kind of not great to use, I've been thinking we'd just make our own pure TS event class at some point.
That's kind of what AbstractEvent
is atm. I think that js-events
is what needs to do its overrides/abstractions there, but the underlying system should still be EventTarget
to keep it compatible with the rest of the JS ecosystem.
I've got the fix down but just need to review and polish.
I'm thinking I can do a fix for https://github.com/MatrixAI/Polykey-CLI/issues/115 in this PR as well. At minimum I have the groundwork down for it. I just need to work out what errors happen when the internet drops out.
Ultimately the solution was to create a new event for a send error that is coded with the connection ID of the connection so the connection can listen for the error event for it specifically. Right now it only this logic applies mostly to QUICClient
since the EINVAL
error can only happen when specifying bad parameters for new connections. server connections shouldn't
run into this.
I think if we just ignore or do warnings about the following error then we can prevent crashes when the network cuts out.
Error: send ENETUNREACH ::ffff:13.54.214.222:1314
at doSend (node:dgram:716:16)
at defaultTriggerAsyncIdScope (node:internal/async_hooks:463:18)
at afterDns (node:dgram:662:5)
at processTicksAndRejections (node:internal/process/task_queues:83:21) {
errno: -101,
code: 'ENETUNREACH',
syscall: 'send',
address: '::ffff:13.54.214.222',
port: 1314
}
I'm almost done with this. I'm just adding some tests to test network outages. Mainly mocking the udpsocket send to throw the same kind of error I posted in the above comment.
The first test is working fine. But the 2nd where the connection times out when the sends is failing is working mostly. The only problem is the ErrorQUICConnectionIdleTimeout
is being thrown somewhere even after the test ends successfully. Trying to track down the error through all the event handling is tricky since we're loosing basically all useful stack information in the process.
Error handling for push flows is inherently different from pull flows. Errors have to explicitly handled at the handler and possibly flow to a different global error handler. Don't rely on the error stack for push flows.
I'm considering this one done now if you want to look it over.
Description
This PR attempts to fix the problem with socket send errors not being handled properly when they're specific to the connection triggering the send.
Issues Fixed
Tasks
[x] 1. Send errors specific to the connection triggering send should bubble back to that connection to be handled properly.
[x] 2. Starting a connection should fail if the target address is invalid. In this case due to external addresses being invalid for loop back bound sockets.
[x] 3. Temp network failure does not cause a crash but connections eventually time out.
Final checklist
[x] Domain specific tests
[x] Full tests
[x] Updated inline-comment documentation
[x] Lint fixed
[x] Squash and rebased
[x] Sanity check the final build