MatrixAI / js-quic

QUIC Networking for TypeScript & JavaScript
https://matrixai.github.io/js-quic/
Apache License 2.0
13 stars 1 forks source link

`js-quic` hard crashes when bound to loopback and creating connection to an external address. #78

Closed tegefaulkes closed 8 months ago

tegefaulkes commented 10 months ago

Describe the bug

There have been test failures in the intergration:docker ci job for the Polykey-cli. Exploring the problem locally we found that under the configuration and parameters used for the tests was causing the QUICSocket._send to error out with EINVAL.

From what we can tell the the PolykeyAgent is starting while bound to a local address 127.0.0.1 but during the course of the testing connections are being attempted to the external testnet with an address such as 3.139.146.137. SO specifically the combination of host networking in docker, binding to the loopback and trying to connect externally is causing an EINVAL error in QUICSocket._send. Which bubbles up to the top of the process and crashes the program.

The following is the console.error output of the error after running into the problem. We obtained this by adding the console.error() to the handlers for uncaught exceptions.

WARN:start test:INFO:polykey.PolykeyAgent.NodeConnectionManager.NodeConnection [3.139.146.137:1314]:Creating NodeConnection
INFO:polykey.PolykeyAgent.NodeConnectionManager.NodeConnection [3.139.146.137:1314].QUICClient:Create QUICClient to 3.139.146.137:1314

WARN:start test:INFO:polykey.PolykeyAgent.NodeConnectionManager.NodeConnection [3.139.146.137:1314].QUICClient.QUICConnection 8bb399756f3559b0e30b91a116410864d0721d16:Connect QUICConnection

WARN:start test:INFO:polykey.PolykeyAgent.NodeConnectionManager.NodeConnection [3.139.146.137:1314].QUICClient.QUICConnection 8bb399756f3559b0e30b91a116410864d0721d16:Start QUICConnection

WARN:start test:INFO:polykey.PolykeyAgent.NodeConnectionManager.NodeConnection [3.139.146.137:1314].QUICClient:ErrorQUICClientInternal: QUIC Client internal error - Failed to send data on the QUICSocket

WARN:start test:ErrorQUICClientInternal: Failed to send data on the QUICSocket
    at u.handleEventQUICConnectionSend (/nix/store/kvp1pwr7j45llp40mcjkn62qn7fgx0jj-polykey-cli-0.1.1/lib/node_modules/polykey-cli/dist/polykey.js:2124:74384)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async EventTarget.handler (/nix/store/kvp1pwr7j45llp40mcjkn62qn7fgx0jj-polykey-cli-0.1.1/lib/node_modules/polykey-cli/dist/polykey.js:7:6011) {
  data: {
    msg: <Buffer c2 00 00 00 01 10 3a 8d c9 99 ac f4 08 4e 52 70 39 65 27 71 f7 03 14 8b b3 99 75 6f 35 59 b0 e3 0b 91 a1 16 41 08 64 d0 72 1d 16 00 41 12 b6 f1 66 fe ... 1150 more bytes>,
    port: 1314,
    address: '3.139.146.137'
  },
  cause: Error: send EINVAL 3.139.146.137:1314
      at doSend (node:dgram:716:16)
      at defaultTriggerAsyncIdScope (node:internal/async_hooks:463:18)
      at afterDns (node:dgram:662:5)
      at process.processTicksAndRejections (node:internal/process/task_queues:83:21) {
    errno: -22,
    code: 'EINVAL',
    syscall: 'send',
    address: '3.139.146.137',
    port: 1314
  },
  timestamp: 2023-12-08T02:15:24.164Z
}
ErrorQUICClientInternal: Failed to send data on the QUICSocket

I suspect this may be platform specific to docker since this problem never occurs during normal testing.

To Reproduce

  1. In polykeyCLI repo you need to build the docker image using the normal methods shown in the README.MD.
  2. Run the following docker command docker run --network host -it $image agent start --network testnet -np /tmp --verbose --agent-host 127.0.0.1
  3. The program will throw a ErrorQUICClientInternal error and crash.

Expected behaviour

Even when bound to a loopback, connection attempts to external addresses shouldn't crash. These should be errors specific to the connection being made. In fact any EINVAL errors should be thrown back to the connection in some fashion and never be taken as a full failure of the socket.

Platform (please complete the following information)

Additional context

Notify maintainers

CMCDragonkai commented 10 months ago

This needs to be tested across platforms if the behaviour is consistent.

But it should also be treated as a per-connection error.

tegefaulkes commented 9 months ago

So basically the problem is that we're conflating problems with the socket causing errors with problems with send causing errors. The main one we ran up against is invalid input to send causing an EINVAL error since the address was invalid for how the socket was bound.

The solution here is to take any error coming out of the send and checking if it's a problem specific to the connection or packet send attempt, or some grander problem with the socket itself. If the problem is specific to only whatever was using the send at the time then we need to pass the error along to that.

This leads on to further work where we make js-quic tolerant of any network dropouts because I noticed when unplugging the Ethernet cable the node crashes. Actually, I'll make a bug report for that.

CMCDragonkai commented 9 months ago

The solution here is to take any error coming out of the send and checking if it's a problem specific to the connection or packet send attempt, or some grander problem with the socket itself. If the problem is specific to only whatever was using the send at the time then we need to pass the error along to that.

Yes this is needed, but not by changing the dataflow directions. This is best done with a combination of 2 things:

  1. Error types
  2. Event types

That way it's possible to throw up the errors/dispatching the right events, and have the relevant consumers decide whether it is relevant to them. Even connection IDs can be part of these events, so that a particular QUICConnection can check if it is relevant to themselves.

CMCDragonkai commented 9 months ago

Remember to assign issues to yourself if you're working on them.

tegefaulkes commented 8 months ago

Re-opening this. There were some problems with the CI after merging #86.

I'll be applying fixes to the same branch feature-socket-errors