Closed ncallaway closed 4 years ago
I've found a fix that appears to resolve that issue:
In the connection manager handling inbound packets:
If we add disregard the comment and add the connections to the list, this behavior is largely resolved. We dispatch a Connection
event on receiving an inbound message, and start sending outbound heartbeats because the connection is added to the list.
conn.process_packet(messenger, payload, time);
+ self.connections.insert(address, conn);
}
My main concern is that doing this explicitly contradicts the comment in that section. Before I make a PR I want to do some git history digging and exploration and try to find where that came from, but it seems very intentional that we weren't adding the connection to the list.
If @TimonPost or anyone else knows about the history of that off the top of your heads, I'd be interested in hearing it. It makes sense to me to start a connection on either receiving an inbound packet or sending an outbound packet, but if there are reasons to the contrary they should be considered.
It looks like this was introduced in this PR: https://github.com/amethyst/laminar/pull/258. Given that was a large refactor, I'm somewhat inclined to think the decision not to include the connections on receive was a minor mistake as part of what looks like a really awesome refactor.
@fraillt I know it's been almost a year since that refactor so you might not remember, but is there any chance you remember why you decided to create the connection but not yet add it to the connections list?
Hello, I don't remember exact events that lead to this, but it's certainly intensional. I remember having a discussion on discord and also created issue #242. The main idea behind this was that connection should only be accepted when a server actually responds to it. If you noticed it, the only place where a connection is added is when handling events from server. Later, followed a discussion, that it would be nice to have proper connection management. @TimonPost was actually working on DTLS (don't know what state it's in now), and I started working on https://gafferongames.com/post/client_server_connection/, as it was perceived a nice to have addition while DTLS is complete. Although I had almost completed it, I wasn't happy with it as it looked that it could be simplified a lot, I just didn't know how :) then, I had some personal issues and I never came back to it...
TLDR; It is intensional and it was always like that, even before refactoring. The main idea behind this was that connection should only be accepted when a server actually responds to it.
@fraillt is correct. There was an issue that noted a vulnerability for a vector attack that floods the memory in the protocol. In order to fix this, we have implemented a temporary workaround to only add the connection if the server responds with a message to the client. This behavior allows for simple connection management.
I want to remove this behavior and implement a more mature way of connection management. But while doing so I came to the conclusion the ecosystem isn't far enough yet and it would have cost me too much effort to implement handshaking my self. I wanted to use DTLS of this and that is why I wrote https://github.com/TimonPost/udp-dtls. However, this only supports one socket to one socket and not one socket with many sockets which is needed for server -> client connection management.
There are ways to implement this with openssl however those edits were needed: https://github.com/TimonPost/udp-dtls/issues/2. As the issue described lots of edits were needed to get this working and we would be stuck with a C dependency to opensll. And the only rust alternative would be Rustls, but this library doesn't support DTLS and is waiting for the draft of DTLS 1.3 to be completed.
So I would be left with the option or implement some kind of handshaking my self or implement DTLS 1.3 for Rustls. Because of my new year resolutions and lack of knowledge in this security aspect, I dropped the project.
That makes sense. I'll close the PR. I agree that implementing an proper handshake is a better resolution to this issue.
I'll start researching how I can help on that front, or if we just should cool our heels for the DTLS 1.3 draft.
In the meantime, I think this is still a pretty significant issue for laminar users. This is mentioned on the important notices page, but I think some users are still missing this fact.
I think there are two or three things we could do to help new laminar users have the right expectations around this behavior:
Normalize SocketEvent::Connection events to only occur when the connection is actually added to the list. This will both prevent the repeated connections behavior described here, but would also force users to discover this behavior earlier (since most users probably interact with the connection
event early).
This is the main thing that tripped me up, because even though I had read the notice about connections and knew I needed to pong from the server, I interpreted the connection event as meaning heartbeats would be flowing.
Add documentation of this behavior in key places (like SocketEvent::Connection). This way, if a user expects a connection to happen on just a packet receipt will be confused about why a connection event isn't firing and hopefully check the Connection documentation.
I feel less certain about this one, but we might also change the requirement to establish a connection to both the send and receipt of a packet. This change would be more to simplify the mental model for a user of a connection: A connection is an endpoint to which you have both sent and received a Packet from within the timeout window. It makes the concept of a connection feel more symmetric between client and server.
I'll start exploring the first two as a resolution to this issue, and we can have a conversation around the third item.
Awesome! Thanks for your contributions already. It would be amazing to finish this protocol for it to be working fluently.
In the meantime, I think this is still a pretty significant issue for laminar users. This is mentioned on the important notices page, but I think some users are still missing this fact.
I fully agree with you. I would maybe even say let's remove this whole 'pong' idea and go for 3. I wonder whether the risk (as described above) is greater then the struggles it gives us. It is a quick fix that has been here for 1 year and has caused more confusion than that it solved. What do you think?
Re: implementing DTLS, I tried it, but it requires much knowledge about TLS, SSL, handshaking, and security. For me personally, I don't have that knowledge that's why I took my hands back such that I would not burn them. But if you feel comfortable in those fields and can help RustLS that would be amazing. There was an existing pull request for DTLS 1.2. I even rewrote that one to be compatible with the newest Rustls version. Because that pull request dates back 2 years ago. Lots of things have changed since that time. My pull request would be a good start.
I definitely don't think I'm to to implementing DTLS on my own without guidance. I think on that front, we should probably wait for DTLS 1.3 and rustls
to work on it. I think I'd be comfortable helping implement DTLS 1.3 with guidance from the author of rustls
, but that'll need to wait until he's ready to work on it.
What I was thinking in the meantime is that laminar
could implement a custom "laminar handshake" that might be something like a very-simplified toy version of DTLS, with the plan of migrating the "laminar handshake" to DTLS 1.3 when rustls
makes it available. The custom handshake could include a basic mitigation against flooding attacks.
I think tackling something like the custom handshake (with some advice on how to integrate it into the existing laminar structure) would be pretty feasible for me, but might take a bit of planning.
I fully agree with you. I would maybe even say let's remove this whole 'pong' idea and go for 3. I wonder whether the risk (as described above) is greater then the struggles it gives us. It is a quick fix that has been here for 1 year and has caused more confusion than that it solved. What do you think?
Alternatively, if others agree that we're willing to sacrifice some flooding safety here for this, we could come back to this pull request.
We could add a little bit more safety to this PR by doing something like counting the number of unacknowledged
connections (where the calling code hasn't ever sent an outbound packet), and limit those connections to some fixed size (say 15
or something). If we are already maxed out on unacknowledged
connections, then incoming packets would not be added to the connections map (the same behavior as now).
This would require adding one new bool
field to the VirtualConnections
, and a new u8
field to the ConnectionManager
.
I think this would give us back some basic flooding safety, by putting a hard cap on the amount of memory UDP flooding could consume. This second approach would be pretty quick for me to implement.
I keep throwing out options, but you also asked my opinion so I'll give that too.
I think the easiest path forward (outlined more specifically below) would put us on a good path to implementing a stub-DTLS-like-handshake, which would put us on a good path to implementing DTLS 1.3 whenever it's available.
So I'd recommend the following path:
rustls
My only concern with this approach is the amount of churn it might add to the laminar
interface and protocol. I know we're pre-1.0, but I'm not sure how many times we want to change behavior around Connections. If we want to minimize churn, then I think we would rather just go straight to the stub-DTLS-like-handshake.
Simple Connection changes
Connection
(as far as the events / interface / documentation is concerned) as both a send and a receive of a user Packet
VirtualConnection
on either a send or receive.VirtualConnection
to track whether it has ever had an inbound Packet
and an outbound Packet
VirtualConnections
to some fixed (configurable) size.VirtualConnection
that has both sent and received a packet.This behavior would still require a laminar
user to respond to each inbound message in order for a Connection
event to be triggered. The SocketEvent::Connect
events would be symmetric between the client and the server (e.g. they both dispatch a Connect
event once the server respond's with a message. Additionally, heartbeats would match the SocketEvent::Connect
dispatch.
I like your proposal for preventing flooding attacks while removing the current behavior that causes problems. It seems to me that this change is a nice intermediate solution between removing and keeping the functionality. That said you mention " but I'm not sure how many times we want to change behavior around Connections", With your proposal, we would not break any user code because their handshake would maybe be unnecessarily but still useful to mark the connection as 'trusted'.
I would suggest not to go for a stub-DTLS-like-handshake protocol. For the following reasons:
Great. I'll start on that over the weekend. Shouldn't take long
This should be fixed by now
Heartbeat messages do not begin sending if a Packet is received, they will only begin sending after an outbound Packet is sent. This causes two issues:
1) The opposite side of the connection will perceive this as a Timeout event and terminate the connection (since it doesn't receive any heartbeat messages).
2) This side of the connection will see each received heartbeat as a new connection event, which leads to repeated Connection events.
An example of this can be found here: https://github.com/ncallaway/laminar-heartbeat-demo
This creates a server (:12350) and a client (:12355). Each are configured to send heartbeats every 2 seconds and timeout after 5 seconds.
The client, on startup will send a
Ping
to the server. The server can optionally respond with apong
message.What we see is that if the server responds with the
pong
message, heartbeats flow in both direction and the connection is maintained on both ends.If the server does not respond with a pong message, heartbeats only flow from the client to the server, until the client times out. The server sends a new
Connection
event for each heartbeat that it receives, and never reports a timeout event after the client stops sending heartbeat messages.Ping
Wireshark Capture
Server
Client
Ping w/ Server Pong
Wireshark Capture
Server
Client