Haivision / srt

Secure, Reliable, Transport
https://www.srtalliance.org
Mozilla Public License 2.0
3.06k stars 840 forks source link

[maint] Fix handling of address reusage when binding #1513

Open ethouris opened 4 years ago

ethouris commented 4 years ago

There's indeed something wrong around the SRTO_REUSEADDR option. According to what I can see in the CUDTUnited::updateMux function that is responsible for associating the Multiplexer with the socket, the search through an existing multiplexer for the given address is being done only if the socket being bound has set SRTO_REUSEADDR option, and this option is also replicated inside the multiplexer, so only "reusable" multiplexers are allowed to be associated with such a socket.

In effect, if you have a socket that has SRTO_REUSEADDR set to false, or an existing multiplexer for that address is not reusable (that is, this address was earlier bound to a socket that has set SRTO_REUSEADDR to false), then it always tries to create a new multiplexer. There are two problems here, however:

  1. Not handled explicitly a situation when an attempt to create this multiplexer, associated with creating a new socket and binding it to that address, would fail, and moreover, we know that this would fail from upside, if we already found a multiplexer that is bound to this address and it is not reusable. In such a situation, the attempt to create a new multiplexer should always fail and we know it at this very point. Simply, if we don't allow address reusage, then attempt to reuse the address should fail. This, however, relies on that the call to bind will fail due to that SO_REUSEADDR on a UDP socket is off by default.

  2. The default value for SRTO_REUSEADDR is true, while SO_REUSEADDR for a UDP socket is false by default, which is actually a consequence of the need to share the multiplexer between the listener and the accepted sockets. Although it is required that accepted sockets share the multiplexer with the listener socket, it's unclear how this option should control the fact as to whether, for example, a potential caller socket can share the multiplexer with a listener socket and its accepted sockets.

What needs to be done:

  1. Made a unit test that embraces all these cases.
  2. Define correctly the consequences of having this flag set and clear in a various combinations of binding sockets to the address, including UDP socket acquisition (srt_bind_acquire).
alexpokotilo commented 3 years ago

HI @ethouris, sorry for bothering you again. could you please explain what do you mean by:

The default value for SRTO_REUSEADDR is true, while SO_REUSEADDR for a UDP socket is false by default, which is actually a consequence of the need to share the multiplexer between the listener and the accepted sockets.

We have to set "SRTO_REUSEADDR" to false in our program now to workaround issue our engineer described/filed in #1503, but I don't see any problems for accepted/listening socket because of that. In fact I can listen/accept new connections while serving existing connections even if I set "SRTO_REUSEADDR" to false to listening socket. And this is why I'm asking you to give some details about this statement. You described two issues with "SRTO_REUSEADDR". The first issue is completely reasonable. Moreover I've got the problem between "listener" vs "caller" part you described in the second issue. But the comment above is not clear to me unfortunately.

ethouris commented 3 years ago

Sharing a multiplexer between listener socket and the accepted sockets is something that exists since always. If this still works when you set false SRTO_REUSEADDR it's likely because this flag in that case is ignored. Without the possibility of sharing the multiplexer you simply wouldn't be able to accept connections because accepted sockets wouldn't be able to be created.

The general problem for the issues I described is that binding SRT sockets must be somehow agreed with system (UDP) sockets. SRT adds a possibility to share Multiplexers (and their associated UDP sockets) between SRT sockets, however the API for binding is still more-less the same as the API for system sockets. This means that you can bind other sockets to the address and port as the other sockets and use them independently, but there could be also conflicting bind specifications. For example:

  1. You bind a socket to 10.0.0.1:5000
  2. You create a second socket and bind it to:
    • 10.0.0.1:5000 (with reuseaddr=true) - ok, binding adds the SRT socket to the share pool
    • 10.0.0.1:5000 (with reuseaddr=false) - ERROR, this binding is not reusable
    • 10.0.0.1:5001 - ok, this bind address is free (that is, you can create another UDP socket and set it this binding)
    • 0.0.0.0:5000 (regardless of reuseaddr) - ERROR, the port :5000 is already occupied by the previous binding, even if this doesn't cover all possible interfaces
    • 192.168.1.1:5000 (regardless of reuseaddr) - ok, the existing binding uses explicit IP address, so this endpoint is free to bind

There are two problems in the code:

  1. The conflicting binding is not being detected early enough and it relies on reporting the error from system ::bind call. This leads to misleading errors
  2. There are some cases like when you try to bind to 10.0.0.1:5000 while you already have a binding to 0.0.0.0:5000 it ships you this binding as a share, which is wrong - it should result in conflict and report an error (not sure if this is exactly this case - I'm talking from memory).

I've prepared #1588 to fix this, but it's still underway.

alexpokotilo commented 3 years ago

Hi @ethouris, 1)

Sharing a multiplexer between listener socket and the accepted sockets is something that exists since always. If this still works when you set false SRTO_REUSEADDR it's likely because this flag in that case is ignored. Without the possibility of sharing the multiplexer you simply wouldn't be able to accept connections because accepted sockets wouldn't be able to be created.

Whatever #1588 changes, please keep SRTO_REUSEADDR set to false ignorance for accepted/listener sockets relationship. there are many SRT versions and some programs use them interchangeably so if listener will fail to accept socket is SRTO_REUSEADDR set to false this will break a lot of programs. Moreover I don't see any reason to honor SRTO_REUSEADDR value in case of listener and it's accepted SRT sockets processing. If I understand correctly accepted SRT socket always works through it's parent UDP socket. From my POV SRTO_REUSEADDR should control socket reuse between listener sockets only. I mean if one listener wants to occupy given socket for itself and it's children, all other listeners cannot bind on same ip:port(or cases like 0.0.0.0). If an application wants initial and new listener can share the same socket, an appllication sets SRTO_REUSEADDR set true for both listener sockets. This way an application can use SRTO_REUSEADDR to disable reuse in case if reuse works incorrectly, fails etc. I agree that we can add some checks and fail binding even in case SRTO_REUSEADDR set to false so that we don't rely on ::bind errors. But if we made a mistake in these new checks there will be no way to disable them(these new checks). I'd vote for simple checks like you described in your examples. Even If we miss some cases like ipv6 over ipv4 binding(::ffff:0:0/96, ::ffff:0:0:0/96 etc) it's not a problem as bind to fail later and we will close wrong listener socket. But if we add too complex and error-prone checks, this will lead us to cases we cannot fix disabling SRTO_REUSEADDR.

Let me copy your initial statement once again:

The default value for SRTO_REUSEADDR is true, while SO_REUSEADDR for a UDP socket is false by default, which is actually a consequence of the need to share the multiplexer between the listener and the accepted sockets.

Am I right that this is incorrect statement and SRTO_REUSEADDR doesn't affect listener and it's accepted sockets and it should not ?

2) in your example

You bind a socket to 10.0.0.1:5000

it should be: You bind a socket to 10.0.0.1:5000(with reuseaddr=true)

3) in your initial message you stated

it's unclear how this option should control the fact as to whether, for example, a potential caller socket can share the multiplexer with a listener socket and its accepted sockets.

What do you mean by "caller socket" in this case ? if you mean socket working through srt_connect I don't see why this socket need any reuse at all. Such sockets don't bind. If you mean listener that starts data sending after accept, such listener should behave the same way as receiving listener.

Thanks for your attention to my questions. I hope my questions/suggestions/corrections are useful

ethouris commented 3 years ago

Likely I'll need to review this thing, but there are still some statements I think remaining true:

  1. The SRTO_REUSEADDR is true by default. Can't say now what the real reason for it, although likely setting it false on the listener socket doesn't prevent the binding from being shared between the listener and accepted sockets - otherwise the accepted socket could not be created. And this isn't going to be changed by obvious reasons.
  2. The example you mentioned at (2) I didn't mention that it's with reuseaddr=true because this is the default.
  3. Of course callers sockets do bind - at least they can. This is only - in contradiction to listener sockets - not obligatory, but optional. This is exaclty how the multiplexer sharing happens between caller sockets.

What I think would be consistent is that if you set SRTO_REUSEADDR on the listener socket, then the sharing of accepted sockets still works, but you couldn't bind a caller socket to the same address:port as an existing listener, nor you couldn't share the multiplexer explicitly by calling srt_bind_acquire on the new socket with existing accepted socket.

alexpokotilo commented 3 years ago

@ethouris, do you mean caller listen in case of rendezvous mode in use or what ?

ethouris commented 3 years ago

Wait, I got kinda confused.

The default situation should be the following:

This is what should change if SRTO_REUSEADDR is set to false:

alexpokotilo commented 2 years ago

Is there anything else not fixed after https://github.com/Haivision/srt/pull/1588 ?