[FR] Allow srt_bstats() after lost connection

Haivision / srt

Secure, Reliable, Transport

https://www.srtalliance.org

Mozilla Public License 2.0

3.01k stars 824 forks source link

[FR] Allow srt_bstats() after lost connection #2177

Open lelegard opened 2 years ago

lelegard commented 2 years ago

Is your feature request related to a problem? Please describe.

I use srt_bstats() to get statistics. When statistics are collected on a regular basis during transmission, it works fine.

The actual use case is to get and display statistics at the end of the transmission.

On the sender side, the peer decides to terminate. It can collect statistics first (works fine) and then closes the SRT socket.
On the receiver side, the end of transmission is detected by an SRT_ECONNLOST status on receive. Before closing the SRT socket, I want to collect statistics but srt_bstats() fails with a SRT_ECONNLOST status.

Describe the solution you'd like

When an SRT socket is not yet closed, it should keep its statistics data so that srt_bstats() can correctly return them, even after the connection was dropped.

The SRT_ECONNLOST status is understandable on a send or receive operation because this operation cannot be performed without a connected peer.

Collecting statistics data, on the other hand, is not doing something on the socket. This is only returning data which were collected in the past. As long as the socket is not closed, these data should be accessible.

Describe alternatives you've considered

I don't see any. If I want to collect the final statistics of the connection, I need to be notified of the end of transmission. At this point, it is too late to collect statistics.

Additional context

Tested in the context of the srt plugins of the TSDuck utility.

jeandube commented 2 years ago

I would like that fix too. The stats are present and only reset upon next connection if I remember well. The getOpt is rejected because !connected. I implemented automatic reconnection in the app and would like to maintain continuous counters throughout reconnections.

lelegard commented 2 years ago

Any status on this one? Could it be considered for some future release or is it too difficult to implement in the current design?

maxsharabayko commented 2 years ago

The main difficulty here is that this change would mess with the lifetime assumptions of a socket. These assumptions need to be reworked (also for #2098).

With the current design, after a socket is closed, it still exists for several seconds inside the library until it is finally freed. The main assumption of the library is that once a connection is considered broken, no one gets access to it, making it safe to finally free its resources. It also means that the library does not guarantee you can access a "connection broken" socket, as it can be closed any time soon. E.g., the statistics retrieval function could have returned stats, but if a socket is being closed, it will just return the "connection lost" error.

void srt::CUDT::bstats(CBytePerfMon *perf, bool clear, bool instantaneous)
{
    if (!m_bConnected)
        throw CUDTException(MJ_CONNECTION, MN_NOCONN, 0);
    if (m_bBroken || m_bClosing)
        throw CUDTException(MJ_CONNECTION, MN_CONNLOST, 0);
    // ...

As you suggested in #2098, the lifetime of a socket should be rather managed by an application. A socket must be available even if it is in a "broken" state. Only once the application closes, it can be considered unavailable and inaccessible from the app. This would also change the API.

As of now, we cannot take it in 1.4.5 as other improvements are planned. Let it be in the backlog to be considered at the next planning session.

lelegard commented 2 years ago

Hi @maxsharabayko,

Understood. As an interim solution, during the period of several seconds after the lost connection, before the socket is automatically freed, would it be possible to still get the stats? The three tested conditions "not connected", "broken" or "closing" are valid reasons to prevent send/receive but they are not valid reasons to refuse to return the stats. Even "not connected" is not a valid reason, the stats should be just zeroes.

Of course, I understand that the stats won't be accessible when the socket is automatically freed, after a few seconds. I also understand that this will be less easy to fix.

Issue #2098 described a less common use case: voluntarily keeping the socket inactive for a while. This use case revealed a deeper API design issue but it is less frequently used. The main use case is constantly reading incoming data and immediately reading the stats when the connection is lost. In that case, the stats data should be still available, aren't they? In that case, the problem is the undue tests on the connection state before returning the stats.

maxsharabayko commented 2 years ago

@lelegard

Understood. As an interim solution, during the period of several seconds after the lost connection, before the socket is automatically freed, would it be possible to still get the stats? The three tested conditions "not connected", "broken" or "closing" are valid reasons to prevent send/receive but they are not valid reasons to refuse to return the stats. Even "not connected" is not a valid reason, the stats should be just zeroes.

Good suggestion. Might be worth checking.

ethouris commented 2 years ago

Let me precise some things here. The broken socket doesn't allow for sending, but it does allow for receiving, of course, only remaining data that are in the receiver buffer, and they are available almost indefinitely (depends on the linger option). Several things are not allowed to be done on a socket not because it's an operation, but because of access to objects that may be vital for the operation and may no longer exist.

The automatic closing of a broken socket was something that was added as a workaround to some bad behaving scenario within the group code - when a member socket is broken on the caller side, it couldn't let the application close it because this way it occupies its link and the application would never know to restore it. Might be that this should be changed (that is, the application should close the broken member socket by itself). The group code has also introduced lots of messup also around the closing activity.

I think - at least as I remember what is in the code - the procedure of automatic closing of the broken socket was partially done already in UDT and in SRT something had to be chosen, especially that this procedure relied on that particular things do not happen simultaneously in two threads, while this something was added due to group code. What I think is: yes, the broken socket should remain in broken state and be available indefinitely until manually closed, even if the linger rules no longer mandate it to keep the receiver buffer data still available for reading. On the other hand, if the automatic closing behavior is by any reason necessary for the groups, a special flag for this could be added to the socket. This might be especially important for the accepted sockets because accepted sockets get created in the background and the application usually has nothing to do with them. Anyway, this might be necessary for the groups because the idea behind a group is that you connect a group to a group on the other side, and you are allowed to do it multiple times.

Note that here with the groups the statistics retrieved from a member socket would suffer of the same problem, even if the problem for a single socket is fixed.