ethouris commented 5 years ago

The NAKREPORT functionality should send the LOSSREPORT periodically, first after the expected retransmission didn't happen. However it looks like in every case the first of those periodic loss reports is sent immediately after the "detection-based" lossreport (the first one), which makes this report useless, although still resulting in sending the loss-reported packet. This may result in unnecessary excessive retransmissions.

The early research points that probably there can be a problem with setting the time of the periodic nakreport in a situation when one loss report is generated just before the moment when a periodic nakreport is about to be sent due to a previous lossreport. Or, in other words, there are two losses reported in a time relation very close to a time interval of the periodic nakreport. Possibly there should be introduced a time for loss reports so losses are first checked if it isn't "too early" for particular loss range to be reported in NAKREPORT, but still others should be reported.

jeandube commented 5 years ago

If I remember well, the NAK report period is based on the RTT. If your test case is with close peers chances are high that immediate NAK and LOSSREPORT collide.

maxsharabayko commented 5 years ago

Minimum NAK interval is NAKInt_min=300 ms, and it is set at the very beginning of the streaming (in CUDT::open()). PR #745 updates the timer after the connection is established (updated in CUDT::setupCC()).

NAK interval is updated after sending a periodic loss report. The new value is NAKInt = RTT + 4 × RTTVar. This value is passed to the Congestion Control (CC) module.

File CC can update the value based on the reported receiving speed R_rcv (packets per second) and the length of the loss list LOSS_len: NAKInt = RTT + 4 × RTTVar + LOSS_len × 10⁶ / R_rcv.

Live CC will update the value by dividing it by 2:: NAKInt = (RTT + 4 × RTTVar) / 2.

The minimum value is NAKInt = max(NAKInt, NAKInt_min).

NAK time is updated only after sending the periodic NAK report. Meaning that even if a loss report was already sent, a periodic report can be triggered immediately and send the same loss report again.

ethouris commented 4 years ago

On a second thought, the possibility of sending the loss report always twice might be an interesting feature. When we have a probability of losing a packet 20%, which means 80% probability of delivery, first retransmission increases it to 88%, second one to 96%, which is "almost certain" - at the expense of using twice the overhead space as per necessary retransmission. This might be added as an option, after this one is fixed.

maxsharabayko commented 4 years ago

When we have a probability of losing a packet 20%, which means 80% probability of delivery, first retransmission increases it to 88%, second one to 96%

First send + retransmission gives 0.8+0.2*0.8=0.96 probability to deliver a packet. Second retransmission gives 0.992.

ethouris commented 4 years ago

These things can be done for improving it:

Every loss must have recorded time of the first notice so that periodic retransmission is focused on particular loss
Checks for a need for NAK report should happen in every checkTimers(), and whether it should be done for particular loss, it should depend on its initial time
After a retransmitted packet was received, the "lite ACK" should be sent, with limitations (that is, not too early towards the previous ACK and not too early before the next "fat ACK").

maxsharabayko commented 4 years ago

From the internal report SAS-258.

Some improvements are required to reduce the overhead of periodic NAK reports.

Sender may have a timeout for the next retransmission of the lost packet. The timeout value might be (RTT + 10ms). If sender hasn't received the acknowledgement of the lost packet, it should consider to retransmit it either immediately, or upon reception of the next NAK report.
The receiver can be blocked from acknowledging further DATA packets. During this blockade period, it might still consider to send ACK with the same sequence number, thus letting know that the next packet is not yet received.
The receiver may send lite ACK packet upon reception of the previously missing packet, that was blocking further ACKs. This way it may inform sender within one RTT (instead of RTT + 10ms) that the packet was received.

ethouris commented 4 years ago

What you described in #1 is exactly FASTREXMIT. It is intentionally turned off in case when NAKREPORT is working because it's considered efficient enough.

We need to decide what is more important, or even better, provide options that allow users to decide what is more important for them: whether they can accept extra overhead in order to maximize reliability, or they need as small overhead as possible and accept the reliability this setting provides.

If we need more reliability, then of course, packets should be stubbornly retransmitted, but then the receiver should send ACKs quicker in order to update the sender with the "already received packets" information. We might also revive my earlier idea of "ACK bitmap", that is, together with ACK there's sent an additional number that defines the fate of the next 32 packets following the ACK-ed one, so that packets that follow a loss, but were received, won't be further retransmitted. Important thing in this solution is not only RTT, but also RTT variance, or possibly another value settable by an option, so that the false NAK report isn't sent too early in case when RTT happens to often diverge much from the average.

If we need least overhead, then it must be taken care of that packets are retransmitted only if the sender is absolutely certain (it is made so certain by the receiver) that the receiver didn't get this packet retransmitted and minimize the number of uselessly retransmitted packets. This would, however, happen at the expense of decreased probability that twice lost packets will be retransmitted fast enough, and usually higher reliability will come at bigger latency penalty.

The #2 is AFAIK already implemented - even there's a comment that it does it "TCP way". Although probably it works only in file mode and if I'm not mistaken it's what triggers the LATEREXMIT method.

3 - same as mine point 3 above, and yes, good idea, unless it happens to often. It can't be then sent after every retransmitted packet, but with some reasonable "ionization wearoff time" it would be good.

Haivision / srt

Live mode NAKREPORT is triggered too quickly after loss detection report #701

3 - same as mine point 3 above, and yes, good idea, unless it happens to often. It can't be then sent after every retransmitted packet, but with some reasonable "ionization wearoff time" it would be good.