Closed ethouris closed 4 years ago
If I remember correctly, the NAK report period is based on the RTT. If your test case uses peers that are close to each other, chances are high that the immediate NAK and the periodic LOSSREPORT collide.
Minimum NAK interval is NAKIntmin = 300 ms, and it is set at the very beginning of the streaming (in CUDT::open()). PR #745 updates the timer after the connection is established (updated in CUDT::setupCC()).
NAK interval is updated after sending a periodic loss report. The new value is NAKInt = RTT + 4 × RTTVar. This value is passed to the Congestion Control (CC) module.
File CC can update the value based on the reported receiving speed Rrcv (packets per second) and the length of the loss list LOSSlen: NAKInt = RTT + 4 × RTTVar + LOSSlen × 10^6 / Rrcv.
Live CC will update the value by dividing it by 2: NAKInt = (RTT + 4 × RTTVar) / 2.
The minimum value is then enforced: NAKInt = max(NAKInt, NAKIntmin).
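To make the interval rules above concrete, here is a minimal Python sketch of the arithmetic. It is not the SRT code: the function names are hypothetical, all times are assumed to be in microseconds (which is what the 10^6 factor suggests), and the guard against a zero receiving rate is an assumption added for safety.

```python
NAK_INT_MIN_US = 300_000  # NAKIntmin = 300 ms, as stated above, in microseconds


def base_nak_interval_us(rtt_us, rtt_var_us):
    # NAKInt = RTT + 4 * RTTVar
    return rtt_us + 4 * rtt_var_us


def file_cc_nak_interval_us(rtt_us, rtt_var_us, loss_len, rcv_rate_pps):
    # File CC adds the time needed to receive loss_len packets at the
    # reported receiving rate (packets per second).
    interval = base_nak_interval_us(rtt_us, rtt_var_us)
    if rcv_rate_pps > 0:  # assumption: skip the extra term when the rate is unknown
        interval += loss_len * 1_000_000 // rcv_rate_pps
    return max(interval, NAK_INT_MIN_US)


def live_cc_nak_interval_us(rtt_us, rtt_var_us):
    # Live CC halves the base interval, then the minimum is enforced.
    interval = base_nak_interval_us(rtt_us, rtt_var_us) // 2
    return max(interval, NAK_INT_MIN_US)
```

With close peers (small RTT) the computed value falls below the minimum, so the interval is effectively NAKIntmin in both modes.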
NAK time is updated only after sending the periodic NAK report. This means that even if a loss report was already sent, a periodic report can be triggered immediately and send the same loss report again.
On second thought, the possibility of always sending the loss report twice might be an interesting feature. When we have a probability of losing a packet 20%, which means 80% probability of delivery, first retransmission increases it to 88%, second one to 96%, which is "almost certain", at the expense of using twice the overhead space that a single necessary retransmission would take. This might be added as an option after this issue is fixed.
When we have a probability of losing a packet 20%, which means 80% probability of delivery, first retransmission increases it to 88%, second one to 96%
First send + retransmission gives 0.8+0.2*0.8=0.96 probability to deliver a packet. Second retransmission gives 0.992.
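The corrected numbers can be checked with a short Python sketch. It assumes that each transmission is lost independently with the same probability; the function name is hypothetical.

```python
def delivery_probability(loss_rate, retransmissions):
    """Probability that at least one of (1 + retransmissions) sends arrives.

    Assumes every transmission is lost independently with probability loss_rate.
    """
    return 1.0 - loss_rate ** (1 + retransmissions)
```

For a 20% loss rate this gives 0.8 for the first send alone, 0.96 with one retransmission, and 0.992 with two.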
These things can be done to improve it:
From the internal report SAS-258.
Some improvements are required to reduce the overhead of periodic NAK reports.
What you described in #1 is exactly FASTREXMIT. It is intentionally turned off when NAKREPORT is working, because NAKREPORT is considered efficient enough.
We need to decide what is more important, or even better, provide options that allow users to decide what is more important for them: whether they can accept extra overhead in order to maximize reliability, or they need as small overhead as possible and accept the reliability this setting provides.
If we need more reliability, then of course packets should be stubbornly retransmitted, but then the receiver should send ACKs more quickly in order to update the sender with the "already received packets" information. We might also revive my earlier idea of an "ACK bitmap": together with the ACK, an additional number is sent that defines the fate of the next 32 packets following the ACK-ed one, so that packets that follow a loss but were received won't be retransmitted any further. The important thing in this solution is not only the RTT but also the RTT variance, or possibly another value settable by an option, so that a false NAK report isn't sent too early when the RTT often diverges much from the average.
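A rough Python sketch of how such an "ACK bitmap" could work is below. This is only an illustration of the idea, not an existing SRT feature: the function names are hypothetical, bit i is assumed to describe packet ack_seq + 1 + i, and sequence number wraparound is ignored.

```python
BITMAP_BITS = 32  # the bitmap covers the 32 packets following the ACK-ed one


def build_ack_bitmap(ack_seq, received):
    """Receiver side: set bit i if packet ack_seq + 1 + i has been received."""
    bitmap = 0
    for i in range(BITMAP_BITS):
        if ack_seq + 1 + i in received:
            bitmap |= 1 << i
    return bitmap


def filter_retransmissions(ack_seq, bitmap, loss_list):
    """Sender side: drop packets the bitmap reports as already received."""
    keep = []
    for seq in loss_list:
        offset = seq - ack_seq - 1
        if 0 <= offset < BITMAP_BITS and (bitmap >> offset) & 1:
            continue  # the receiver already has this packet
        keep.append(seq)
    return keep
```

For example, with ack_seq = 100 and packets 102, 103 and 105 received, the sender would drop 102 and 105 from a loss list of [101, 102, 104, 105, 140] and keep the rest, including 140, which lies outside the 32-packet window.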
If we need the least overhead, then care must be taken that packets are retransmitted only if the sender is absolutely certain (made so by the receiver) that the receiver didn't get the retransmitted packet, so that the number of uselessly retransmitted packets is minimized. This would, however, come at the expense of a decreased probability that twice-lost packets will be retransmitted fast enough, and usually higher reliability will come at a bigger latency penalty.
The #2 is AFAIK already implemented; there's even a comment saying that it does it the "TCP way". It probably works only in file mode, though, and if I'm not mistaken it's what triggers the LATEREXMIT method.
The NAKREPORT functionality should send the LOSSREPORT periodically, with the first periodic report sent only after the expected retransmission didn't happen. However, it looks like in every case the first of those periodic loss reports is sent immediately after the "detection-based" loss report (the first one), which makes this report useless, although it still causes the loss-reported packet to be sent again. This may result in unnecessary, excessive retransmissions.
Early research indicates that there may be a problem with setting the time of the periodic NAK report when one loss report is generated just before the moment when a periodic NAK report is about to be sent due to a previous loss report. In other words, two losses are reported at times separated by almost exactly the interval of the periodic NAK report. Possibly a timestamp should be introduced for loss reports, so that each loss range is first checked for whether it is "too early" to be reported in NAKREPORT, while the other ranges are still reported.
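The per-loss-range timing idea could look roughly like the Python sketch below. It is only a sketch under assumptions: the LossRange type and the function name are hypothetical and not part of SRT, and times are plain microsecond counters.

```python
from dataclasses import dataclass


@dataclass
class LossRange:
    first_seq: int
    last_seq: int
    last_report_us: int  # when this range was last included in a loss report


def ranges_due_for_periodic_nak(loss_ranges, now_us, nak_interval_us):
    """Return only the ranges whose last report is at least one NAK interval old.

    Ranges reported more recently are skipped as "too early"; the reported
    ranges get their timestamp refreshed.
    """
    due = []
    for loss in loss_ranges:
        if now_us - loss.last_report_us >= nak_interval_us:
            due.append(loss)
            loss.last_report_us = now_us
    return due
```

With this check, a loss detected (and reported) shortly before the periodic timer fires would be left out of that periodic report, while older losses would still be repeated.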