Open eelcocramer opened 7 years ago
Thanks, this is very detailed. I did not know Wireshark was aware of uTP; that's helpful. If this implementation backs off in the presence of TCP in its current form, that would be coincidental; I've put no effort into ensuring that. I'll take a more detailed look at your findings tomorrow.
Thanks.
As an addition, I've noticed that the background traffic is less affected when fewer out-of-order packets are being buffered:
// Maximum out-of-order packets to buffer.
maxUnackedInbound = 64
maxUnackedSends = 64
The lower this number, the better the background traffic performs. The performance of the uTP transfer does not seem to be affected. I have to investigate this further though.
I did some further investigation and added some counters to see how many packets are received and how many of those are duplicates.
I noticed that my hack in applyAcks does not seem to influence the number of duplicate packets at all. Of course the extension byte should be zero if no bitmask is present in the packet, but it does not seem to cause resends (as I assumed before).
What does seem to influence the number of duplicates is my hack in ackSkipped, where I changed the resend trigger from 60 and 120 skipped acks to 3 and 60 skipped acks. Together with changing maxUnackedSends to a much lower value (32), this seems to influence the number of duplicate packets the most (75 to 90% fewer duplicates in my measurements).
The number of duplicates I'm seeing (without the hacks) is somewhere from 3% up to 10% of the total packets being sent. With my hacks the number drops to around 1%.
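To illustrate the skipped-ack mechanism being tuned here, a minimal standalone sketch (the type and function names are invented for illustration, not the library's actual code):

```go
package main

import "fmt"

// sendState tracks one unacked outbound packet (hypothetical simplified type).
type sendState struct {
	acksSkipped int
	resends     int
}

// resendThresholds are the skipped-ack counts that trigger a resend.
// The library reportedly used {60, 120}; the hack changes them to {3, 60}.
var resendThresholds = map[int]bool{3: true, 60: true}

// onAckSkipped is called when a later packet is acked before this one.
// It returns true when a resend should be triggered.
func (s *sendState) onAckSkipped() bool {
	s.acksSkipped++
	if resendThresholds[s.acksSkipped] {
		s.resends++
		return true
	}
	return false
}

func main() {
	var s sendState
	for i := 0; i < 60; i++ {
		s.onAckSkipped()
	}
	fmt.Println("resends after 60 skipped acks:", s.resends) // prints 2
}
```

Lowering the first threshold makes the fast-retransmit fire much earlier, which is presumably why the duplicate count changes so visibly.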
Another update. I do not think that either hack from the initial post fixes the root cause of this problem. My current theory is that the initial sending of the packets is flooding the network, and as a result acks either do not make it back to the sender or come in after the resendTimeout. I have another hack where I delay the initial sending of the packets based upon the resendTimeout. Preliminary results are that the number of duplicate messages is reduced to around the 1% level. I will do some more tests to see whether this delays the total transfer too much and whether it also has a positive impact on the background traffic.
The hack looks like this:
func (c *Conn) Write(p []byte) (n int, err error) {
time.Sleep(c.resendTimeout()) // sleep before writing the initial message
mu.Lock()
defer mu.Unlock()
for len(p) != 0 {
...
Fixing the extension bytes shouldn't affect resends at all. It should only potentially improve compatibility with other uTP implementations out there. But it was helpful that you spotted this, and I've fixed it.
The amount of resending should be directly affected by the timeouts and the allowed number of unacked sends. However, decreasing the allowed number of unacked sends will decrease throughput when there is high latency. I believe the spec limits unacked send packets by adjusting its window size. This implementation doesn't do that.
It might be very effective to block writes, and thereby further unacked sends, when there are sends that haven't been acked in a timely fashion, since we track the usual latency.
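The blocking idea could be sketched roughly like this, assuming a simple counting window guarded by a condition variable (the names are invented; this is not the library's implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// sendWindow limits the number of unacked outbound packets, blocking
// writers once the window is full.
type sendWindow struct {
	mu      sync.Mutex
	cond    *sync.Cond
	unacked int
	limit   int
}

func newSendWindow(limit int) *sendWindow {
	w := &sendWindow{limit: limit}
	w.cond = sync.NewCond(&w.mu)
	return w
}

// acquire blocks until there is room for another unacked send.
func (w *sendWindow) acquire() {
	w.mu.Lock()
	for w.unacked >= w.limit {
		w.cond.Wait()
	}
	w.unacked++
	w.mu.Unlock()
}

// release is called when a send is acked, waking one blocked writer.
func (w *sendWindow) release() {
	w.mu.Lock()
	w.unacked--
	w.cond.Signal()
	w.mu.Unlock()
}

func main() {
	w := newSendWindow(2)
	w.acquire()
	w.acquire() // window now full; a third acquire would block
	w.release() // an ack frees a slot
	w.acquire() // succeeds without blocking
	fmt.Println("unacked:", w.unacked) // prints 2
}
```

A real version would presumably also shrink or grow `limit` based on observed latency, which is where the congestion control would come in.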
I would guess that the issues you're describing are all resulting from your OS being overwhelmed by the number of sends, and failing huge batches of consecutive sends, which results in "resend storms". What OS/distribution/arch are you on? What Go version? What kind of data rates are you achieving?
I concur with your guess. I'll add another post to this topic with some of my findings using the time.Sleep(c.resendTimeout()) hack I mentioned before.
I'm currently testing uTP within a network emulator that emulates a small wifi network (maximum rate of 11Mbit/s) with 3 nodes in the network.
The nodes all run Ubuntu 14.04 x86_64 with go version 1.8.
With the current master branch of this repo I'm currently achieving around the maximum transfer rate (11Mbit/s) in my network.
I've compared the current master with the time.Sleep(c.resendTimeout())
hack I mentioned above and I get the following results.
I'm doing the following test in the 3 node network I mentioned above. The maximum data rate is 11Mbit/s.
Initially I tested how long it would take to transfer a 25MB file from node 3 to node 2 using secure copy (scp). This is my benchmark for TCP traffic without uTP traffic interfering. This transfer takes around 40 seconds in my test network.
Then I tried how long it would take to do a 100MB file transfer using ucat from node 3 to node 1.
With the current master this would take 4m43s. There are 122,028 packets being received at node 1, of which 2,897 are duplicates.
Next I tried transferring 100MB with ucat with the scp transfer in the background. This time the ucat transfer took 5m05s with 122,923 packets received, of which 3,441 are duplicates. The scp transfer took 4m29s (way longer than without the ucat transfer).
Then I tried the same using the sleep hack. Now the ucat transfer takes 7m20s with 77,221 packets received, of which 420 are duplicates. This is significantly slower than the current master.
The ucat transfer with the scp transfer in the background took 8m03s with 77,321 packets received, of which 520 are duplicates. The scp transfer is almost not delayed and took 42s.
The congestion control works better with the hack but also slows down the transfer when there is no background traffic.
There is something strange in the measurements: it seems that more non-duplicate packets are received using the current master than with the fix. I'm not sure where this comes from.
I've used Wireshark to graph the throughput for both scenarios. For the graphs I transfer a 25MB file using ucat. The graph is made on node 3. The green line represents the measured transfer rate of ucat; the red line represents the transfer rate of scp.
I've played around with the code trying to come up with a more elegant solution than the sleep hack. I pushed something to my fork of the repository that works very well in my configuration. It needs more testing, and the test cases currently seem to deadlock. With these changes I get transfer speeds that are near the speeds of the current master while it still backs off when there is background traffic. It also reduced the number of duplicate packets to almost zero for me.
Code is here: https://github.com/eelcocramer/utp/tree/sleep-hack
I think my guess about your OS is probably right on. Your code has the effect of slowing down the rate of sends, avoiding large numbers of consecutive packets being dropped. It does it by holding the package lock, which isn't ideal. I think a solution identifying the true rate limit would be more effective.
I don't understand why the OS should be the reason. Shouldn't the protocol be more gentle on the available resources (bandwidth, CPU)?
I agree that my changes are not ideal but the uTP spec says the following in the paragraph about congestion control:
This is achieved by including a high resolution timestamp in every packet that's sent over uTP, and the receiving end calculates the difference between its own high resolution timer and the timestamp in the packet it received. This difference is then fed back to the original sender of the packet (timestamp_difference_microseconds). This value is not meaningful as an absolute value. The clocks in the machines are most likely not synchronized, especially not down to microsecond resolution, and the time the packet is in transit is also included in the difference of these timestamps. However, the value is useful in comparison to previous values.
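The spec's point about the value only being useful relative to previous values can be illustrated with a small sketch that tracks the smallest observed difference as a baseline (illustrative only; `delayFilter` is a made-up name, not part of this library):

```go
package main

import "fmt"

// delayFilter tracks the minimum observed timestamp difference as a
// baseline, so later samples can be read as queuing-delay increases.
// The absolute values are meaningless because the clocks are not
// synchronized; only the change over time matters.
type delayFilter struct {
	base  uint32
	valid bool
}

// sample feeds one timestamp_difference_microseconds value and returns
// the estimated extra (queuing) delay relative to the smallest seen.
func (f *delayFilter) sample(diffMicros uint32) uint32 {
	if !f.valid || diffMicros < f.base {
		f.base = diffMicros
		f.valid = true
	}
	return diffMicros - f.base
}

func main() {
	var f delayFilter
	f.sample(1200) // first sample establishes a baseline
	f.sample(1000) // a smaller value becomes the new baseline
	// 1500 - 1000 = 500us of estimated queuing delay.
	fmt.Println("queuing delay (us):", f.sample(1500)) // prints 500
}
```

A rising output would indicate queues building up, which is the signal a delay-based sender backs off on.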
I was under the assumption that resendTimeout() calculates a timeout which is used to limit the rate of the next packet. Couldn't that value be used as a rate limit?
Currently this timeout value is only used for acks and packet retransmits, not in the initial sending of the data packets. This is what I tried to change. My initial testing shows that it seems to work quite well for congestion control and prioritises background TCP traffic. It also seems to be more relaxed on CPU use.
Also the transfer times do not seem to be affected too much, but I will do some more testing on this.
Do you have benchmark tests that you use to measure the throughput?
Thanks.
According to these Syncthing benchmarks there is plenty of room for improvement:
The current version gives me
BenchmarkSyncthingTCP-8 10000 103641 ns/op 1265.24 MB/s 112 B/op 13 allocs/op
BenchmarkSyncthingUDPUTP-8 1000 1785891 ns/op 73.43 MB/s 638472 B/op 1810 allocs/op
BenchmarkSyncthingInprocUTP-8 2000 698624 ns/op 187.70 MB/s 911961 B/op 1553 allocs/op
My fork gives me
BenchmarkSyncthingTCP-8 10000 103471 ns/op 1267.33 MB/s 112 B/op 13 allocs/op
BenchmarkSyncthingUDPUTP-8 300 4327944 ns/op 30.30 MB/s 739614 B/op 2555 allocs/op
BenchmarkSyncthingInprocUTP-8 300 3994464 ns/op 32.83 MB/s 1552726 B/op 3454 allocs/op
Just a quick note: one of the reasons this is not used in Syncthing is because it hogs everything that's available. We are trying to incorporate KCP, but that has issues with getting STUN to work.
Thanks. I'm seeing the same behaviour using this library with IPFS; that's my main reason for investigating this.
Take a look at a new library I'm working on: https://github.com/anacrolix/go-libutp.
Thanks. I noticed that repository last week. I will give it try.
Bear in mind that if you require cross compiling, this makes things very difficult due to cgo.
I am trying to hunt down an issue I have with this library where I notice that the uTP transfer does not back-off when there is TCP background traffic.
Tracing this issue I noticed that an extensive number of packets are being resent. I noticed this was primarily due to this call to resend, which is caused by this call to ackSkipped. Using Wireshark I noticed that lots of ACK packets have the Selective ACKs extension enabled but do not have an extension bitmask. According to the specification the length should be:
There are also ACK packets that do have this set correctly:
Now as a quick hack I've tried not calling ackSkipped when there are no bytes in the extension, and I've also changed ackSkipped to not resend the packets that often. For me this is dramatically decreasing the number of duplicates that are being sent.
However it does not fix my issue with uTP traffic throttling back when there is TCP background traffic.
Does this make sense?