Kitura / Kitura-WebSocket

WebSocket support for Kitura
Apache License 2.0

autobahn: Performance analysis of websocket-nio #76

Closed pushkarnk closed 5 years ago

pushkarnk commented 5 years ago

Autobahn has 54 performance tests. websocket-nio performs better than master on only 16 of the 54 tests. On the rest, the performance gap ranges from as low as 4% to as high as 200%.

pushkarnk commented 5 years ago

Investigation revealed unmasking to be a possible bottleneck, specifically the following function from NIOWebSocket:

    mutating func webSocketMask(_ maskingKey: WebSocketMaskingKey, indexOffset: Int = 0) {
        self.withUnsafeMutableReadableBytes {
            for (index, byte) in $0.enumerated() {
                $0[index] = byte ^ maskingKey[(index + indexOffset) % 4]
            }
        }
    }

Flamegraphs show UnsafeMutableRawBufferPointer.enumerated() running for ~75% of the time. Rewriting this function with a simpler loop gives a good speedup:

    mutating func webSocketMask(_ maskingKey: WebSocketMaskingKey, indexOffset: Int = 0) {
        self.withUnsafeMutableReadableBytes {
            var index = 0
            for byte in $0 {
                $0[index] = byte ^ maskingKey[(index + indexOffset) % 4]
                index += 1
            }
        }
    }
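
To make the masking step easier to try outside NIO, here is a minimal sketch that applies the same XOR to a plain [UInt8] using an index-based loop. The function name applyMask and its parameters are hypothetical and are not part of NIOWebSocket; the key and payload are the "Hello" example bytes from RFC 6455.

```swift
// Hypothetical stand-in for the ByteBuffer extension above: works on a plain [UInt8].
// `applyMask`, `key` and `bytes` are illustrative names, not NIO API.
func applyMask(_ bytes: inout [UInt8], key: [UInt8], indexOffset: Int = 0) {
    precondition(key.count == 4, "a WebSocket masking key is 4 bytes")
    bytes.withUnsafeMutableBufferPointer { buffer in
        // Index-based loop: no enumerated() iterator is involved.
        for index in 0..<buffer.count {
            buffer[index] ^= key[(index + indexOffset) % 4]
        }
    }
}

var payload = Array("Hello".utf8)
let key: [UInt8] = [0x37, 0xfa, 0x21, 0x3d]
applyMask(&payload, key: key)   // payload is now masked
applyMask(&payload, key: key)   // applying the same mask again restores the bytes
print(String(decoding: payload, as: UTF8.self))   // prints "Hello"
```

Because XOR is its own inverse, the same function serves for both masking and unmasking, which is why a slow loop here affects every masked frame.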

This seems to be directly associated with the performance of Sequence.enumerated().

For example, on an Ubuntu 14.04 image running on my machine, with 1M elements in the array, this loop finishes in ~0.6s:

var index = 0 
for a in array {
    _ = a 
    index += 1
}

... while this loop takes ~1.7s on the same array:

for (index, a) in array.enumerated() {
    _ = a 
    _ = index
}
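
For anyone who wants to repeat the comparison, here is a rough, self-contained timing sketch. The helpers measure and compareLoops are hypothetical names, the array contents are arbitrary, and the absolute numbers will differ by machine and toolchain. The build mode matters a lot: as noted later in this thread, a debug build exaggerates the gap, so it is worth timing an optimized (release) build as well.

```swift
import Foundation

// Hypothetical helper: wall-clock seconds taken by `body`.
func measure(_ body: () -> Void) -> TimeInterval {
    let start = Date()
    body()
    return Date().timeIntervalSince(start)
}

// Hypothetical helper: times a manual-index loop against enumerated()
// over the same array and checks that both loops did the same work.
func compareLoops(count: Int) -> (manual: TimeInterval, enumerated: TimeInterval, sumsMatch: Bool) {
    let array = [UInt8](repeating: 1, count: count)

    // Accumulate into a sum so the loop body is not trivially discarded.
    var manualSum = 0
    let manualTime = measure {
        var index = 0
        for a in array {
            manualSum &+= Int(a) &+ index
            index += 1
        }
    }

    var enumeratedSum = 0
    let enumeratedTime = measure {
        for (index, a) in array.enumerated() {
            enumeratedSum &+= Int(a) &+ index
        }
    }

    return (manualTime, enumeratedTime, manualSum == enumeratedSum)
}

let result = compareLoops(count: 1_000_000)
print("manual index: \(result.manual)s, enumerated(): \(result.enumerated)s")
```

A single run like this is noisy, so treat the output as an indication only and repeat it several times before drawing conclusions.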

pushkarnk commented 5 years ago

[Charts: binary-chop-size, binary-message, binary-rtt, fragmented-binary-message, fragmented-text-message, text-chop-size, text-message, text-rtt]

pushkarnk commented 5 years ago

In the charts above, WS-NIO-new is the websocket-nio branch with the webSocketMask change described in my earlier comment.

pushkarnk commented 5 years ago

Raised a PR against NIOWebSocket: https://github.com/apple/swift-nio/pull/793

pushkarnk commented 5 years ago

I made a mistake in my testing above: I ran the tests in debug mode. As indicated in the response to https://github.com/apple/swift-nio/pull/793, the performance gap reduces considerably when running in release mode. These numbers are from a single run each of Kitura-WebSocket and websocket-nio:

Test #   Kitura-WebSocket (time in ms)   WebSocket-NIO (time in ms)   % rise in time taken with NIO
9.1.1 5 8 60
9.1.2 14 8 -42.857143
9.1.3 53 30 -43.396226
9.1.4 216 124 -42.592593
9.1.5 452 259 -42.699115
9.1.6 898 530 -40.979955
9.2.1 4 2 -50
9.2.2 6 6 0
9.2.3 21 21 0
9.2.4 86 88 2.3255814
9.2.5 185 189 2.16216216
9.2.6 410 401 -2.195122
9.3.1 839 756 -9.8927294
9.3.2 358 269 -24.860335
9.3.3 224 138 -38.392857
9.3.4 186 102 -45.16129
9.3.5 176 92 -47.727273
9.3.6 179 89 -50.27933
9.3.7 177 90 -49.152542
9.3.8 183 90 -50.819672
9.3.9 174 87 -50
9.4.1 704 706 0.28409091
9.4.2 207 213 2.89855072
9.4.3 79 81 2.53164557
9.4.4 46 46 0
9.4.5 37 35 -5.4054054
9.4.6 35 41 17.1428571
9.4.7 35 38 8.57142857
9.4.8 35 37 5.71428571
9.4.9 34 34 0
9.5.1 1127 1093 -3.0168589
9.5.2 555 527 -5.045045
9.5.3 307 279 -9.1205212
9.5.4 175 156 -10.857143
9.5.5 121 97 -19.834711
9.5.6 84 62 -26.190476
9.6.1 1080 1023 -5.2777778
9.6.2 552 525 -4.8913043
9.6.3 281 287 2.13523132
9.6.4 188 150 -20.212766
9.6.5 87 92 5.74712644
9.6.6 90 55 -38.888889
9.7.1 164 185 12.804878
9.7.2 170 194 14.1176471
9.7.3 176 186 5.68181818
9.7.4 199 229 15.0753769
9.7.5 303 257 -15.181518
9.7.6 40001 545 -98.637534
9.8.1 153 156 1.96078431
9.8.2 157 174 10.8280255
9.8.3 162 164 1.2345679
9.8.4 174 192 10.3448276
9.8.5 225 240 6.66666667
9.8.6 39999 463 -98.842471
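
For reference, the last column can be reproduced from the two timing columns. The sketch below assumes the usual definition of percentage change relative to the Kitura-WebSocket time, which matches the rows above; the function name percentRise is hypothetical.

```swift
// Hypothetical helper: percentage rise in time taken when moving from the
// baseline (Kitura-WebSocket) to the candidate (WebSocket-NIO).
// A negative result means the candidate was faster.
func percentRise(baseline: Double, candidate: Double) -> Double {
    return (candidate - baseline) * 100 / baseline
}

print(percentRise(baseline: 5, candidate: 8))    // test 9.1.1: prints 60.0
print(percentRise(baseline: 14, candidate: 8))   // test 9.1.2: about -42.857
```

With timings this small (a few milliseconds), a difference of 1 to 3 ms already shows up as a large percentage, so rows such as 9.1.1 should be read with that in mind.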

pushkarnk commented 5 years ago

Based on the results above, the following are the runs that deserve some investigation:

Test #   Kitura-WebSocket (time in ms)   WebSocket-NIO (time in ms)   % rise in time taken with NIO
9.1.1 5 8 60  
9.4.6 35 41 17.1428571  
9.4.7 35 38 8.57142857  
9.4.8 35 37 5.71428571  
9.6.5 87 92 5.74712644  
9.7.1 164 185 12.804878  
9.7.2 170 194 14.1176471  
9.7.3 176 186 5.68181818  
9.7.4 199 229 15.0753769  
9.8.2 157 174 10.8280255  
9.8.4 174 192 10.3448276  
9.8.5 225 240 6.66666667  

pushkarnk commented 5 years ago

On running each of the above tests 100 times, differences seem to be getting ironed out:

Test #   Kitura-WebSocket   WebSocket-NIO   % rise in time taken with NIO
9.1.1 4   3   -25
9.4.6 41   42   2.43902439
9.4.7 40   41   2.5
9.4.8 42   46   9.52380952
9.6.5 86   90   4.65116279
9.7.1 163   163   0
9.7.2 177   177   0
9.7.3 182   175   -3.8461538
9.7.4 201   202   0.49751244
9.8.2 161   156   -3.1055901
9.8.4 181   176   -2.7624309
9.8.5 232   222   -4.3103448

I think it is OK to assume that the performance of websocket-nio is equal to or better than Kitura-WebSocket on most benchmarking tests in autobahn.

Hence closing this issue.