Closed: pushkarnk closed this issue 5 years ago
Investigation revealed unmasking to be a possible bottleneck, specifically the following function from NIOWebSocket:
```swift
mutating func webSocketMask(_ maskingKey: WebSocketMaskingKey, indexOffset: Int = 0) {
    self.withUnsafeMutableReadableBytes {
        for (index, byte) in $0.enumerated() {
            $0[index] = byte ^ maskingKey[(index + indexOffset) % 4]
        }
    }
}
```
Flamegraphs show UnsafeMutableRawBufferPointer.enumerated() running for ~75% of the time. Rewriting this function with a simpler loop gives a good speedup:
```swift
mutating func webSocketMask(_ maskingKey: WebSocketMaskingKey, indexOffset: Int = 0) {
    self.withUnsafeMutableReadableBytes {
        var index = 0
        for byte in $0 {
            $0[index] = byte ^ maskingKey[(index + indexOffset) % 4]
            index += 1
        }
    }
}
```
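To check that the two loop shapes are interchangeable, here is a minimal standalone sketch. It assumes plain `[UInt8]` arrays in place of NIO's `ByteBuffer` and `WebSocketMaskingKey`, and the helper names `maskWithEnumerated` and `maskWithManualIndex` are hypothetical, not part of NIOWebSocket. The key and payload are the masked "Hello" example from RFC 6455.

```swift
// Standalone sketch: [UInt8] stands in for ByteBuffer and WebSocketMaskingKey.
// Both helper names are illustrative and do not exist in NIOWebSocket.

/// XOR-masks `bytes` in place using enumerated(), mirroring the original loop.
func maskWithEnumerated(_ bytes: inout [UInt8], key: [UInt8], indexOffset: Int = 0) {
    precondition(key.count == 4, "a WebSocket masking key is 4 bytes long")
    bytes.withUnsafeMutableBytes { buffer in
        for (index, byte) in buffer.enumerated() {
            buffer[index] = byte ^ key[(index + indexOffset) % 4]
        }
    }
}

/// XOR-masks `bytes` in place using a manually maintained index,
/// mirroring the rewritten loop.
func maskWithManualIndex(_ bytes: inout [UInt8], key: [UInt8], indexOffset: Int = 0) {
    precondition(key.count == 4, "a WebSocket masking key is 4 bytes long")
    bytes.withUnsafeMutableBytes { buffer in
        var index = 0
        for byte in buffer {
            buffer[index] = byte ^ key[(index + indexOffset) % 4]
            index += 1
        }
    }
}

let key: [UInt8] = [0x37, 0xfa, 0x21, 0x3d]
let payload = Array("Hello".utf8)

var viaEnumerated = payload
var viaManualIndex = payload
maskWithEnumerated(&viaEnumerated, key: key)
maskWithManualIndex(&viaManualIndex, key: key)
let loopsAgree = (viaEnumerated == viaManualIndex)

// XOR masking is its own inverse, so masking a second time unmasks.
var roundTrip = viaManualIndex
maskWithManualIndex(&roundTrip, key: key)
let roundTripRestoresPayload = (roundTrip == payload)
```

Only the way the index is produced differs between the two helpers, so any timing difference between them comes from the iteration itself rather than from the XOR.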
The speedup seems to be directly associated with the performance of Sequence.enumerated().
For example, on an Ubuntu 14.04 image running on my machine, with 1M elements in the array, this loop finishes in ~0.6s:
```swift
var index = 0
for a in array {
    _ = a
    index += 1
}
```
... while this loop takes ~1.7s on the same array:
```swift
for (index, a) in array.enumerated() {
    _ = a
    _ = index
}
```
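A comparison like this could be timed with a small harness along the following lines. This is only a sketch: `secondsTaken` is a hypothetical helper, the array contents are arbitrary, and the loops accumulate a sum so that an optimized build cannot discard them. The absolute numbers depend heavily on the build configuration (for example `swiftc -O` or `swift build -c release` versus a debug build), so no expected timings are shown.

```swift
import Foundation

/// Hypothetical helper: runs `body` once and returns the elapsed wall-clock seconds.
func secondsTaken(_ body: () -> Void) -> Double {
    let start = Date()
    body()
    return Date().timeIntervalSince(start)
}

// 1M elements, matching the array size quoted above; the values are arbitrary.
let array = [Int](repeating: 1, count: 1_000_000)

// Loop with a manually maintained index.
var manualSum = 0
let manualSeconds = secondsTaken {
    var index = 0
    for a in array {
        manualSum += a + index
        index += 1
    }
}

// Loop using Sequence.enumerated().
var enumeratedSum = 0
let enumeratedSeconds = secondsTaken {
    for (index, a) in array.enumerated() {
        enumeratedSum += a + index
    }
}

print("manual index: \(manualSeconds)s, enumerated(): \(enumeratedSeconds)s")
```

Comparing the two sums afterwards is a cheap sanity check that both loops did the same work.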
In the charts above, WS-NIO-new is the websocket-nio branch with the change mentioned in this comment.
Raised a PR against NIOWebSocket: https://github.com/apple/swift-nio/pull/793
I made a mistake in my testing above: I ran the tests in debug mode. As indicated in the response to https://github.com/apple/swift-nio/pull/793, the performance gap reduces considerably when running in release mode. These numbers are from a single run each of Kitura-WebSocket and websocket-nio:
Test # | Kitura-WebSocket (time in ms) | WebSocket-NIO (time in ms) | % rise in time taken with NIO |
---|---|---|---|
9.1.1 | 5 | 8 | 60 |
9.1.2 | 14 | 8 | -42.857143 |
9.1.3 | 53 | 30 | -43.396226 |
9.1.4 | 216 | 124 | -42.592593 |
9.1.5 | 452 | 259 | -42.699115 |
9.1.6 | 898 | 530 | -40.979955 |
9.2.1 | 4 | 2 | -50 |
9.2.2 | 6 | 6 | 0 |
9.2.3 | 21 | 21 | 0 |
9.2.4 | 86 | 88 | 2.3255814 |
9.2.5 | 185 | 189 | 2.16216216 |
9.2.6 | 410 | 401 | -2.195122 |
9.3.1 | 839 | 756 | -9.8927294 |
9.3.2 | 358 | 269 | -24.860335 |
9.3.3 | 224 | 138 | -38.392857 |
9.3.4 | 186 | 102 | -45.16129 |
9.3.5 | 176 | 92 | -47.727273 |
9.3.6 | 179 | 89 | -50.27933 |
9.3.7 | 177 | 90 | -49.152542 |
9.3.8 | 183 | 90 | -50.819672 |
9.3.9 | 174 | 87 | -50 |
9.4.1 | 704 | 706 | 0.28409091 |
9.4.2 | 207 | 213 | 2.89855072 |
9.4.3 | 79 | 81 | 2.53164557 |
9.4.4 | 46 | 46 | 0 |
9.4.5 | 37 | 35 | -5.4054054 |
9.4.6 | 35 | 41 | 17.1428571 |
9.4.7 | 35 | 38 | 8.57142857 |
9.4.8 | 35 | 37 | 5.71428571 |
9.4.9 | 34 | 34 | 0 |
9.5.1 | 1127 | 1093 | -3.0168589 |
9.5.2 | 555 | 527 | -5.045045 |
9.5.3 | 307 | 279 | -9.1205212 |
9.5.4 | 175 | 156 | -10.857143 |
9.5.5 | 121 | 97 | -19.834711 |
9.5.6 | 84 | 62 | -26.190476 |
9.6.1 | 1080 | 1023 | -5.2777778 |
9.6.2 | 552 | 525 | -4.8913043 |
9.6.3 | 281 | 287 | 2.13523132 |
9.6.4 | 188 | 150 | -20.212766 |
9.6.5 | 87 | 92 | 5.74712644 |
9.6.6 | 90 | 55 | -38.888889 |
9.7.1 | 164 | 185 | 12.804878 |
9.7.2 | 170 | 194 | 14.1176471 |
9.7.3 | 176 | 186 | 5.68181818 |
9.7.4 | 199 | 229 | 15.0753769 |
9.7.5 | 303 | 257 | -15.181518 |
9.7.6 | 40001 | 545 | -98.637534 |
9.8.1 | 153 | 156 | 1.96078431 |
9.8.2 | 157 | 174 | 10.8280255 |
9.8.3 | 162 | 164 | 1.2345679 |
9.8.4 | 174 | 192 | 10.3448276 |
9.8.5 | 225 | 240 | 6.66666667 |
9.8.6 | 39999 | 463 | -98.842471 |
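The last column is consistent with the relative change in time taken when moving from Kitura-WebSocket to WebSocket-NIO, so a negative value means NIO was faster. A minimal sketch of that calculation, with `percentRise` as a hypothetical helper name:

```swift
/// Hypothetical helper mirroring the table's last column: the percentage
/// change in time taken with WebSocket-NIO relative to Kitura-WebSocket.
/// Positive means NIO was slower, negative means NIO was faster.
func percentRise(kituraMs: Double, nioMs: Double) -> Double {
    return (nioMs - kituraMs) * 100 / kituraMs
}

// Test 9.1.1: 5 ms with Kitura-WebSocket, 8 ms with WebSocket-NIO (a 60% rise).
let rise911 = percentRise(kituraMs: 5, nioMs: 8)

// Test 9.1.2: 14 ms with Kitura-WebSocket, 8 ms with WebSocket-NIO (about -42.86%).
let rise912 = percentRise(kituraMs: 14, nioMs: 8)
```

For very short tests such as 9.1.1, a difference of a few milliseconds produces a large percentage, which is worth keeping in mind when reading single-run numbers.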
Based on the results above, the following are the runs that deserve some investigation:
Test # | Kitura-WebSocket (time in ms) | WebSocket-NIO (time in ms) | % rise in time taken with NIO |
---|---|---|---|
9.1.1 | 5 | 8 | 60 |
9.4.6 | 35 | 41 | 17.1428571 |
9.4.7 | 35 | 38 | 8.57142857 |
9.4.8 | 35 | 37 | 5.71428571 |
9.6.5 | 87 | 92 | 5.74712644 |
9.7.1 | 164 | 185 | 12.804878 |
9.7.2 | 170 | 194 | 14.1176471 |
9.7.3 | 176 | 186 | 5.68181818 |
9.7.4 | 199 | 229 | 15.0753769 |
9.8.2 | 157 | 174 | 10.8280255 |
9.8.4 | 174 | 192 | 10.3448276 |
9.8.5 | 225 | 240 | 6.66666667 |
On running each of the above tests 100 times, differences seem to be getting ironed out:
Test # | Kitura-WebSocket (time in ms) | WebSocket-NIO (time in ms) | % rise in time taken with NIO |
---|---|---|---|
9.1.1 | 4 | 3 | -25 |
9.4.6 | 41 | 42 | 2.43902439 |
9.4.7 | 40 | 41 | 2.5 |
9.4.8 | 42 | 46 | 9.52380952 |
9.6.5 | 86 | 90 | 4.65116279 |
9.7.1 | 163 | 163 | 0 |
9.7.2 | 177 | 177 | 0 |
9.7.3 | 182 | 175 | -3.8461538 |
9.7.4 | 201 | 202 | 0.49751244 |
9.8.2 | 161 | 156 | -3.1055901 |
9.8.4 | 181 | 176 | -2.7624309 |
9.8.5 | 232 | 222 | -4.3103448 |
I think it is OK to assume that the performance of websocket-nio is equal to or better than Kitura-WebSocket on most benchmarking tests in Autobahn. Hence closing this issue.
Autobahn has 54 performance tests. websocket-nio performs better than master on only 16 of the 54 tests. On the rest, the performance gap is as low as 4% and as high as 200%.