betcode-org / betfair

betfairlightweight - Betfair API-NG python wrapper (with streaming)
MIT License
419 stars 147 forks source link

Fix socket performance regression on Linux and OSX #540

Closed petedmarsh closed 9 months ago

petedmarsh commented 10 months ago

In c5a3ad4d the socket code was changed and was benchmarked to be significantly better. Than benchmark was only run on Windows and the changes actually made things worse on Linux and OSX.

These changes fix the regression in Linux and OSX.

Note that performance on Windows is significantly improved while on Linux it's neutral to marginally worse. One thing to keep in mind is that these changes fix the issue where messages could get stuck in a buffer until a future read where the last two bytes read were \r\n.

If in reality there are frequently messages trapped in the buffer then these changes are a net positive. If that's rare then likely these changes are a net negative.

Remember that the benchmark is measuring the total time taken to receive and process all of the messages and not the time taken to process a message after having received it.

Benchmarks:

Windows

                                                     Benchmarks, repeat=5, number=5
┌─────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────────────┬─────────────────┬─────────────────┐
│                                       Benchmark │ Min     │ Max     │ Mean    │ Min (+)         │ Max (+)         │ Mean (+)        │
├─────────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────────────┼─────────────────┼─────────────────┤
| Original Client Code vs socket.makefile version │ 73.695  │ 76.014  │ 75.057  │ 45.791 (1.6x)   │ 46.994 (1.6x)   │ 46.509 (1.6x)   │
└─────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────────────┴─────────────────┴─────────────────┘

Linux
                                                    Benchmarks, repeat=5, number=5
┌─────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────────────┬─────────────────┬─────────────────┐
│                                       Benchmark │ Min     │ Max     │ Mean    │ Min (+)         │ Max (+)         │ Mean (+)        │
├─────────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────────────┼─────────────────┼─────────────────┤
│ Original Client Code vs socket.makefile version │ 11.542  │ 12.429  │ 11.746  │ 11.786 (-1.0x)  │ 12.769 (-1.0x)  │ 11.995 (-1.0x)  │
└─────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────────────┴─────────────────┴─────────────────┘
liampauling commented 10 months ago

Looks good on my Mac M1:

                                                    Benchmarks, repeat=5, number=5                                                     
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃                                       Benchmark ┃ Min     ┃ Max     ┃ Mean    ┃ Min (+)         ┃ Max (+)         ┃ Mean (+)        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ Original Client Code vs socket.makefile version │ 9.331   │ 10.450  │ 9.802   │ 7.546 (1.2x)    │ 8.201 (1.3x)    │ 7.824 (1.3x)    │
└─────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────────────┴─────────────────┴─────────────────┘

However as you have mentioned previously it's slower if the messages are smaller than the buffer, I think we need to look into what is 'normal' from the betfair stream and optimise.

                                                    Benchmarks, repeat=5, number=5                                                     
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃                                       Benchmark ┃ Min     ┃ Max     ┃ Mean    ┃ Min (+)         ┃ Max (+)         ┃ Mean (+)        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ Original Client Code vs socket.makefile version │ 0.165   │ 0.170   │ 0.167   │ 0.234 (-1.4x)   │ 0.243 (-1.4x)   │ 0.237 (-1.4x)   │
└─────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────────────┴─────────────────┴─────────────────┘
liampauling commented 10 months ago

Adding to the above I see 1.4x+ improvement by simply increasing the buffer to 8192 on the original client

petedmarsh commented 10 months ago

Increasing the buffer size will make it more likely that there's less-than-one-full-buffers worth of data to read and increase the chance that the last two characters read will be CRLF.

My gut feeling is that in all likelihood during real world usage the original code will be faster on average as long as the buffer is set appropriately, as on a low latency connection I believe a single message will be read in totality much more often than not.

It will have the chance of spikes of latency for individual messages if they ever get stuck in the buffer as described, but again my suspicion is that this would actually be rare.

I think you should not merge this and revert the other commit.

On Fri, 15 Sept 2023, 14:58 Liam, @.***> wrote:

Adding to the above I see 1.4x+ improvement by simply increasing the buffer to 8192 on the original client

— Reply to this email directly, view it on GitHub https://github.com/betcode-org/betfair/pull/540#issuecomment-1721241942, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJDC6RWXZQDQU3BPB4KJJ3X2RGIFANCNFSM6AAAAAA4Y3Z2HQ . You are receiving this because you authored the thread.Message ID: @.***>

liampauling commented 10 months ago

Good point, started a thread on slack

liampauling commented 9 months ago

Just did some real world testing on this, when subscribed to all racing markets and all data I see no improvement in latency comparing this branch to master with roughly only 4% of updates greater than 1024 bytes.

Interestingly when a large update comes through (> 4000 bytes) master is quicker.