OctopusDeploy / Halibut

A secure communication stack for .NET using JSON-RPC over SSL.

Improve linux performance by allowing users of Halibut to use TCP NoDelay (so as to be able to opt out of Nagle’s algorithm) #610

Closed LukeButters closed 6 months ago

LukeButters commented 6 months ago

Background

See https://brooker.co.za/blog/2024/05/09/nagle.html

TL;DR: TCP_NODELAY=true results in much faster RPC performance on Linux. TCP_NODELAY disables Nagle's algorithm, a TCP behaviour that makes sense when humans type characters into telnet (buffer the characters before sending, because humans are slow) but makes less sense in Halibut, since when we send something we want it to go over the wire immediately.
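For readers unfamiliar with the option, here is a minimal, language-neutral sketch of what "opting out of Nagle's algorithm" means at the socket level. It is written in Python for brevity and is not Halibut's code; the helper names `new_tcp_socket` and `nagle_disabled` are made up for illustration. It only sets and reads back the standard `TCP_NODELAY` socket option on an unconnected socket.

```python
import socket


def new_tcp_socket(no_delay: bool) -> socket.socket:
    """Create an unconnected TCP socket, optionally opting out of Nagle's algorithm.

    Hypothetical helper for illustration only; not part of Halibut.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 asks the kernel to send small writes immediately
    # instead of buffering them while earlier data is unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if no_delay else 0)
    return sock


def nagle_disabled(sock: socket.socket) -> bool:
    """Return True if TCP_NODELAY is set on the socket (Nagle's algorithm is off)."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    for flag in (False, True):
        s = new_tcp_socket(flag)
        print(f"requested no_delay={flag}, nagle_disabled={nagle_disabled(s)}")
        s.close()
```

The option is per socket, so a library that owns its sockets would have to expose a setting like this for callers to opt in.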

It turns out that setting NoDelay to true significantly reduces the time taken to make RPC calls on Linux. In particular, the runtime of the test OctopusCanSendAndReceiveComplexObjects_WithMultipleDataStreams drops from 23-25s to 1-5s (depending on the Linux environment and how the test is run).

TeamCity runs show a drop from 22.9s to 0.5s.

Results

HalibutTimeoutsAndLimits now supports TcpNoDelay. It defaults to false, which preserves the old behaviour, but clients that opt in will see performance improvements on Linux.

How to review this PR

Quality :heavy_check_mark:

Pre-requisites

nathanwoctopusdeploy commented 6 months ago

Has this added a bit of flake to integration tests, or was the PR just unlucky?

LukeButters commented 6 months ago

It may have, since it changes timings! @nathanwoctopusdeploy, here is the error message:

Expected sw.Elapsed to be greater than or equal to 13s, but found 877.8µs

Are we expecting something to be slow that is now no longer slow?

nathanwoctopusdeploy commented 6 months ago

> it may have since it changes timings! @nathanwoctopusdeploy here is the error message:
>
> Expected sw.Elapsed to be greater than or equal to 13s, but found 877.8µs
>
> Are we expecting something to be slow that is now no longer slow?

This is caused by the issue @LukeButters was talking about. We don't detect control messages reliably, as they are now sent immediately. In the test we want to pause on the control message to ensure we time out quickly (~10 seconds)!