croservices / cro-core

The heart of the Cro library for building distributed systems in Raku, including pipeline composition and TCP support.
https://cro.services/
Artistic License 2.0
27 stars 9 forks source link

Workaround to allow setting TCP_NODELAY #19

Closed japhb closed 3 years ago

japhb commented 4 years ago

This is the first part (and also the most likely to need iteration) of the fixes for https://github.com/croservices/cro/issues/129 .

Until https://github.com/MoarVM/MoarVM/issues/1229 is fixed and the functionality is available directly in rakudo-moar, we need to use a workaround to allow setting TCP_NODELAY using NativeCall instead.

This commit introduces Cro::TCP::NoDelay, which exports a sub that enables setting this on any socket that supports .native-descriptor (which both TCP and TLS async sockets do) on operating systems that support the setsockopt call.

There are two portability concerns with this commit:

  1. This commit assumes TCP_NODELAY == 1 and int is 32 bits. I'm not sure how much of a problem this actually is.
  2. There is no fallback for operating systems (or Raku VMs) that don't support calling setsockopt via NativeCall.

Help hereby requested to resolve issue 2, and determine whether issue 1 is a real problem for us.

japhb commented 3 years ago

Looks right enough overall.

OK, good.

I do think it could be good if we could add tests that use the nodelay flag, so that if there are problems no specific platforms, we'll get a failing test, not a problem much later when this is used.

Fair enough, I'll work on that during my next hack window. :-)

japhb commented 3 years ago

OK, looks like I finally have a fully clean Travis with the new tests (and the new list of Rakudo versions to test).

The only thing still bothering me can be seen with the commit message on ba86b31eb03676c71ddc6d3125658351d9a8f11b -- there is a very short sleep needed after Cro::TCP::Connector.establish before sending any messages through it to prevent a deadlock upon sending the first message. This might be a core bug (not caused by my changes but simply exposed by the new tests); details in that commit message.

jnthn commented 3 years ago

The only thing still bothering me can be seen with the commit message on ba86b31 -- there is a very short sleep needed after Cro::TCP::Connector.establish before sending any messages through it to prevent a deadlock upon sending the first message. This might be a core bug (not caused by my changes but simply exposed by the new tests); details in that commit message.

There's a race in the test. Establishing the connection happens asynchronously: the process is started at the point we obtain the Channel of responses, and somewhere along the line it taps the $send.Supply. $send is a Supplier; messages are broadcast to all zero or more subscribers, and when the timing is off, there are zero. The neatest fix for this test is to switch to Supplier::Preserving (I've tried, and it helps).

japhb commented 3 years ago

For historical value:

Additional discussion regarding the API occurred in #cro. In short, establish is intended as a helper for the common case in which the user code does not need to block on (or otherwise detect) the completion of the connection. If the user code does need to detect this, it should instead use the connect method directly.

japhb commented 3 years ago

Huh, that's the first time I've come across this situation -- you've approved, but not merged. What happens next? Is there something else I need to do?

jnthn commented 3 years ago

@japhb At the time I approved it, the Travis CI verdict still wasn't in; I was just waiting on that, in the unlikely event it found something.

japhb commented 3 years ago

AH! OK, I understand now. Thanks, jnthn! :-)