Raku / problem-solving

🦋 Problem Solving, a repo for handling problems that require review, deliberation and possibly debate
Artistic License 2.0

Raku lacks a generic way of handling backend constants (signals, protocol families, etc.) #243

Open Kaiepi opened 3 years ago

Kaiepi commented 3 years ago

Before I can make a PR for my solution for #111, there's another problem that needs to be solved: ProtocolFamily, SocketType, and ProtocolType don't correspond to the real values the backend uses with sockets. This is due to a change I made early on in my work on the IP6NS grant. At the time, they corresponded to Linux's values for PF_*/SOCK_*/IPPROTO_*, but these aren't always the same as those used by other platforms, such as FreeBSD. Getting the JVM to expose values for these that could be used by NativeCall wasn't something I could figure out how to do at the time, since I hadn't worked with Java much, and NIO exposes no values like these. Making them correspond to nqp constants instead allows sockets to be used on more platforms, but this hides the problem more than it solves it, and they become an issue again once an API for DNS resolvers gets involved. IPPROTO_* values are unlike PF_* or SOCK_* ones in that they're typically numbers assigned by IANA, in which case they will not differ from platform to platform when the protocols they correspond to are supported. These numbers can appear in some types of DNS responses (WKS, for instance), so should people attempt to use ProtocolType to represent them, they're in for a rude surprise.
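To make the difference concrete, here is a small sketch contrasting the two sorts of value. The numbers are the ones commonly found in each platform's system headers and in the IANA protocol number registry; the hash names are only for illustration and are not part of Rakudo.

```raku
# Illustrative tables only; a real implementation would ask the backend.
# AF_INET6/PF_INET6 as commonly defined in each platform's headers.
my %pf-inet6 = linux => 10, freebsd => 28, darwin => 30;

# IP protocol numbers assigned by IANA; the same wherever the protocol exists.
my %ipproto = ICMP => 1, TCP => 6, UDP => 17;

# The family value depends on the host...
say %pf-inet6.values.unique.elems > 1;

# ...while a protocol number read from a DNS response means the same thing everywhere.
say %ipproto<TCP>;
```

A type whose values track nqp constants rather than these numbers can't round-trip a protocol number found in a DNS response.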

There is a lot of code involved in exposing signals from the backend, which is rather inefficient on the JVM in particular. This doesn't translate or scale well to protocol families, socket types, and protocol types, or to any other type of value like these that could exist in the future, such as socket options. I think there's a more general problem to be solved here: Raku lacks a generic way of handling backend constants.

Kaiepi commented 3 years ago

I have a solution I'd like to propose for this, which comes with some good news and some bad news. The good news is it solves this problem in a way that improves the performance of both signals and sockets on MoarVM with minimal breakage, with &signal becoming around 25% faster and IO::Socket::INET.listen becoming around 15% faster; the bad news can't be given without explaining how it works first.

So far, I've been calling the values this issue pertains to "constants". This term is already overloaded and already carries a meaning in a backend context, so I call magical constants like signals "runes" instead.

On the Rakudo side of things, a common API for runes can be defined with the following types:

my enum Rune::Kind ( #`[...] );

my enum Rune::Support ( #`[...] );

my role Rune[Rune::Kind:D] {
    method kind(::?CLASS:_: --> Rune::Kind:D) { ... }

    method support(::?CLASS:_: --> Rune::Support:D) { ... }
}

my role Rune::WithDefault[Int:D] {
    method default(::?CLASS:_: --> Int:D) { ... }

    multi method CALL-ME(::?CLASS:U: Int()) { ... }
}

A rune has a kind and a level of support associated with it, alongside the key, value, and index it has as a result of being defined as an enum value. Kinds differentiate between different lists of runes in the backend, and are relevant when generating rune enums or getting a level of support for an individual rune. Separating the level of support for a rune from its value (like signals do now) makes it possible to eliminate Rakudo::Internals.VM-SIGNALS, since there's no longer a need to ask the backend for a list of runes more than once to get their support levels.

Depending on their kind, runes may or may not have a defined value when the host doesn't support them (e.g. 0 for Signal). When they do, Rune::WithDefault allows this default, out-of-range value to be provided, and its CALL-ME candidate treats it as invalid. Signal::Signally no longer has any behaviour unique to it because of this type, which can now be shared with ProtocolFamily and SocketType without introducing more types.

nqp gains two new ops:

nqp::getrunes(int $kind)
nqp::getrunesupport(int $kind, int $idx --> int)

getrunes returns a list of key/value pairs for a given $kind of rune (a Rune::Kind:D), similarly to getsignals.

getrunesupport gets the level of support for an individual rune (corresponding to a Rune::Support:D) given its kind and index.

The nqp protocol for runes is based around their "canonical indices" rather than their real values, which makes it possible for the backend to continue to validate runes given as arguments to ops in constant time. The signal, connect, and bindsock ops now accept indices of runes instead of values, like getrunesupport does.
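As a rough model of why indices keep validation cheap, the sketch below uses a plain Raku array in place of a backend rune list. The `rune-value` sub and the table contents are hypothetical and only illustrate the idea; they are not the proposed ops.

```raku
# Hypothetical stand-in for one kind of rune, already in canonical order.
# Each entry pairs a name with the value the host uses (illustrative values).
my @signals = SIGHUP => 1, SIGINT => 2, SIGQUIT => 3;

# Validating an index is a bounds check, so it takes constant time no
# matter how sparse or platform-specific the underlying values are.
sub rune-value(@runes, Int:D $idx --> Int:D) {
    die "invalid rune index $idx" unless 0 <= $idx < @runes.elems;
    @runes[$idx].value
}

say rune-value(@signals, 1);   # the value stored for SIGINT in this table
```

An op that received a raw value instead would need a search or a per-platform table to decide whether the value is one it knows about.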

On MoarVM, besides using rune indices instead of values, current performance levels of signal and socket ops can be maintained by doing more of the work involved in generating lists of runes at compile time. All lists of runes can be generated like this, but some of them can't be guaranteed to have a predefined order that is known at that point. In this case, when accessed for the first time, a list of runes is canonicalized with a mergesort by value. The result is cached in the VM instance alongside its boxing for use during lookups and to allow getrunes to be called more than once for a kind of rune more efficiently (should there come a time when that becomes necessary).
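The canonicalize-on-first-access step could be modelled in Raku roughly as follows. This is a sketch under assumptions: the `%canonical` cache and `canonical-runes` sub are hypothetical names, the values are illustrative, and Raku's built-in `sort` stands in for the mergesort done inside the VM.

```raku
# Hypothetical per-kind cache, standing in for what the VM instance would hold.
my %canonical;

# Returns the runes of a kind ordered by value, sorting only on first access.
sub canonical-runes(Str:D $kind, @unordered) {
    %canonical{$kind} //= @unordered.sort(*.value).List;
}

# Illustrative values in no particular order.
my @families = PF_INET6 => 10, PF_UNIX => 1, PF_INET => 2;

# The first call sorts and caches; later calls for the same kind reuse the result.
say canonical-runes('family', @families).map(*.key);
say canonical-runes('family', @families).map(*.value);
```

Once the order is fixed, a rune's position in the cached list is its canonical index.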

The bad news is to do with backends other than MoarVM. A similar strategy to how I implemented this API for MoarVM could be used for the JVM and JS backends, but in order for them to be capable of obtaining real values for socket-related runes, the JVM backend would need a C compiler, and the JS backend would need a C++ compiler on hand when built. In theory, a C or C++ preprocessor would be enough to obtain rune values, but in practice I found this to be fragile, and builds would break as soon as it gets used with values that aren't C literals anyway. If this is the way to go, I don't think I have the insight needed to teach the build system to work with C/C++ compilers to complete implementations of this for the JVM and JS backends (at least, not when time's a concern).