Raku needs better ways to deal with network addresses

Kaiepi commented 4 years ago

There are a couple issues with how network addresses are handled at the moment:

https://github.com/MoarVM/MoarVM/pull/1166 was recently merged, which makes it so the connect, bindsock, asyncconnect, asynclisten, and asyncudp ops try to use all addresses received from a DNS lookup rather than just the first. The issue with this, as @niner pointed out, is this is that all errors when connecting/binding except the one when attempting to use the final address received are silent, but people cannot change this behaviour to suit their needs.
The way IO::Socket::INET and IO::Socket::Async deal with resolving addresses is inconsistent; IO::Socket::INET supports resolving socket addresses and hostnames, while IO::Socket::Async only supports resolving hostnames.

Kaiepi commented 4 years ago

These leads me to think that there should be an op for doing DNS lookups that would take a hostname or socket address and return a list of addresses. The socket connect and bind ops could take one address at a time instead of dealing with them all at once, so the behaviour for how errors would be handled when attempting to use them would be dealt with from Rakudo, allowing people to change how they're handled if desired. Inconsistencies with how addresses are resolved would be harder to make from then on.

If this is how it should be handled though, then with a DNS op existing, why not provide a basic DNS API that people could do lookups and reverse lookups with?

jmaslak commented 4 years ago

I think we also need to think through IPv4 + IPv6. I.E. there should be a lookup that only returns IPv4 addresses if and only if the host is IPv4-only, but returns both if the host is dual-stacked (I suppose it should also only do IPv6 if the host is IPv6-only). There is also some host preference stuff that as to which address families and what order they should be in. There should be a different way to get all addresses (maybe "A" or "AAAA" records). I'd like to see us support IPv6 as a first class citizen, just like we do for Unicode.

Kaiepi commented 4 years ago

With the improvements to hostname lookup, IPv4 addresses only get returned if the interface used supports it, and ditto for IPv6 addresses. But there currently isn't a way to get both A and AAAA records regardless of whether or not the interface used supports it, which is what it was doing before (though it'd only use the first address received).

Kaiepi commented 4 years ago

The way I had planned out the work I'm doing for the IP6NS grant, there would be the same degree of support for IPv6 as there is for IPv4. This is partially implemented for IO::Socket::INET at the moment, where you can now specify whether you want the socket to use IPv4, IPv6, or either one if you don't care, but this will not be complete until socket option support is implemented. I haven't checked to see what IO::Socket::Async will need in order for it to have the same kind of support yet, since I'm not working on that until after I finish work on IO::Socket::INET.

lizmat commented 4 years ago

Perhaps some allomorph that would contain both IPv4 and IPv6 value? I guess then the question becomes: how do we decide which to use when?

vrurg commented 4 years ago

There must be a way to manually specify which address to use, no matter what type of address it is and wether it is source or destination.

Consider a corporate network to which I connect over VPN. Good if my system is smart enough to bind the connection to the VPN interface, but what if it's not?

Similarly, if a name resolves to more than one IP sometimes it is important to choose the preferred one manually.

lizmat commented 4 years ago

Perhaps a dynamic variable indicating preference?

vrurg commented 4 years ago

This would be a headache if a big system would require different binding for different connections. Why can't we have a unified address kind of type? Say, similar to perl5, Net:: family but under IO:::

my $ip = IO::Addr::Host.new(:name<google.com>).resolve; # IO::Addr::IP object
say $ip.v4; # IO::Addr::IP::v4
say $ip.v5; # IO::Addr::IP::v6
my $backresolve = $ip.resolve; # IO::Addr::Host

Respectively:

my $conn = IO::Socket::INET.new(:localhost($ip.v4), ...);

localhost is kinda documented in an example already.

Kaiepi commented 4 years ago

I think it'd be a good idea to have one unified address type, but I don't think that type is where DNS queries should be handled from, since there are other types of DNS records besides A and AAAA ones that would be useful to be able to deal with in the future that don't correspond to addresses.

How I think this could be solved sums up to this:

Add an IO::Address role done by IO::Address::IPv4, IO::Address::IPv6, and IO::Address::UNIX classes. The IPv4 and IPv6 classes would also do an IO::Address::IP role to allow typechecking for both and to define an API for features shared between them.
Add an IO::Resolver class that would handle DNS queries, like lookups and reverse lookups. It would contain a resolve method that takes a hostname, port, and optional family, type, protocol, and socket passivity parameters, returning a lazy Seq of IO::Address instances. An instance of this could live in PROCESS::<$RESOLVER>.
Add PROCESS::<&CONNECT>, which would take the list of addresses returned and a block that takes an address and binds or connects a socket with it, returning the new socket. This would be where an implementation of Happy Eyeballs would live in v6.e, and the current DNS resolution behaviour would be handled in prior versions (instead of in the backend, like it is now).

Combined, this could allow people to customize how IO::Socket::INET and IO::Socket::Async handle addresses without needing to change the classes themselves. For example, connecting to a peer synchronously could still be written like this:

my IO::Socket::INET:D $client .= connect: 'www.google.com', 443;

And addresses could be used directly for this instead, if desired:

# Connect to all addresses resolved for a hostname.
race for $*RESOLVER.resolve: 'www.google.com', 443 -> IO::Address:D $address {
    my IO::Socket::INET:D $client .= connect: $address;
    # ...
}

But if the result of using addresses directly like this is one socket and the behaviour for handling addresses is consistent for both binding and connecting, this would be better written using &*CONNECT instead:

# Replicate the current behaviour for hostname resolution.
sub CONNECT(@addresses, &callback) {
    callback @addresses.head
}
my IO::Socket::INET:D $client .= connect: 'www.google.com', 443;

niner commented 4 years ago

I love your plan!

Kaiepi commented 4 years ago

Update:

The v6.c changes are certainly doable.
M̵o̵a̵r̵V̵M̵ ̵w̵i̵l̵l̵ ̵p̵r̵o̵b̵a̵b̵l̵y̵ ̵n̵e̵e̵d̵ ̵a̵n̵ ̵a̵s̵y̵n̵c̵ ̵D̵N̵S̵ ̵l̵i̵b̵r̵a̵r̵y̵ ̵o̵f̵ ̵s̵o̵m̵e̵ ̵s̵o̵r̵t̵ ̵i̵n̵ ̵o̵r̵d̵e̵r̵ ̵t̵o̵ ̵d̵o̵ ̵t̵h̵e̵ ̵v̵6̵.̵e̵ ̵c̵h̵a̵n̵g̵e̵s̵,̵ ̵a̵s̵ ̵t̵h̵i̵s̵ ̵i̵s̵ ̵a̵ ̵r̵a̵t̵h̵e̵r̵ ̵l̵a̵r̵g̵e̵ ̵p̵r̵o̵b̵l̵e̵m̵ ̵t̵o̵ ̵b̵e̵ ̵s̵o̵l̵v̵i̵n̵g̵ ̵o̵u̵r̵s̵e̵l̵v̵e̵s̵.̵ ̵c̵-̵a̵r̵e̵s̵ ̵(̵w̵h̵i̵c̵h̵ ̵n̵o̵d̵e̵.̵j̵s̵ ̵u̵s̵e̵s̵)̵ ̵p̵r̵o̵b̵a̵b̵l̵y̵ ̵w̵o̵n̵'̵t̵ ̵d̵o̵ ̵s̵i̵n̵c̵e̵ ̵i̵t̵ ̵h̵a̵n̵d̵l̵e̵s̵ ̵H̵a̵p̵p̵y̵ ̵E̵y̵e̵b̵a̵l̵l̵s̵ ̵i̵t̵s̵e̵l̵f̵,̵ ̵w̵h̵i̵c̵h̵ ̵n̵o̵t̵ ̵a̵l̵l̵ ̵b̵a̵c̵k̵e̵n̵d̵s̵ ̵m̵a̵y̵ ̵b̵e̵ ̵a̵b̵l̵e̵ ̵t̵o̵ ̵d̵o̵.̵ ̵I̵'̵m̵ ̵t̵h̵i̵n̵k̵i̵n̵g̵ ̵o̵f̵ ̵u̵s̵i̵n̵g̵ ̵U̵n̵b̵o̵u̵n̵d̵ ̵f̵o̵r̵ ̵t̵h̵i̵s̵.̵ Unbound's way of handling async DNS doesn't play nice with the existing code in MoarVM. c-ares looks like it could work better on further inspection.
IO::Socket::INET and IO::Socket::Async keep information about socket addresses as state ($!localhost, $!localport, etc.). With the REPR for addresses that exists with this, socket and peer addresses can be accessed. I'm thinking of introducing getsockname and getpeername ops and making these attributes methods in v6.e.

Kaiepi commented 4 years ago

The v6.e changes are doable as well! I tried to use c-ares to implement them on MoarVM, but I found there were a couple of issues with it that made it less than ideal to use:

Queries cannot be cancelled one at a time. There is ares_cancel, but it cancels all queries on a DNS resolution context. Keeping a pool of contexts or creating a new context for each query is pretty expensive.
Its parsing functions don't include all information that would be useful to keep from DNS RRs. For instance, the TTL is only possible to include for A and AAAA queries using them.

For these reasons, I chose UDNS to handle DNS on MoarVM instead. This library is also much simpler to get to work properly with the I/O event loop.

While working on v6.e's changes, I found the new IO::Resolver.resolve tends to be anywhere up to twice as fast as v6.c's version, but if a system is set up to use its own DNS resolver (such as Unwind or Unbound), it winds up being much slower instead. If this is happening because IO::Resolver.resolve is doing redundant work, then there should be a way to use the old behaviour somehow (with a :native parameter, maybe).

Kaiepi commented 3 years ago

My design for how DNS resolution should be implemented has changed since my previous posts here. There is now an IO::Resolver role, which stubs a single lookup method for resolving hostnames to a supply of IO::Address::Info, similar to getaddrinfo(3)'s struct addrinfo. Helper mehods for creating these when working with the DNS protocol directly exist as well. This is done by IO::Resolver::Native, which implements v6.c's DNS resolution behaviour, and IO::Resolver::Stub, which is a stub resolver that implements Happy Eyeballs v2 for use in v6.e and later.

My strategy to implement IO::Resolver::Stub's asynchronous, cancellable DNS queries thus far has been to offload most of the work involved to libraries in backends with an nqp::asyncdnsquery op. This proved not to be ideal; each library I tested with has very different ideas of how packets should be transmitted over the network and what sort of configuration should exist for DNS resolvers. This leads to inconsistent DNS resolution behaviour across backends, and restricts what features IO::Resolver::Stub can have in unpredictable ways. In the worst case scenario, a backend developer in the future could be forced to implement DNS queries themselves, which is not a trivial task.

At the same time, since IO::Resolver::Stub works with the DNS protocol directly, it would be a shame for it not to expose an interface for making different classes and types of DNS queries outside of the IN A and IN AAAA queries we need. This is something other popular languages for networking (Node.js, Go, Java) do to some degree as well.

There is another strategy we could use, which would ease the workload on backend authors in the future, and allow us to have a much more powerful DNS resolver than any other language I've used thus far, and that is to implement a stub resolver ourselves. In order to do this, we need full TCP and UDP support for IO::Socket::Async across all backends (doable), and something along the lines of an IO::DNS module would exist for working with DNS packets. This would give us a powerful, portable DNS resolver built into the language itself, something the likes of which I've never seen before. On the other hand, by not depending on mature, well-tested code for working with the DNS, there is more risk involved when it comes to security. I mitigate this somewhat by deferring the introduction of IO::Resolver::Stub to v6.e (allowing more adventurous users to test it in the wild before v6.e becomes the default version), and by making it simple to fallback to other resolvers in the event issues with IO::Resolver::Stub arise.

Is this a good idea to be doing? If so, what sort of channels do we have for people to report security problems in the language?

niner commented 3 years ago

Just a quick thought: a full blown DNS resolver implementation in Raku is probably best developed as a module first. That's a nicer development environment anyway (avoiding the minute of compilation time). If you do so, you'll want a way to plug this implementation in, so Rakudo will use it. But then, will there even be a need to pull the implementation into the core?

Kaiepi commented 3 years ago

Which resolver IO::Socket::INET and IO::Socket::Async use is already configurable with the $*RESOLVER variable and :$resolver parameters in their methods that depend on it, so not having IO::Resolver::Stub or IO::DNS in core is doable. That would remove most of the breakage my solution introduces, and would allow me to go ahead with making a PR for the rest of the changes!

Edit: in order for this to actually fix this issue, v6.e still needs its own connection and binding logic. This can be borrowed from Happy Eyeballs v2, since that part of the algorithm stands on its own.

Raku / problem-solving

Raku needs better ways to deal with network addresses #111