Closed evenbrenden closed 3 years ago
Thanks @evenbrenden - this is exactly why we made 1.4.21-beta1 a beta. Looks to me like our new address parser must not be handling this case correctly. We'll get it fixed before 1.4.21 goes stable.
Actually... This doesn't look like a bug with Akka.NET:
akka.cluster.seed-nodes = [ "akka.tcp://Cluster@host1.domain.com:1234", "akka.tcp://Oddjob@Cluster@host2.domain.com:1234" ]
Oddjob@Cluster@ is not a valid URI. I've added a reproduction for parsing FQDNs in v1.4.21-beta1 and they work fine, no problem. Our IPv4 address parsing would barf too if we couldn't support them.
Can you double check your HOCON and make sure that everything is in order there?
@Aaronontheweb I believe @evenbrenden made a typo when writing the address in the issue. Here's what changed in our seed-nodes configuration:
BEFORE: seed-nodes = [ "akka.tcp://Oddjob@maodatest01:1963", "akka.tcp://Oddjob@maodatest02:1963" ]
AFTER: seed-nodes = [ "akka.tcp://Oddjob@maodatest01.felles.ds.nrk.no:1963", "akka.tcp://Oddjob@maodatest02.felles.ds.nrk.no:1963" ]
@evenbrenden can you double-check the issue and correct the addresses if necessary?
ok, let me give that a shot in my reproduction.
Worth noting: the Address.Parse code we use to parse these values for Akka.Cluster wasn't actually changed in v1.4.21. It still uses Uri.TryParse under the covers and isn't used in the Akka.Remote deserialization pipeline.
Yep, that line works just fine in my reproduction spec also.
@object @evenbrenden do you have some logs to go along with this error message? It's possible that the issue here can be a DNS reachability problem - in which case you might want to try:
akka.remote.dot-netty.tcp {
port = 1234
hostname = "0.0.0.0"
public-hostname = "maodatest02.felles.ds.nrk.no"
}
Can you also check if maodatest02.felles.ds.nrk.no and maodatest02 resolve to the same IP address on:
* The machine binding to it
* The machine connecting to it
That can be another fun source of trouble for DNS issues sometimes. I've seen situations where the resolution is different depending on where it's performed (last time it was the result of a Kubernetes DNS caching error).
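A quick way to run that check (an illustrative sketch, not Akka.NET code; substitute the actual short name and FQDN in question for the placeholder names):

```python
import socket

def same_ip(name_a: str, name_b: str) -> bool:
    """Resolve two hostnames and report whether they map to the same IP."""
    ip_a = socket.gethostbyname(name_a)
    ip_b = socket.gethostbyname(name_b)
    print(f"{name_a} -> {ip_a}")
    print(f"{name_b} -> {ip_b}")
    return ip_a == ip_b

# Run this on both the binding machine and the connecting machine, e.g.:
# same_ip("maodatest02", "maodatest02.felles.ds.nrk.no")
```

If the two names resolve differently on either machine, that points at a DNS configuration problem rather than an Akka.NET one.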
Can you double check your HOCON and make sure that everything is in order there?
Was trying to make a concise example, but it backfired :) Yes, that is a typo.
akka {
remote.dot-netty.tcp {
port = 1963
hostname = "0.0.0.0"
public-hostname = "maodatest01.felles.ds.nrk.no"
}
cluster.seed-nodes = [ "akka.tcp://Oddjob@maodatest01.felles.ds.nrk.no:1963", "akka.tcp://Oddjob@maodatest02.felles.ds.nrk.no:1963" ]
}
This... partly works, but most nodes are struggling. I can't be sure why; there are some associations followed by disassociations in the logs. But this certainly works:
akka {
remote.dot-netty.tcp {
port = 1963
hostname = "maodatest01.felles.ds.nrk.no"
}
cluster.seed-nodes = [ "akka.tcp://Oddjob@maodatest01.felles.ds.nrk.no:1963", "akka.tcp://Oddjob@maodatest02.felles.ds.nrk.no:1963" ]
}
This does not work:
akka {
remote.dot-netty.tcp {
port = 1963
hostname = "maodatest01"
}
cluster.seed-nodes = [ "akka.tcp://Oddjob@maodatest01.felles.ds.nrk.no:1963", "akka.tcp://Oddjob@maodatest02.felles.ds.nrk.no:1963" ]
}
And neither does this:
akka {
remote.dot-netty.tcp {
port = 1963
hostname = "" # defaults to machine hostname
}
cluster.seed-nodes = [ "akka.tcp://Oddjob@maodatest01.felles.ds.nrk.no:1963", "akka.tcp://Oddjob@maodatest02.felles.ds.nrk.no:1963" ]
}
I believe - but can't really confirm - that the machine defaults to a hostname that is not fully qualified. Could that cause the nodes to be unreachable, even if the machines can actually reach each other without FQDNs?
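The distinction in play here is the short machine name versus the fully qualified one. A small illustration (Python, purely to demonstrate the concept; not what Akka.NET itself runs) of how the two can differ on a given machine:

```python
import socket

# The short machine name, roughly what an empty/default hostname binding
# would fall back to (e.g. "maodatest01"):
short_name = socket.gethostname()

# The fully qualified name the resolver can construct for this machine
# (e.g. "maodatest01.felles.ds.nrk.no"); on machines without a configured
# DNS domain, this may simply equal the short name:
fqdn = socket.getfqdn()

print(short_name, fqdn)
```

When the machine only reports the short name, anything that derives its address from the machine hostname will advertise the non-FQDN form.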
Can you also check if maodatest02.felles.ds.nrk.no and maodatest02 resolve to the same IP address on:
* The machine binding to it
* The machine connecting to it
Can confirm that the IPs are the same for these two, in both directions.
I'll try to gather some logs, but at first glance, there are AssociationErrors everywhere, as expected.
Ah ok, I see what's going on here:
akka {
remote.dot-netty.tcp {
port = 1963
hostname = "maodatest01"
}
cluster.seed-nodes = [ "akka.tcp://Oddjob@maodatest01.felles.ds.nrk.no:1963", "akka.tcp://Oddjob@maodatest02.felles.ds.nrk.no:1963" ]
}
This won't work because Akka.Remote has no idea that maodatest01.felles.ds.nrk.no and maodatest01 are the same address, thus you should see some "dropping address for non-local recipient" error messages come from Akka.Remote.
akka {
remote.dot-netty.tcp {
port = 1963
hostname = "maodatest01.felles.ds.nrk.no"
}
cluster.seed-nodes = [ "akka.tcp://Oddjob@maodatest01.felles.ds.nrk.no:1963", "akka.tcp://Oddjob@maodatest02.felles.ds.nrk.no:1963" ]
}
This is the right way to do it - Akka.Remote knows that the sender and the recipient are supposed to be the same in this scenario.
So unless we can get the hostname = "" configuration to default to exactly what is listed as the host in the seed-nodes, we'll need an explicit, separate hostname configuration per node, right? For hostname = "", what does Akka.NET call under the hood to get the machine host? I guess that might work on some machines/interfaces and not on others.
Depends on the transport, but if that value is not set, Akka.NET will usually call Dns.GetHostName() and pick the first item off the top of the list. But yes, Akka.NET wants all of the hostnames to match 1:1.
@Aaronontheweb so the only way to use an FQDN is to explicitly set hostname (or public-hostname) in the Akka HOCON, which requires setting these values during deployment, something that would be great to avoid. Is it possible to make a convention that will hint Akka to set the hostname to the FQDN value, for example:
In addition to
akka.remote.dot-netty.tcp {
port = 1234
hostname = "0.0.0.0"
public-hostname = "maodatest02.felles.ds.nrk.no"
}
also support
akka.remote.dot-netty.tcp {
port = 1234
hostname = "0.0.0.0"
public-hostname = "" # defaults to fully qualified machine domain name
}
I.e. if hostname is set to zeros and public-hostname is set to an empty string, Akka will concatenate the DNS host name with the domain name. If this sounds reasonable, we can make a PR.
What do you think, Aaron?
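The proposed fallback could look roughly like this (an illustrative Python sketch of the logic only, not actual Akka.NET code; socket.getfqdn stands in for whatever the runtime would use to obtain the machine's fully qualified name):

```python
import socket

def effective_public_hostname(hostname: str, public_hostname: str) -> str:
    """Sketch of the proposed convention: when binding to all interfaces
    with an empty public-hostname, fall back to the machine's FQDN."""
    if hostname == "0.0.0.0" and public_hostname == "":
        return socket.getfqdn()
    # Otherwise keep today's behavior: an explicit public-hostname wins,
    # else the bind hostname is advertised as-is.
    return public_hostname or hostname

# An explicit value is left untouched:
print(effective_public_hostname("0.0.0.0", "maodatest02.felles.ds.nrk.no"))
```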
But I see that Akka.NET itself doesn't call Dns.GetHostName. This must be happening inside DotNetty, which is abandoned (sigh). So perhaps this is more complicated than I first thought.
@object just an idea, but you could read an environment variable with the correct FQDN and programmatically override the public-hostname during config loading.
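A minimal sketch of that idea (Python purely for illustration; the environment-variable name is made up, and a real Akka.NET app would do the equivalent in C# before building its ActorSystem):

```python
import os

# Hypothetical variable name; set per machine at deployment time, e.g.
# AKKA_PUBLIC_HOSTNAME=maodatest02.felles.ds.nrk.no
public_hostname = os.environ.get("AKKA_PUBLIC_HOSTNAME", "")

# Splice the value into the HOCON fragment before loading it:
config = f"""
akka.remote.dot-netty.tcp {{
  port = 1963
  hostname = "0.0.0.0"
  public-hostname = "{public_hostname}"
}}
"""
print(config)
```

This keeps the deployed config file identical on every node; only the environment differs per machine.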
@ismaelhamed Yes, that's probably the easiest in our case. Thanks for the tip.
@object just an idea, but you could read an environment variable with the correct FQDN and programmatically override the public-hostname during config loading.
This is what we do ourselves.
Edit: not just for FQDNs - any hostname binding when we're running in production environments.
Actually we are doing the same already. So it's just to modify our code slightly.
Then IMHO it will be difficult (and probably unreasonable) to intercept DotNetty settings; we should do it in the code. Should we close the issue, @evenbrenden?
I agree, as long as we need something other than a static config, we might as well handle it application-side.
Thanks @Aaronontheweb @object @ismaelhamed!
Mixing hostnames and FQDNs is already a multihomed-setup-related discussion: https://github.com/akkadotnet/akka.net/discussions/4993
Version Information
Version of Akka.NET? 1.4.21-beta1
Which Akka.NET Modules?
Describe the bug
As part of a host migration we need to use fully qualified domain names for our seed nodes, i.e. changing from

to

host1:1234 and host2:1234 are our Lighthouse services, and are both configured with

In our case, akka.remote.dot-netty.tcp.hostname defaults to host1 and host2 for their respective nodes, i.e. not FQDNs. With this configuration, the seed nodes are not found and the cluster does not come up. Setting the hostnames to the FQDNs host1.domain.com and host2.domain.com fixes this, but leaves us with a hardcoded set of hostnames and a separate configuration per node.

Expected behavior
My question is whether this is expected behaviour (given a host that does not provide an FQDN for itself) and/or whether there exists a way to continue using hostname-agnostic configurations for akka.remote.dot-netty.tcp.

Actual behavior
An unreachable cluster. I am assuming that this also applies to any remote non-seed nodes too, i.e. a node on port 2345 is unreachable if its hostname is not an FQDN.
To Reproduce
Unfortunately I do not have a minimal example, but the changes described above should be sufficient to reproduce.
Environment