jonas32 closed this issue 3 years ago
If you can directly connect to the cluster, then that is a different issue. The Rust client does not support connecting to proxies, so when aerospike.service.consul resolves to a different actual IP each time, that IP is checked against the values in the service list, and since it is missing there, validation fails. We need to update the client to support that use case.
Now, that is not really a proxy use case. aerospike.service.consul is just a DNS record that resolves directly to the Aerospike IPs. Consul is only responsible for maintaining the DNS records; there is no TCP proxy or anything similar involved. Those are exactly the same IP addresses that work perfectly without the DNS record. As far as I can see, the only problem is that the Aerospike server hostname is not the same as the address I use.
I haven't tested it yet, but I think just commenting out the server name validation function I linked above would solve the problem. I'm just trying to understand why this client even makes these checks and the nodejs one doesn't.
Can you run this client test and post the output, so that we can see which IPs the client is trying to connect to?
AEROSPIKE_HOSTS=aerospike.service.consul:55006 RUST_LOG=aerospike=debug cargo test --test lib connect -- --nocapture
I tried to reproduce the problem by setting up an aerospike.local record that returns 2 IP addresses (both pointing to the same server, though). I did not see the validation error, but I don't know if my setup is a "good enough" approximation of your setup or differs in some significant way.
❯ AEROSPIKE_HOSTS=aerospike.local:55006 RUST_LOG=aerospike=debug cargo test --test lib connect -- --nocapture
Finished test [unoptimized + debuginfo] target(s) in 0.26s
Running target/debug/deps/lib-919e605389330762
running 1 test
[2021-01-08T11:50:37Z DEBUG aerospike::cluster] No connections available; seeding...
[2021-01-08T11:50:37Z INFO aerospike::cluster] Seeding the cluster. Seeds count: 1
[2021-01-08T11:50:37Z DEBUG aerospike::cluster::node_validator] Resolved aliases for host aerospike.local:55006: [Host { name: "192.168.1.10", port: 55006 }, Host { name: "192.168.1.15", port: 55006 }]
[2021-01-08T11:50:37Z DEBUG aerospike::commands::info_command] response from server for info command: "node\tBB9030011AC4202\ncluster-name\tmesh-test\nfeatures\tbatch-index;blob-bits;cdt-list;cdt-map;cluster-stable;float;geo;sindex-exists;peers;pipelining;pscans;relaxed-sc;replicas;replicas-all;replicas-master;replicas-max;truncate-namespace;udf"
[2021-01-08T11:50:37Z DEBUG aerospike::cluster::node_validator] Resolved aliases for host aerospike.local:55006: [Host { name: "192.168.1.10", port: 55006 }, Host { name: "192.168.1.15", port: 55006 }]
[2021-01-08T11:50:37Z DEBUG aerospike::commands::info_command] response from server for info command: "node\tBB9030011AC4202\ncluster-name\tmesh-test\nfeatures\tbatch-index;blob-bits;cdt-list;cdt-map;cluster-stable;float;geo;sindex-exists;peers;pipelining;pscans;relaxed-sc;replicas;replicas-all;replicas-master;replicas-max;truncate-namespace;udf"
[2021-01-08T11:50:37Z DEBUG aerospike::cluster::node_validator] Resolved aliases for host aerospike.local:55006: [Host { name: "192.168.1.10", port: 55006 }, Host { name: "192.168.1.15", port: 55006 }]
[2021-01-08T11:50:37Z DEBUG aerospike::commands::info_command] response from server for info command: "node\tBB9030011AC4202\ncluster-name\tmesh-test\nfeatures\tbatch-index;blob-bits;cdt-list;cdt-map;cluster-stable;float;geo;sindex-exists;peers;pipelining;pscans;relaxed-sc;replicas;replicas-all;replicas-master;replicas-max;truncate-namespace;udf"
[2021-01-08T11:50:37Z DEBUG aerospike::commands::info_command] response from server for info command: "node\tBB9030011AC4202\ncluster-name\tmesh-test\npartition-generation\t0\nservices\t"
[2021-01-08T11:50:37Z DEBUG aerospike::commands::info_command] response from server for info command: "replicas-master\ttest://////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////8="
[2021-01-08T11:50:37Z DEBUG aerospike::commands::info_command] response from server for info command: "node\tBB9030011AC4202\ncluster-name\tmesh-test\npartition-generation\t0\nservices\t"
[2021-01-08T11:50:37Z DEBUG aerospike::cluster] New cluster initialized and ready to be used...
[2021-01-08T11:50:37Z DEBUG aerospike::commands::info_command] response from server for info command: "node\tBB9030011AC4202\ncluster-name\tmesh-test\npartition-generation\t0\nservices\t"
test src::kv::connect ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 26 filtered out
@jonas32 Your address resolves to multiple IPs, and that was what I was talking about. We initially called that use-case proxy support internally.
@khaf ah ok. I did not know that. Makes sense.
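To illustrate the multi-IP case discussed above, here is a minimal sketch of resolving one seed host to several aliases and trying each in turn, which is the pattern visible in the "Resolved aliases" and "Alias ... failed" log lines. This is not the client's actual node_validator code; the function name `first_reachable` and the timeout parameter are hypothetical and use only the Rust standard library.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Hypothetical helper: resolve `host:port` to all of its addresses and
/// return the first one that accepts a TCP connection.
fn first_reachable(host: &str, port: u16, timeout: Duration) -> io::Result<SocketAddr> {
    // Returned only if the name resolves to an empty address list.
    let mut last_err = io::Error::new(io::ErrorKind::NotFound, "host resolved to no addresses");
    for addr in (host, port).to_socket_addrs()? {
        match TcpStream::connect_timeout(&addr, timeout) {
            // The stream is dropped right away; only reachability matters here.
            Ok(_stream) => return Ok(addr),
            // Remember the failure and move on to the next alias.
            Err(err) => last_err = err,
        }
    }
    Err(last_err)
}
```

Calling `first_reachable("aerospike.service.consul", 3000, Duration::from_secs(1))` would report either the first alias that accepted a connection or the error from the last alias tried, similar to the "Connection refused" seen in the log when every alias fails.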
@jhecking Is there any reason why port 55006? I get connection refused. For Port 3000 it results in:
[2021-01-08T15:55:04Z DEBUG aerospike::cluster] No connections available; seeding...
[2021-01-08T15:55:04Z INFO aerospike::cluster] Seeding the cluster. Seeds count: 1
[2021-01-08T15:55:04Z DEBUG aerospike::cluster::node_validator] Resolved aliases for host aerospike.service.consul:3000: [Host { name: "10.42.193.133", port: 3000 }, Host { name: "10.42.150.198", port: 3000 }, Host { name: "10.42.92.4", port: 3000 }]
[2021-01-08T15:55:04Z DEBUG aerospike::cluster::node_validator] Alias 10.42.193.133:3000 failed: Error(Io(Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })
[2021-01-08T15:55:04Z DEBUG aerospike::cluster::node_validator] Alias 10.42.150.198:3000 failed: Error(Io(Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })
[2021-01-08T15:55:04Z DEBUG aerospike::cluster::node_validator] Alias 10.42.92.4:3000 failed: Error(Io(Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })
[2021-01-08T15:55:04Z ERROR aerospike::cluster] Failed to validate seed host: aerospike.service.consul:3000
[2021-01-08T15:55:04Z ERROR aerospike::cluster] Error: Connection refused (os error 111)
[2021-01-08T15:55:04Z DEBUG aerospike::cluster] No connections available; seeding...
[2021-01-08T15:55:04Z INFO aerospike::cluster] Seeding the cluster. Seeds count: 1
[2021-01-08T15:55:04Z DEBUG aerospike::cluster::node_validator] Resolved aliases for host aerospike.service.consul:3000: [Host { name: "10.42.193.133", port: 3000 }, Host { name: "10.42.150.198", port: 3000 }, Host { name: "10.42.92.4", port: 3000 }]
[2021-01-08T15:55:04Z DEBUG aerospike::cluster::node_validator] Alias 10.42.193.133:3000 failed: Error(Io(Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })
[2021-01-08T15:55:04Z DEBUG aerospike::cluster::node_validator] Alias 10.42.150.198:3000 failed: Error(Io(Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })
[2021-01-08T15:55:04Z DEBUG aerospike::cluster::node_validator] Alias 10.42.92.4:3000 failed: Error(Io(Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })
[2021-01-08T15:55:04Z ERROR aerospike::cluster] Failed to validate seed host: aerospike.service.consul:3000
[2021-01-08T15:55:04Z ERROR aerospike::cluster] Error: Connection refused (os error 111)
thread 'src::kv::connect' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Connection("Failed to connect to host(s). The network connection(s) to cluster nodes may have timed out, or the cluster may be in a state of flux."), State { next_error: None, backtrace: InternalBacktrace { backtrace: None } })', tests/common/mod.rs:43:72
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test src::kv::connect ... FAILED
The 10.42... addresses belong to Kubernetes networking. The Aerospike nodes have 10.20... addresses. That seems a little weird, as a lookup from the same container on this domain returns the correct IPs. The IPs you see in the logs are actually the DNS servers that are responsible for resolving that domain. I really don't get why it tries to connect to them. I don't think this is a problem with the Kubernetes DNS system, as the node client somehow manages to get the right addresses.
Looks like I found the problem. The Rust std resolves DNS names via OS calls, whereas Node seems to resolve directly against the DNS server. In this case, a dig against the DNS server returns the right result, but the operating system's lookup does not: if I ping the domain, it pings one of the DNS servers instead. The only way to fix this at the client level would be to implement a custom lookup. That seems overkill for this one use case. I guess we can close this.
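For anyone who wants to check what the OS resolver hands back, a minimal sketch along these lines may help. It uses only `std::net::ToSocketAddrs`, which defers to the system resolver (getaddrinfo on Unix-like systems) instead of querying a DNS server directly. The function name `resolve_via_os` is hypothetical and not part of the Aerospike client.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Hypothetical helper: resolve `host:port` through the standard library,
/// which asks the operating system's resolver rather than a DNS server.
fn resolve_via_os(host: &str, port: u16) -> io::Result<Vec<SocketAddr>> {
    // An IP literal is parsed directly; a hostname goes through the OS lookup.
    Ok((host, port).to_socket_addrs()?.collect())
}
```

Comparing the result of `resolve_via_os("aerospike.service.consul", 3000)` with the output of dig against the DNS server should show whether the two lookup paths disagree, as they did in this setup.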
Edit: Just found out that this error also comes up when the Server rejects the Info request because of missing credentials in EE with debug mode off. It probably makes sense to change that in a future release. (This is independent of the previous problem, which also happens with the right credentials.)
> @jhecking Is there any reason why port 55006? I get connection refused.
Sorry, my bad. Port 55006 is where Aerospike server was listening in my setup (using Docker).
> Just found out that this error also comes up when the Server rejects the Info request because of missing credentials in EE with debug mode off. It probably makes sense to change that in a future release.
Yes, it looks like the NodeValidator does not check the response to the info command it sends for errors, and will return the same "Missing node name" error if auth fails. Feel free to file a separate ticket for that.
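As a rough sketch of what separating the two failure modes could look like: the key/value layout below (tab-separated pairs, one per line) matches the info responses in the debug logs above, but the `result_code` parameter, the `ValidationError` enum, and both function names are assumptions made for illustration. How the server actually signals a rejected info request is not shown in this thread.

```rust
use std::collections::HashMap;

/// Hypothetical error type that keeps a rejected request apart from a
/// successful response that merely lacks the node name.
#[derive(Debug, PartialEq)]
enum ValidationError {
    /// Assumption: the server reported a non-zero result code.
    Rejected(u8),
    /// The response was accepted but contained no "node" key.
    MissingNodeName,
}

/// Split an info response of the form "key\tvalue\nkey\tvalue" into a map.
fn parse_info(response: &str) -> HashMap<String, String> {
    response
        .lines()
        .filter_map(|line| {
            let mut parts = line.splitn(2, '\t');
            let key = parts.next()?;
            // A key without a value (e.g. "services\t") maps to an empty string.
            let value = parts.next().unwrap_or("");
            if key.is_empty() {
                None
            } else {
                Some((key.to_string(), value.to_string()))
            }
        })
        .collect()
}

/// Check the (hypothetical) result code first, then look up the node name.
fn node_name(result_code: u8, response: &str) -> Result<String, ValidationError> {
    if result_code != 0 {
        return Err(ValidationError::Rejected(result_code));
    }
    parse_info(response)
        .get("node")
        .cloned()
        .ok_or(ValidationError::MissingNodeName)
}
```

With a check like this, a failed authentication would surface as its own error instead of being reported as a missing node name.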
> I guess we can close this.
Agree.
Hello,
I just ran into an issue with node name validation. My services use Consul as service discovery for the Aerospike hosts. This results in using aerospike.service.consul:3000 as the hosts parameter. I first did that with the nodejs client and everything worked well. The Rust client aborts with an error. The function validate_alias in https://github.com/aerospike/aerospike-client-rust/blob/master/src/cluster/node_validator.rs#L61 throws it. Is there any specific reason why the Rust client does that and the nodejs client does not? Is this error related to #10?