crate / crate-operator

The CrateDB Kubernetes Operator provides a convenient way to run CrateDB clusters inside Kubernetes.
https://crate.io
Apache License 2.0

Kubernetes Crash pod - Failed to resolve '_cluster._tcp.crate-discovery-my-cluster.default.svc.cluster.local.' #666

Nyuuk closed this issue 3 weeks ago

Nyuuk commented 1 month ago

I deployed the crate-operator using Helm, following the documentation at https://cratedb.com/docs/guide/install/container/kubernetes/kubernetes-operator.html, and then applied dev-cluster.yaml. However, the node fails to resolve the discovery service ("Failed to resolve"), which might be causing a crash. Here is the error log from pod crate-data-hot-my-cluster-0:

[2024-10-14T09:06:57,426][ERROR][i.c.d.SrvUnicastHostsProvider] [data-hot-0] DNS lookup exception:
java.util.concurrent.ExecutionException: java.net.UnknownHostException: Failed to resolve '_cluster._tcp.crate-discovery-my-cluster.default.svc.cluster.local.' [SRV(33)]
        at io.netty.util.concurrent.DefaultPromise.get(DefaultPromise.java:374) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.crate.discovery.SrvUnicastHostsProvider.lookupRecords(SrvUnicastHostsProvider.java:165) ~[dns-discovery-5.8.1.jar:?]
        at io.crate.discovery.SrvUnicastHostsProvider.getSeedAddresses(SrvUnicastHostsProvider.java:140) ~[dns-discovery-5.8.1.jar:?]
        at org.elasticsearch.discovery.DiscoveryModule.lambda$new$4(DiscoveryModule.java:139) ~[crate-server-5.8.1.jar:?]
        at org.elasticsearch.discovery.SeedHostsResolver$1.doRun(SeedHostsResolver.java:191) ~[crate-server-5.8.1.jar:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[crate-server-5.8.1.jar:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:1570) [?:?]
Caused by: java.net.UnknownHostException: Failed to resolve '_cluster._tcp.crate-discovery-my-cluster.default.svc.cluster.local.' [SRV(33)]
        at io.netty.resolver.dns.DnsResolveContext.finishResolve(DnsResolveContext.java:1151) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsResolveContext.tryToFinishResolve(DnsResolveContext.java:1098) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsResolveContext.query(DnsResolveContext.java:457) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsResolveContext.tryToFinishResolve(DnsResolveContext.java:1056) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsResolveContext.onResponse(DnsResolveContext.java:692) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsResolveContext.access$500(DnsResolveContext.java:69) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsResolveContext$2.operationComplete(DnsResolveContext.java:515) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:583) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:559) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsQueryContext.trySuccess(DnsQueryContext.java:345) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsQueryContext.finishSuccess(DnsQueryContext.java:336) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.resolver.dns.DnsNameResolver$DnsResponseHandler.channelRead(DnsNameResolver.java:1384) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[netty-codec-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1407) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:918) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:97) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.111.Final.jar:4.1.111.Final]
        ... 1 more
Caused by: io.netty.resolver.dns.DnsErrorCauseException: Query failed with NXDOMAIN
        at io.netty.resolver.dns.DnsResolveContext.onResponse(..)(Unknown Source) ~[netty-resolver-dns-4.1.111.Final.jar:4.1.111.Final]
[2024-10-14T09:06:58,061][INFO ][o.e.c.s.MasterService    ] [data-hot-0] elected-as-master ([1] nodes joined)[{data-hot-0}{t6_1jL7rTrCgQV16g5FuvA}{14jrfutqQ961f4qC9AOSAQ}{10.244.4.28}{10.244.4.28:4300}{dm}{node_name=hot, http_address=10.244.4.28:4200} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 153, version: 177, reason: master node changed {previous [], current [{data-hot-0}{t6_1jL7rTrCgQV16g5FuvA}{14jrfutqQ961f4qC9AOSAQ}{10.244.4.28}{10.244.4.28:4300}{dm}{node_name=hot, http_address=10.244.4.28:4200}]}
[2024-10-14T09:07:00,398][INFO ][o.e.c.s.ClusterApplierService] [data-hot-0] master node changed {previous [], current [{data-hot-0}{t6_1jL7rTrCgQV16g5FuvA}{14jrfutqQ961f4qC9AOSAQ}{10.244.4.28}{10.244.4.28:4300}{dm}{node_name=hot, http_address=10.244.4.28:4200}]}, term: 153, version: 177, reason: Publication{term=153, version=177}

WalBeh commented 1 month ago

@Nyuuk thank you for sharing the logs with us and using crate-operator!

The DNS errors themselves, logged while the CrateDB node/pod is starting, are IMHO not an issue for CrateDB.
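To illustrate why a failed lookup during startup is not fatal, here is a toy model in Python. It is a sketch under assumptions, not CrateDB's actual SrvUnicastHostsProvider: `NXDomainError`, `get_seed_addresses`, `discovery_rounds` and `fake_lookup` are hypothetical names. It assumes, as the ERROR line followed by normal startup in the log suggests, that a failed SRV lookup is logged and treated as "no seed hosts yet", and that the lookup is simply repeated on later discovery rounds.

```python
import logging

logging.basicConfig(level=logging.INFO, format="[%(levelname)s] %(message)s")
log = logging.getLogger("seed-hosts")


class NXDomainError(LookupError):
    """Hypothetical stand-in for 'Query failed with NXDOMAIN'."""


def get_seed_addresses(lookup, srv_name):
    """Resolve seed hosts; a failed lookup is logged and yields no seeds.

    `lookup` is any callable mapping an SRV name to a list of "ip:port"
    strings (an assumption for this sketch, not a CrateDB API).
    """
    try:
        return lookup(srv_name)
    except NXDomainError as exc:
        # Logged at ERROR level, but startup is not aborted.
        log.error("DNS lookup exception: %s", exc)
        return []


def discovery_rounds(lookup, srv_name, rounds):
    """Repeat the lookup once per (hypothetical) discovery round."""
    return [get_seed_addresses(lookup, srv_name) for _ in range(rounds)]


# Simulated DNS: NXDOMAIN while the pod is not ready, a record afterwards.
answers = iter([None, ["10.244.4.28:4300"]])


def fake_lookup(name):
    result = next(answers)
    if result is None:
        raise NXDomainError(f"Failed to resolve '{name}'")
    return result


SRV = "_cluster._tcp.crate-discovery-my-cluster.default.svc.cluster.local."
rounds_result = discovery_rounds(fake_lookup, SRV, 2)
print(rounds_result)  # [[], ['10.244.4.28:4300']]
```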

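crate-operator creates a discovery service (a Kubernetes Service). It is a headless service, which creates the necessary DNS records in your Kubernetes cluster DNS as soon as the CrateDB pods are READY and their IPs have been added to the corresponding endpoints. Until then, the SRV lookup for '_cluster._tcp.crate-discovery-my-cluster.default.svc.cluster.local.' returns NXDOMAIN, which is exactly the error in your log; the node nevertheless keeps starting up, and the following log lines show it being elected as master.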
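The mechanism described above can be modelled in a few lines of Python. This is a simplified sketch, not how Kubernetes or crate-operator is implemented: `Pod`, `srv_name`, `ready_endpoints` and `resolve_srv` are hypothetical names. It assumes the standard Kubernetes DNS naming `_<port-name>._<protocol>.<service>.<namespace>.svc.<cluster-domain>.` for SRV records of a named service port, and that the port named `cluster` is the transport port 4300 seen in the log. It reproduces the name from the error message and shows the lookup failing until the pod is READY.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ip: str
    ready: bool


def srv_name(port_name, service, namespace, protocol="tcp", cluster_domain="cluster.local"):
    """SRV query name for a named port of a Service, per Kubernetes DNS naming."""
    return f"_{port_name}._{protocol}.{service}.{namespace}.svc.{cluster_domain}."


def ready_endpoints(pods):
    """Only pods that are READY are added to the Service's endpoints."""
    return [pod for pod in pods if pod.ready]


def resolve_srv(pods, port):
    """Toy headless-service DNS: one answer per ready pod, NXDOMAIN if there are none.

    Real SRV answers point at per-pod DNS names; "ip:port" is a simplification.
    """
    endpoints = ready_endpoints(pods)
    if not endpoints:
        raise LookupError("Query failed with NXDOMAIN")
    return [f"{pod.ip}:{port}" for pod in endpoints]


name = srv_name("cluster", "crate-discovery-my-cluster", "default")
pods = [Pod("crate-data-hot-my-cluster-0", "10.244.4.28", ready=False)]

try:
    resolve_srv(pods, 4300)
except LookupError as exc:
    before_ready = str(exc)  # the error seen while the pod is still starting

pods[0].ready = True  # readiness probe passes, so the pod IP joins the endpoints
after_ready = resolve_srv(pods, 4300)
print(name)
print(before_ready, after_ready)
```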

I hope that makes sense and addresses your question. If not, let us know (and post more of the surrounding logs).