Open AlexWeinstein92 opened 1 year ago
Hi @AlexWeinstein92
Can you share the contents of microk8s kubectl get configmap -n kube-system coredns -o yaml?
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        ready
        log . {
          class error
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 8.8.8.8 8.8.4.4
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"Corefile":".:53 {\n errors\n health {\n lameduck 5s\n }\n ready\n log . {\n class error\n }\n kubernetes cluster.local in-addr.arpa ip6.arpa {\n pods insecure\n fallthrough in-addr.arpa ip6.arpa\n }\n prometheus :9153\n forward . 8.8.8.8 8.8.4.4\n cache 30\n loop\n reload\n loadbalance\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"EnsureExists","k8s-app":"kube-dns"},"name":"coredns","namespace":"kube-system"}}
  creationTimestamp: "2023-06-21T17:48:15Z"
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
  resourceVersion: "336619"
  uid: a559c098-5cd9-48df-941e-9a3dbd17cd5f
@neoaggelos I realized I had actually modified the file I gave above to include a line pods verified at the end of the Corefile {...} block.
Taking that out has resolved the issue with the pod not starting. However, I am still unable to curl one service from another's pod. Here are my service and pod definitions for the one I am trying to hit via scylla-db:9042, in case it is helpful.
apiVersion: v1
kind: Service
metadata:
  name: scylla-db
spec:
  selector:
    app: scylla-db
  clusterIP: None
  ports:
    - port: 9042
      targetPort: 9042
---
apiVersion: v1
kind: Pod
metadata:
  name: scylla-db
  labels:
    app: scylla-db
spec:
  hostname: scylla-db
  setHostnameAsFQDN: true
  containers:
    - image: scylladb/scylla:latest
      name: scylla-db
      ports:
        - name: scylla-db
          containerPort: 9042
  hostNetwork: true
Yes, I was about to mention that the pods verified line is not in the right place. You could try changing pods insecure a few lines above to pods verified, if this is required.
Further, make sure to recreate any pods after the DNS changes, so that they do not get stale DNS replies/failures. Given that you specifically set hostNetwork: true and setHostnameAsFQDN: true, I would also look at the dnsPolicy field to make sure that your pod can resolve internal hostnames.
Have a look at https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy, and maybe start with a busybox pod to ensure that DNS resolution works as you would expect. Hope this helps!
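Editor's note: the linked page documents that a pod with hostNetwork: true and the default dnsPolicy of ClusterFirst falls back to the Default policy, meaning it inherits the node's resolver instead of the cluster DNS, and that ClusterFirstWithHostNet is the policy intended for host-network pods. The sketch below only models that documented rule; the function names are made up for illustration and are not part of any Kubernetes API.

```python
def effective_dns_policy(host_network: bool, dns_policy: str = "ClusterFirst") -> str:
    """Model the documented fallback: hostNetwork + ClusterFirst behaves like Default."""
    if host_network and dns_policy == "ClusterFirst":
        return "Default"
    return dns_policy


def uses_cluster_dns(host_network: bool, dns_policy: str = "ClusterFirst") -> bool:
    """True if the pod is expected to send queries to the cluster DNS service.

    The "None" policy depends entirely on dnsConfig and is not modeled here.
    """
    policy = effective_dns_policy(host_network, dns_policy)
    return policy in ("ClusterFirst", "ClusterFirstWithHostNet")


# The pod spec shared above: hostNetwork: true, no dnsPolicy set.
print(uses_cluster_dns(host_network=True))
# The same pod with dnsPolicy: ClusterFirstWithHostNet.
print(uses_cluster_dns(host_network=True, dns_policy="ClusterFirstWithHostNet"))
```

If this rule applies to the pod in question, service names such as scylla-db would not be expected to resolve from inside it until either hostNetwork is removed or dnsPolicy is set explicitly.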
@neoaggelos pods verified is not required, nor are hostNetwork: true or setHostnameAsFQDN: true. I found threads via Google that suggested these might solve my issues, before digging into the state of my coredns pods.
It seems that pods verified leads to the issue captured here: https://github.com/canonical/microk8s/issues/2206. I tried using it, and got the same error messages in the coredns pod. Then I took it out, restarted everything, and now coredns logs no issues.
However, even after restarting all services, deployments, and pods, I can't seem to get a curl from one service to the other to work. I have tried going both ways: from my service pod into my db pod and the other way around. No luck so far. I have also tried a busybox pod, which runs fine, but a curl to it from either of my pods still results in "could not resolve host".
@AlexWeinstein92 can you please share more details about how you define your services and how you try to access them? Otherwise it's hard to understand what issue you are experiencing. Can you please share steps that consistently reproduce the issue on your side? Thanks
The services are built with Akka-grpc, except for the gateway service, which is built in Akka-http with stateless actors. The gRPC services rely on event-sourced stateful actor entities for processing messages.
As it stands, I can curl or use a Gatling test from outside microk8s to send a request to the gateway via localhost:9042. The requests get passed successfully to whatever service they are intended for (I am using just one endpoint in one service to test this, but it should apply to all), but the scylla-db connection (for event sourcing) is what is causing problems right now.
Right now, I can only hit the scylla-db service (which I imported from Docker and ran instead of using scylla-operator, due to problems running the latter on my M1 Mac) if I change the configuration for the Akka contact point from something like
datastax-java-driver {
  basic.contact-points = ["scylla-db:9042"]
}
to something that uses the internal cluster IP (e.g. 10.152.183.116):
datastax-java-driver {
  basic.contact-points = ["10.152.183.116:9042"]
}
This is not a robust solution for production or development, given that I currently have 4 services that rely on the same DB service, with 3 more coming soon.
To reproduce the issue you can follow these instructions for creating the docker container of scylla. I had to pull it into multipass to deploy (see the instructions at "locally built images without a registry") because I had issues with the registry.
Then I kubectl exec into either the scylla or service-level pod and, inside a bash terminal, try to curl the other pod using the service name as the URL. This always seems to result in "hostname not found".
Update: If I do nslookup scylla-db from within the busybox pod, this is the output:
Server: 10.152.183.10
Address 1: 10.152.183.10 kube-dns.kube-system.svc.cluster.local
Name: scylla-db
Address 1: 10.1.254.108 10-1-254-108.scylla-db.default.svc.cluster.local
Still, if I try to curl 10-1-254-108.scylla-db.default.svc.cluster.local (which itself is problematic as a hostname for reasons previously stated), I get error: could not resolve host
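Editor's note: the nslookup above ran in the busybox pod, while the failing curl ran in a different pod, and the two tools do not necessarily share a resolution path. One way to compare like with like is to call the system resolver from inside the failing pod itself. The following is a minimal sketch, assuming a Python interpreter is available in that image (which may not be the case); resolve is a hypothetical helper name, and the lookup parameter exists only so the function can be exercised without a network.

```python
import socket


def resolve(host, port=None, lookup=socket.getaddrinfo):
    """Return the sorted unique addresses the system resolver gives for host.

    Returns an empty list when resolution fails, which corresponds to the
    "could not resolve host" case reported by curl.
    """
    try:
        infos = lookup(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling resolve("scylla-db", 9042) in both the busybox pod and the failing pod would show whether the two pods actually see the same DNS configuration.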
@neoaggelos any ideas here? It's important that I figure this out for my project, and I feel very stuck with it
I am experiencing the same issue on my Mac with a basic nginx pod. I set up metallb and can curl the external endpoint, and nslookup works fine and outputs the below. I can curl the internal pod IP, but any attempt to curl nginx.default.svc.cluster.local fails with curl: (6) Could not resolve host: nginx.default.svc.cluster.local
nslookup nginx.default.svc.cluster.local
Server: 10.152.183.10
Address 1: 10.152.183.10 kube-dns.kube-system.svc.cluster.local
Name: nginx.default.svc.cluster.local
Address 1: 10.152.183.237
Bumping this since it has been 2 weeks since anyone has offered any suggestions for this. I really would like to get it working - replacing clusterIPs in configuration files is a non-scalable workaround
Hi @AlexWeinstein92, unfortunately, the link you shared was about running ScyllaDB on docker. Would you mind sharing a Kubernetes YAML manifest instead? Indeed, you should not have to rely on using hardcoded service IPs
@neoaggelos Sorry if there was confusion - the Docker image is being used in the following YAML because the image Scylla provides is not compatible with my M1 machine (ie. I get errors when I try to run their image so I have to package it using Docker)
apiVersion: v1
kind: Service
metadata:
  name: scylla-db
spec:
  type: NodePort
  selector:
    app: scylla-db
  ports:
    - port: 9042
      targetPort: 9042
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scylla-db
  labels:
    app: scylla-db
spec:
  selector:
    matchLabels:
      app: scylla-db
  template:
    metadata:
      labels:
        app: scylla-db
    spec:
      containers:
        - name: scylla-db
          image: scylladb/scylla:latest
          imagePullPolicy: Never
          ports:
            - containerPort: 9042
      hostNetwork: true
Or at least, that is the YAML file I am using now in order to have a ClusterIP. Ideally it would be more like:
apiVersion: v1
kind: Service
metadata:
  name: scylla-db
spec:
  selector:
    app: scylla-db
  clusterIP: None
---
apiVersion: v1
kind: Pod
metadata:
  name: scylla-db
  labels:
    app: scylla-db
spec:
  hostname: scylla-db
  containers:
    - image: scylladb/scylla:latest
      imagePullPolicy: IfNotPresent
      name: scylla-db
      ports:
        - name: scylla-db
          containerPort: 9042
OK, some quick notes:
- You have set hostNetwork: true in your pod, which basically means that the scylla-db pod will be accessible at $VMIP:9042 as well as at the service you create. This is something you can safely remove from the pod.
- You have also set hostname: scylla-db in your pod.
- What are the exact steps that you follow that cause the DNS resolution to fail?
I've made the manifest a bit simpler; the rest should not be required. Also, scylladb/scylla:latest worked for me just fine on an M1:
apiVersion: v1
kind: Service
metadata:
  name: scylla-db
spec:
  selector:
    app: scylla-db
  ports:
    - port: 9042
      targetPort: 9042
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scylla-db
  labels:
    app: scylla-db
spec:
  selector:
    matchLabels:
      app: scylla-db
  template:
    metadata:
      labels:
        app: scylla-db
    spec:
      containers:
        - name: scylla-db
          image: scylladb/scylla:latest
          ports:
            - containerPort: 9042
This creates a ClusterIP service that I can access at scylla-db:9042 from pods running in the cluster:
$ microk8s kubectl apply -f manifest.yaml
# wait a while
$ microk8s kubectl get pod,svc
NAME READY STATUS RESTARTS AGE
pod/scylla-db-7c4bc8d76c-h8hnz 1/1 Running 0 63s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 9h
service/scylla-db ClusterIP 10.152.183.78 <none> 9042/TCP 63s
$ microk8s kubectl run --rm -it --image alpine -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup scylla-db
Server: 10.152.183.10
Address: 10.152.183.10:53
Name: scylla-db.default.svc.cluster.local
Address: 10.152.183.78
/ # nc -z scylla-db 9042 && echo it works
it works
/ # nc -z scylla-db.default 9042 && echo it works
it works
/ # nc -z scylla-db.default.svc 9042 && echo it works
it works
/ # nc -z scylla-db.default.svc.cluster.local 9042 && echo it works
it works
If you follow the exact steps as above, which step does not do it for you?
Thanks so much for this @neoaggelos :)
This is what I am seeing:
% microk8s kubectl run --rm -it --image alpine -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup scylla-db
Server: 10.152.183.10
Address: 10.152.183.10:53
Name: scylla-db.default.svc.cluster.local
Address: 10.152.183.216
** server can't find scylla-db.svc.cluster.local: NXDOMAIN
** server can't find scylla-db.svc.cluster.local: NXDOMAIN
** server can't find scylla-db.cluster.local: NXDOMAIN
** server can't find scylla-db.cluster.local: NXDOMAIN
** server can't find scylla-db.home: NXDOMAIN
** server can't find scylla-db.home: NXDOMAIN
I am also noticing that my service image (not scylla-db) is telling me "nslookup not found" when I try from a shell inside it. To be as specific as I can be, the service runs 5 Scala-written, sbt-docker-published containers, 4 of which are gRPC-based and 1 of which accepts HTTP requests as a gateway to the others. "nslookup not found" applies to all containers. They are all built on eclipse-temurin and they are able to run curl, but it always gives "hostname not found".
As part of the process I restarted microk8s, making sure dns is enabled, and deleted all pods, services, deployments before testing. I am also now using a YAML that looks exactly like the one you shared.
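Editor's note: the NXDOMAIN lines in the nslookup output above are consistent with ordinary search-list expansion rather than a DNS failure. A short name is tried against each search domain in turn, and only the default.svc.cluster.local candidate exists. The sketch below illustrates that expansion under two assumptions: the search list is inferred from the output above rather than read from the pod's /etc/resolv.conf, and ndots is taken to be 5, the value Kubernetes normally writes for cluster DNS. candidate_names is a hypothetical helper, not a library function.

```python
def candidate_names(name, search, ndots=5):
    """List the fully qualified names a resolv.conf-style resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute, so no search domains apply.
        return [name]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [name] + expanded
    # Too few dots: try the search list first, then the name as given.
    return expanded + [name]


# Assumed search list, inferred from the NXDOMAIN lines above.
search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local", "home"]
for candidate in candidate_names("scylla-db", search):
    print(candidate)
```

Under these assumptions the first candidate is scylla-db.default.svc.cluster.local, which is the name that resolved, and the remaining candidates match the names reported as NXDOMAIN.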
@AlexWeinstein92 OK, then it looks like the resolution works?
What about the nc -z portion?
/ # nc -z scylla-db 9042 && echo it works
it works
/ # nc -z scylla-db.default 9042 && echo it works
it works
/ # nc -z scylla-db.default.svc 9042 && echo it works
it works
/ # nc -z scylla-db.default.svc.cluster.local 9042 && echo it works
it works
That does all seem to work from the alpine pod
OK, then what exactly is failing? You will not be able to resolve this hostname from the host itself, or from the multipass VM. Is this what is failing?
I don't know if I entirely understand your question, but the trouble is specifically that I cannot access the hostname from within a container, whether that is the scylla-db container or one of my Scala service containers. It's strange to me that the alpine image worked, because that is the kind of setup I am trying to get working, just with Scala Docker containers built on eclipse-temurin instead of alpine.
@AlexWeinstein92 ok, can you then give an example workload where the DNS resolution does not work for you? If not, I am not able to reproduce your issue (especially since the service does resolve properly from a pod in the cluster).
Please share the pod that you deploy which is then unable to resolve scylla-db; this might help to pinpoint the issue. Thanks!
@neoaggelos you can find the code I'm running here: https://github.com/improving-app/back-end
There is a README at the top level describing how I deploy to microk8s.
To create the docker container I simply use sbt docker:publishLocal and then tag and push to my weinyopp dockerhub repo. To deploy the services I use the microApply.yaml file, which can also be found at the top level.
@neoaggelos wondering if you have been able to run my services, if you had any issues?
@neoaggelos Just an FYI: this issue has not been resolved (I have even tried moving to alpine images for my microservices, but they were very problematic), but I have decided to bypass it in my system by hardcoding a clusterIP in my yaml for the service, which I then also hardcode into my microservices' DB connection configuration.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Summary
Ultimately my goal is to let my pods communicate with each other via service name. I have been told this should be easily possible with any Kubernetes environment, but have not been able to figure it out with microk8s. When I exec into a service shell and curl, I consistently get "hostname could not be resolved".
Right now I am also dealing with a pod startup error after doing microk8s enable dns. I have not edited the config file (configmap/coredns) in any way. I am not sure if this is related to my inability to curl between pods.
What Should Happen Instead?
After the coredns pod starts correctly, I should be able to exec into a pod and then curl [servicename:serviceport] with expected answers.
However, I may be overestimating the need for coredns here. Please let me know if it is not necessary for this task.
Introspection Report
inspection-report-20230622_105638.tar.gz
Environment: Mac M1