canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

microk8s - port-forward error when accessing prometheus service - Temporary failure in name resolution #2268

Closed AmilaDevops closed 1 year ago

AmilaDevops commented 3 years ago

@all I'm getting the error below when I execute the following command to port-forward my Prometheus ClusterIP service and then try to browse the site externally, or curl it from inside the server.

microk8s kubectl port-forward service/prometheus-k8s -n monitoring 9090:9090 --address=0.0.0.0 &

Then, if I try to curl from the server or browse the site from an external browser at http://172.25.234.170:9090, I get the error message below.

root@yqx-k8s-c1:/home/amila# curl http://172.25.234.170:9090/
Handling connection for 9090
E0516 09:10:37.233931 1262725 portforward.go:400] an error occurred forwarding 9090 -> 9090: error forwarding port 9090 to pod 7f5fa357ddaaa37d105d562c59e06b38bcf9a9278dfc9ea0c8c7480b3804a, uid : failed to execute portforward in network namespace "/var/run/netns/cni-1b1fbae1-895635-e929-780102215148": socat command returns error: exit status 1, stderr: "2021/05/16 09:10:37 socat[630122] E getaddrinfo(\"localhost\", \"NULL\", {1,2,1,6}, {}): Temporary failure in name resolution\n"
curl: (52) Empty reply from server

Nothing else is wrong in my MicroK8s cluster; all other NodePort and LoadBalancer services work fine, but microk8s kubectl port-forward to services does not work.

Below are the details of my MicroK8s cluster.

1.) microk8s version: installed: v1.18.18

2.) microk8s inspect: generated tarball attached (inspection-report-20210516_091653.tar.gz)

3.) Also attached is the output of sudo snap logs -f microk8s (screenshot)

4.) My Prometheus services and pods are running healthy in the monitoring namespace (screenshot attached)

5.) The kube-dns service I'm using inside MicroK8s is also healthy (screenshot attached)

Could anyone please help me overcome this issue? I have been troubleshooting for almost a week now to get this working in my production environment.
@JimPatterson @balchua

balchua commented 3 years ago

Do you have localhost set in your /etc/hosts file?
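
A quick way to check that on the node (a minimal sketch):

# does "localhost" resolve via the hosts file / NSS?
getent hosts localhost
grep localhost /etc/hosts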

AmilaDevops commented 3 years ago

yes @balchua, this is how my /etc/hosts file looks on the master node (localhost is also defined on the other nodes): (screenshot attached)

Here, yqx-k8s-c1 is my master node's hostname and 172.25.234.170 is the master node's IP address. I hope the above hosts file configuration is correct.
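
The screenshot is not reproduced here; judging by the discussion further down in this thread, the relevant entries are presumably along these lines (a hypothetical reconstruction, not the exact contents):

127.0.0.1        localhost
127.0.1.1        yqx-k8s-c1
# 172.25.234.170 yqx-k8s-c1    (commented out at this point in the thread)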

AmilaDevops commented 3 years ago

but @balchua I figured out that the MicroK8s kubelet service may be having an issue; I don't know yet whether it actually is one. Is kubelet running in a pod? If it is, I can't find any kubelet pods. (screenshots attached)

balchua commented 3 years ago

It's not running as a pod; it's running as a systemd process. To get the logs: journalctl -u snap.microk8s.daemon-kubelet -f
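
In general, the MicroK8s daemons run as snap-managed systemd units, so (as a minimal sketch) you can list and follow them like this:

# list the MicroK8s daemons managed by systemd
systemctl list-units 'snap.microk8s.daemon-*' --no-pager

# follow the kubelet logs
sudo journalctl -u snap.microk8s.daemon-kubelet -f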

AmilaDevops commented 3 years ago

Okay, this is the output of that command: (screenshot attached)

But my port-forward error still exists. What can I do to fix that?

balchua commented 3 years ago

Can you port-forward with --address=172.25.234.170? I am totally guessing here. šŸ™
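
That is, something like this (a sketch based on the original command from this issue):

microk8s kubectl port-forward service/prometheus-k8s -n monitoring 9090:9090 --address=172.25.234.170 &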

AmilaDevops commented 3 years ago

I'm still getting the same result @balchua šŸ™

root@yqx-k8s-c1:/home/amila# curl http://172.25.234.170:9090/
Handling connection for 9090
E0516 09:10:37.233931 1262725 portforward.go:400] an error occurred forwarding 9090 -> 9090: error forwarding port 9090 to pod 7f5fa357ddaaa37d105d562c59e06b38bcf9a9278dfc9ea0c8c7480b3804a, uid : failed to execute portforward in network namespace "/var/run/netns/cni-1b1fbae1-895635-e929-780102215148": socat command returns error: exit status 1, stderr: "2021/05/16 09:10:37 socat[630122] E getaddrinfo(\"localhost\", \"NULL\", {1,2,1,6}, {}): Temporary failure in name resolution\n"
curl: (52) Empty reply from server

balchua commented 3 years ago

Can you do the port forwarding with the pod instead of the Service?

Another thing: check your /etc/resolv.conf for anything fishy.
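
For example (a sketch; the pod name prometheus-k8s-0 is taken from the monitoring namespace listing further down this thread):

# forward directly to the pod rather than the service
microk8s kubectl -n monitoring port-forward pod/prometheus-k8s-0 9090:9090 --address=0.0.0.0 &

# and check the resolver configuration on the node
cat /etc/resolv.conf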

AmilaDevops commented 3 years ago

Same results, sir, when doing it with the pod as well. Below is the content of my /etc/resolv.conf file: (screenshot attached)

Below are the curl commands I executed on the server, but nothing worked. (screenshot attached)

balchua commented 3 years ago

How about this file? /run/systemd/resolve/resolv.conf? Looks similar to this issue https://github.com/kontena/pharos-cluster/pull/450
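
For context, a common adjustment on hosts running systemd-resolved is to point kubelet at the real resolver file rather than the 127.0.0.53 stub. A hedged sketch of that, assuming MicroK8s keeps its kubelet arguments in the usual snap path (verify on your install):

# compare the stub resolver with the real one
cat /etc/resolv.conf
cat /run/systemd/resolve/resolv.conf

# point kubelet at the real resolv.conf and restart it
# (assumption: kubelet arguments live in this file on MicroK8s)
echo '--resolv-conf=/run/systemd/resolve/resolv.conf' | sudo tee -a /var/snap/microk8s/current/args/kubelet
sudo systemctl restart snap.microk8s.daemon-kubelet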

AmilaDevops commented 3 years ago

/run/systemd/resolve/resolv.conf looks like this. It differs from /etc/resolv.conf, but I don't think that is making an impact? (screenshot attached)

Actually, what do you think is causing this issue @balchua? Is it because the kube-dns service is not resolving properly, and resolution falls back to my local DNS instead of kube-dns?

balchua commented 3 years ago

I am not sure. Based on the error, I don't think kube-dns plays a role here. Did you get a chance to look at the other issue I pointed out?

balchua commented 3 years ago

It might be worth a try: in your /etc/hosts, instead of mapping localhost to 127.0.0.1, can you map it to the non-loopback IP? E.g. the 172...

AmilaDevops commented 3 years ago

I had this entry uncommented in the hosts file before @balchua, but I commented it out again because it did not work either. Is that okay? yqx-k8s-c1 is my hostname. (screenshot attached)

balchua commented 3 years ago

I mean you can keep that line uncommented and add localhost to it, then comment out the 127... line. And why do you have 127.0.1.1 for yqx-k8s-c1?

AmilaDevops commented 3 years ago

@balchua do you want me to comment out the first line or the second line?

I'm using 127.0.1.1 for yqx-k8s-c1 because I already used 127.0.0.1 for localhost and I also want to bind my hostname to a loopback address (127.0.1.1); both 127.0.0.1 and 127.0.1.1 are loopback addresses. It worked before with this setup.

balchua commented 3 years ago

At this point, what I want to try is mapping localhost to a non-loopback address like 172.25.234.170, hoping that the port forwarding works.
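
Concretely, the experiment amounts to an /etc/hosts along these lines (a sketch of the proposed test only, not a recommended permanent configuration):

# 127.0.0.1      localhost        <- commented out for the test
172.25.234.170   localhost yqx-k8s-c1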

AmilaDevops commented 3 years ago

Yes, sure. Could you show me an example of exactly how you want it @balchua? I'm not much of an expert in networking, though.

Is it something like this you want? Correct me if I'm wrong. Thanks.

(screenshot attached)

balchua commented 3 years ago

I'm no network guru either. šŸ˜€ Yes, something like that.

AmilaDevops commented 3 years ago

I like experiments, no prob with that :-)

@balchua, but I have a concern: even when there is only the localhost entry in the /etc/hosts file, when I run the curl command on the server to check whether my site is working, it still isn't. It gives the same port-forward error. See below:

(screenshots attached)

What's happening here, guys?

balchua commented 3 years ago

Hi @AmilaDevops, I don't really know what's going on here. I use the same port-forward on my dev setup and have not seen such an error before. I'm going to suggest trying this: go to a different machine which is not part of the cluster. On the main node, get the kubeconfig with microk8s config > /tmp/client.conf, copy this client.conf to the other machine, and do the port forwarding from there, e.g. on a test machine (not the kube node).
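
A sketch of those steps (the host name "testbox" and the plain kubectl on the remote machine are assumptions, not part of the original instructions):

# on the MicroK8s node: export the kubeconfig
microk8s config > /tmp/client.conf

# copy it to a machine that is not part of the cluster (hypothetical host "testbox")
scp /tmp/client.conf user@testbox:/tmp/client.conf

# on testbox: port-forward with a regular kubectl using the copied kubeconfig
kubectl --kubeconfig /tmp/client.conf -n monitoring port-forward svc/prometheus-k8s 9090:9090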

balchua commented 3 years ago

@AmilaDevops I just tried v1.18.8; I can do port-forwarding on a single-node MicroK8s.

microk8s kubectl -n monitoring port-forward svc/prometheus-k8s 9999:9090
Forwarding from 127.0.0.1:9999 -> 9090
Forwarding from [::1]:9999 -> 9090
Handling connection for 9999
Handling connection for 9999
Handling connection for 9999
Handling connection for 9999
Handling connection for 9999
Handling connection for 9999

From another terminal

curl http://localhost:9999/graph
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
        <meta name="robots" content="noindex,nofollow">
        <title>Prometheus Time Series Collection and Processing Server</title>
        <link rel="shortcut icon" href="/static/img/favicon.ico?v=4ef66003d9855ed2b7a41e987b33828ec36db34d">
        <script src="/static/vendor/js/jquery-3.3.1.min.js?v=4ef66003d9855ed2b7a41e987b33828ec36db34d"></script>    
        <script src="/static/vendor/js/popper.min.js?v=4ef66003d9855ed2b7a41e987b33828ec36db34d"></script>
        <script src="/static/vendor/bootstrap-4.3.1/js/bootstrap.min.js?v=4ef66003d9855ed2b7a41e987b33828ec36db34d"></script>

        <link type="text/css" rel="stylesheet" href="/static/vendor/bootstrap-4.3.1/css/bootstrap.min.css?v=4ef66003d9855ed2b7a41e987b33828ec36db34d">
        <link type="text/css" rel="stylesheet" href="/static/css/prometheus.css?v=4ef66003d9855ed2b7a41e987b33828ec36db34d">
        <link type="text/css" rel="stylesheet" href="/static/vendor/bootstrap4-glyphicons/css/bootstrap-glyphicons.min.css?v=4ef66003d9855ed2b7a41e987b33828ec36db34d">

        <script>
            var PATH_PREFIX = "";
            var BUILD_VERSION = "4ef66003d9855ed2b7a41e987b33828ec36db34d";
            $(function () {
                $('[data-toggle="tooltip"]').tooltip()
            })
        </script>

    <link type="text/css" rel="stylesheet" href="/static/vendor/rickshaw/rickshaw.min.css?v=4ef66003d9855ed2b7a41e987b33828ec36db34d">
    <link type="text/css" rel="stylesheet" href="/static/vendor/eonasdan-bootstrap-datetimepicker/bootstrap-datetimepicker.min.css?v=4ef66003d9855ed2b7a41e987b33828ec36db34d">

    <script src="/static/vendor/rickshaw/vendor/d3.v3.js?v=4ef66003d9855ed2b7a41e987b33828ec36db34d"></script>
    <script src="/static/vendor/rickshaw/vendor/d3.layout.min.js?v=4ef66003d9855ed2b7a41e987b33828ec36db34d"></script>
    <script src="/static/vendor/rickshaw/rickshaw.min.js?v=4ef66003d9855ed2b7a41e987b33828ec36db34d"></script>
    <script src="/static/vendor/moment/moment.min.js?v=4ef66003d9855ed2b7a41e987b33828ec36db34d"></script>
    <script src="/static/vendor/moment/moment-timezone-with-data.min.js?v=4ef66003d9855ed2b7a41e987b33828ec36db34d"></script>
    <script src="/static/vendor/eonasdan-bootstrap-datetimepicker/bootstrap-datetimepicker.min.js?v=4ef66003d9855ed2b7a41e987b33828ec36db34d"></script>
    <script src="/static/vendor/bootstrap3-typeahead/bootstrap3-typeahead.min.js?v=4ef66003d9855ed2b7a41e987b33828ec36db34d"></script>
    <script src="/static/vendor/fuzzy/fuzzy.js?v=4ef66003d9855ed2b7a41e987b33828ec36db34d"></script>

    <script src="/static/vendor/mustache/mustache.min.js?v=4ef66003d9855ed2b7a41e987b33828ec36db34d"></script>
    <script src="/static/vendor/js/jquery.selection.js?v=4ef66003d9855ed2b7a41e987b33828ec36db34d"></script>

    <script src="/static/js/graph/index.js?v=4ef66003d9855ed2b7a41e987b33828ec36db34d"></script>

    <script id="graph_template" type="text/x-handlebars-template"></script>

    <link type="text/css" rel="stylesheet" href="/static/css/graph.css?v=4ef66003d9855ed2b7a41e987b33828ec36db34d">

    </head>

    <body>
        <nav class="navbar fixed-top navbar-expand-sm navbar-dark bg-dark">
            <div class="container-fluid">      

                <button type="button" class="navbar-toggler" data-toggle="collapse" data-target="#nav-content" aria-expanded="false" aria-controls="nav-content" aria-label="Toggle navigation">
                    <span class="navbar-toggler-icon"></span>

                </button>

                <a class="navbar-brand" href="/">Prometheus</a>

                <div id="nav-content" class="navbar-collapse collapse">
                    <ul class="navbar-nav">

                        <li class="nav-item"><a class="nav-link" href="/alerts">Alerts</a></li>
                        <li class="nav-item"><a class="nav-link" href="/graph">Graph</a></li>
                        <li class="nav-item dropdown">
                            <a href="#" class="nav-link dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Status <span class="caret"></span></a>
                            <div class="dropdown-menu">
                                <a class="dropdown-item" href="/status">Runtime &amp; Build Information</a>
                                <a class="dropdown-item" href="/flags">Command-Line Flags</a>
                                <a class="dropdown-item" href="/config">Configuration</a>
                                <a class="dropdown-item" href="/rules">Rules</a>
                                <a class="dropdown-item" href="/targets">Targets</a>
                                <a class="dropdown-item" href="/service-discovery">Service Discovery</a>
                            </div>
                        </li>
                        <li class= "nav-item" >
                            <a class ="nav-link" href="https://prometheus.io/docs/prometheus/latest/getting_started/" target="_blank">Help</a>
                        </li>
                    </ul>
                </div>
            </div>
        </nav>

    <div id="graph_container" class="container-fluid">
        <div class="query-history">
            <i class="glyphicon glyphicon-unchecked"></i>
            <button type="button" class="search-history" title="search previous queries">Enable query history</button>
        </div>
    </div>

    <div class="container-fluid">
      <div><input class="btn btn-primary" type="submit" value="Add Graph" id="add_graph"></div>
    </div>

    </body>
</html>

From the browser: (screenshot attached)

I don't get the error you are facing. :disappointed:

My /etc/hosts

cat /etc/hosts
127.0.0.1   localhost 
127.0.1.1   norse.localdomain   norse

My /etc/resolv.conf

cat /etc/resolv.conf 
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad

monitoring namespace

microk8s kubectl -n monitoring get all
NAME                                      READY   STATUS    RESTARTS   AGE
pod/alertmanager-main-0                   2/2     Running   0          10m
pod/grafana-fbb6785d5-f8msz               1/1     Running   0          10m
pod/kube-state-metrics-dcc94d9f8-qt6h4    3/3     Running   0          10m
pod/node-exporter-fqvfs                   2/2     Running   0          10m
pod/prometheus-adapter-5949969998-mnw2p   1/1     Running   0          10m
pod/prometheus-k8s-0                      3/3     Running   1          10m
pod/prometheus-operator-5c7dcf954-clvkr   1/1     Running   0          10m

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-main       ClusterIP   10.152.183.11    <none>        9093/TCP                     10m
service/alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   10m
service/grafana                 ClusterIP   10.152.183.179   <none>        3000/TCP                     10m
service/kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP            10m
service/node-exporter           ClusterIP   None             <none>        9100/TCP                     10m
service/prometheus-adapter      ClusterIP   10.152.183.138   <none>        443/TCP                      10m
service/prometheus-k8s          ClusterIP   10.152.183.119   <none>        9090/TCP                     10m
service/prometheus-operated     ClusterIP   None             <none>        9090/TCP                     10m
service/prometheus-operator     ClusterIP   None             <none>        8080/TCP                     10m

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/node-exporter   1         1         1       1            1           kubernetes.io/os=linux   10m

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana               1/1     1            1           10m
deployment.apps/kube-state-metrics    1/1     1            1           10m
deployment.apps/prometheus-adapter    1/1     1            1           10m
deployment.apps/prometheus-operator   1/1     1            1           10m

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/grafana-fbb6785d5               1         1         1       10m
replicaset.apps/kube-state-metrics-dcc94d9f8    1         1         1       10m
replicaset.apps/prometheus-adapter-5949969998   1         1         1       10m
replicaset.apps/prometheus-operator-5c7dcf954   1         1         1       10m

NAME                                 READY   AGE
statefulset.apps/alertmanager-main   1/1     10m
statefulset.apps/prometheus-k8s      1/1     10m

I can't find where it is failing.

One thing I didn't try is setting up a multi-node cluster.

AmilaDevops commented 3 years ago

It works on any other server in the world, mate, but I have this specific error and it has blocked my life :P

I checked the pod logs, DNS pod logs, service logs, etc., and can't find anything, and I'm sure the TCP traffic is not reaching the prometheus-k8s pods or the container. So is something odd going on between my localhost network interface and the MicroK8s network interface? (I'm using flannel as the network plugin.)

Also, I found out (oh gosh) that my etcd pod is not running? (screenshot attached)

Thanks for your help @balchua. I will probably do a microk8s restart tomorrow and see. I don't think etcd not running would affect this error anyway, would it?

balchua commented 3 years ago

Hi @AmilaDevops, etcd is not running as a pod; it's running under systemd. And without etcd, the apiserver and flannel will not run, so I would rule that one out. In the other system of yours which works, maybe you can find some difference, like firewall or iptables rules.

You can compare both by running inspect on the working node and comparing it with the inspect tarball logs from the one which is not working. I guess you can also nuke the broken one with snap remove microk8s --purge and then reinstall to start from a clean slate.
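
A sketch of both suggestions (the snap channel below is only an example; pick the one your cluster is actually tracking):

# generate an inspection tarball on each node and compare them
microk8s inspect

# nuclear option on the broken node: remove and reinstall
sudo snap remove microk8s --purge
sudo snap install microk8s --classic --channel=1.18/stable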

balchua commented 3 years ago

Hi @AmilaDevops just checking if you are still experiencing the problem. Thanks

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.