eliu / openshift-vagrant

Bring up a real OKD cluster on your local machine using Vagrant and VirtualBox
Apache License 2.0

When trying to deploy an image, getting connection refused to Kubernetes API endpoint #14

Open pradeepn-altran opened 5 years ago

pradeepn-altran commented 5 years ago

Tried creating a project and deploying an image. I get the following error.

error: couldn't get deployment websphere-liberty-1: Get https://172.30.0.1:443/api/v1/namespaces/was-liberty/replicationcontrollers/websphere-liberty-1:  dial tcp 172.30.0.1:443: connect: connection refused

I tried a curl https://172.30.0.1:443/ and it seems to work on the master node:

[vagrant@master ~]$ curl https://172.30.0.1
{
  "paths": [
    "/api",
    "/api/v1",
    "/apis",
    "/apis/",
    "/apis/admissionregistration.k8s.io",
    "/apis/admissionregistration.k8s.io/v1beta1",
    "/apis/apiextensions.k8s.io",
    "/apis/apiextensions.k8s.io/v1beta1",
    "/apis/apiregistration.k8s.io",
    "/apis/apiregistration.k8s.io/v1",
    "/apis/apiregistration.k8s.io/v1beta1",
    "/apis/apps",
    "/apis/apps.openshift.io",
    "/apis/apps.openshift.io/v1",
    "/apis/apps/v1",
    ...

But when I issue this from node01 or node02, it cannot connect:

[vagrant@node01 ~]$ curl https://172.30.0.1
curl: (7) Failed connect to 172.30.0.1:443; Connection refused



Any pointers would be highly appreciated.
pradeepn-altran commented 5 years ago

Running oc get services from master lists the following:

[vagrant@master ~]$ sudo  oc get services --all-namespaces
NAMESPACE               NAME                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE
default                 docker-registry               ClusterIP   172.30.241.100   <none>        5000/TCP                  26d
default                 kubernetes                    ClusterIP   172.30.0.1       <none>        443/TCP,53/UDP,53/TCP     26d
default                 registry-console              ClusterIP   172.30.176.239   <none>        9000/TCP                  26d
default                 router                        ClusterIP   172.30.94.134    <none>        80/TCP,443/TCP,1936/TCP   26d
kube-system             kube-controllers              ClusterIP   None             <none>        8444/TCP                  25d
kube-system             kubelet                       ClusterIP   None             <none>        10250/TCP                 26d
openshift-console       console                       ClusterIP   172.30.130.59    <none>        443/TCP                   26d
openshift-monitoring    alertmanager-main             ClusterIP   172.30.155.143   <none>        9094/TCP                  25d
openshift-monitoring    alertmanager-operated         ClusterIP   None             <none>        9093/TCP,6783/TCP         25d
openshift-monitoring    cluster-monitoring-operator   ClusterIP   None             <none>        8080/TCP                  26d
openshift-monitoring    grafana                       ClusterIP   172.30.215.14    <none>        3000/TCP                  26d
openshift-monitoring    kube-state-metrics            ClusterIP   None             <none>        8443/TCP,9443/TCP         25d
openshift-monitoring    node-exporter                 ClusterIP   None             <none>        9100/TCP                  25d
openshift-monitoring    prometheus-k8s                ClusterIP   172.30.102.50    <none>        9091/TCP                  25d
openshift-monitoring    prometheus-operated           ClusterIP   None             <none>        9090/TCP                  25d
openshift-monitoring    prometheus-operator           ClusterIP   None             <none>        8080/TCP                  26d
openshift-web-console   webconsole                    ClusterIP   172.30.103.229   <none>        443/TCP                   26d
was-liberty             websphere-liberty             ClusterIP   172.30.36.201    <none>        9080/TCP,9443/TCP         2h
jorge-romero commented 5 years ago

I have the same problem. I don't know if you have found a workaround or a fix for it.

Currently using a W10 host, and the cluster configuration worked without any problem.

pradeepn-altran commented 5 years ago

Not yet. It probably has something to do with access between the pods and the Kubernetes API Server.

Do you mean a Windows 10 host when you say W10? Are you saying it is working there? I am also using a Windows 10 host for my VirtualBox VMs.

jorge-romero commented 5 years ago

Hi! I'm using Windows 10, but unfortunately it is not working there.

nmr commented 4 years ago

I suppose the problem is caused by an incorrect network configuration due to the specifics of Vagrant and VirtualBox. Vagrant creates a NAT network with address 10.0.2.15 on eth0, and the Ansible template uses the eth0 address to determine the 'masterIP' variable in '/etc/origin/master/master-config.yaml':

[root@master ~]# grep 10.0.2.15 /etc/origin/master/master-config.yaml
  masterIP: 10.0.2.15

If I understand correctly, the SDN uses this address when configuring iptables on the nodes. Both the master machine and node01 have the same IP address 10.0.2.15 on eth0, so when you try to connect to 172.30.0.1, iptables on node01 directs the traffic to itself.

Have you tried entering the correct 'masterIP' address in the file and restarting the server?
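
For example, something along these lines (a sketch: the restart commands assume an OKD 3.10+ static-pod control plane and may differ on older Origin versions, and the host-only address below is the one from this setup):

# sketch: replace the Vagrant NAT address with the master's host-only address
sed -i 's/masterIP: 10.0.2.15/masterIP: 192.168.160.101/' /etc/origin/master/master-config.yaml
master-restart api
master-restart controllers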

eliu commented 4 years ago

@nmr this network interface selection issue was handled properly a long time ago. Right now OKD is forced to choose the right network by explicitly setting the IP address in /etc/ansible/hosts.
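
For reference, the relevant inventory entries look roughly like this (a sketch: the hostnames and node02's address are placeholders, not the exact values from this repo; openshift_ip is the openshift-ansible variable that pins the address):

[masters]
master.example.com openshift_ip=192.168.160.101

[nodes]
master.example.com openshift_ip=192.168.160.101
node01.example.com openshift_ip=192.168.160.102
node02.example.com openshift_ip=192.168.160.103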

nmr commented 4 years ago

@eliu I'm not sure. Today I created this environment with your configuration files, and I think the OpenShift network doesn't work correctly. In the configuration file '/etc/origin/master/master-config.yaml' I see the 10.0.2.15 address.

Master Node:

[root@master master]# ip a | grep -A2 -E "eth(0|1):"
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:8a:fe:e6 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic eth0
--
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:12:57:30 brd ff:ff:ff:ff:ff:ff
    inet 192.168.160.101/24 brd 192.168.160.255 scope global noprefixroute eth1

[root@master master]# grep -R 10.0.2.15 /etc/origin/master/*
/etc/origin/master/master-config.yaml:  masterIP: 10.0.2.15

[root@master master]# iptables -L -t nat -v | grep 10.0.2.15
    0     0 DNAT       tcp  --  any    any     anywhere             anywhere             /* default/kubernetes:https */ tcp to:10.0.2.15:8443
    0     0 DNAT       udp  --  any    any     anywhere             anywhere             /* default/kubernetes:dns */ udp to:10.0.2.15:8053
    0     0 DNAT       tcp  --  any    any     anywhere             anywhere             /* default/kubernetes:dns-tcp */ tcp to:10.0.2.15:8053

Node01:

[root@node01 ~]# ip a | grep -A2 -E "eth(0|1):"
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:8a:fe:e6 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic eth0
--
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:7f:54:57 brd ff:ff:ff:ff:ff:ff
    inet 192.168.160.102/24 brd 192.168.160.255 scope global noprefixroute eth1
[root@node01 ~]# iptables -L -t nat -v | grep 10.0.2.15
    0     0 DNAT       tcp  --  any    any     anywhere             anywhere             /* default/kubernetes:https */ tcp to:10.0.2.15:8443
    0     0 DNAT       udp  --  any    any     anywhere             anywhere             /* default/kubernetes:dns */ udp to:10.0.2.15:8053
    0     0 DNAT       tcp  --  any    any     anywhere             anywhere             /* default/kubernetes:dns-tcp */ tcp to:10.0.2.15:8053

Node01 can't connect properly to the master node:

[root@node01 ~]# curl -vI https://172.30.0.1:443/ 
* About to connect() to 172.30.0.1 port 443 (#0)
*   Trying 172.30.0.1...
* Connection refused
* Failed connect to 172.30.0.1:443; Connection refused
* Closing connection 0
curl: (7) Failed connect to 172.30.0.1:443; Connection refused
[root@node01 ~]# ip route get 172.30.0.1 
172.30.0.1 dev tun0 src 10.129.0.1 

Changing the masterIP setting to 192.168.160.101 solves this problem.
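
A quick way to check that the change has propagated (a sketch: kube-proxy should rewrite the DNAT rules on the nodes after the master restart):

# sketch: re-check the DNAT target on a node after the change
iptables -L -t nat -v | grep 'default/kubernetes:https'
# expect: tcp to:192.168.160.101:8443 (previously to:10.0.2.15:8443)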

eliu commented 4 years ago

This might have something to do with the openshift.common.ip stuff, I guess:

https://github.com/openshift/openshift-ansible/issues/11740#issuecomment-507559277

Everywhere I looked, it references etcd_ip=openshift.common.ip, which should default to openshift_ip.

openshift.common.ip defaults to the first interface's IP, and openshift_ip doesn't override that.
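
A quick way to see which address the Ansible facts actually resolve (a sketch; 'masters' here is the inventory group name):

# sketch: show the default-route interface facts for the master host
ansible -i /etc/ansible/hosts masters -m setup -a 'filter=ansible_default_ipv4'
# on these Vagrant boxes this reports the NAT address 10.0.2.15, which is
# why picking the first/default interface goes wrong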