FederatedAI / KubeFATE

Manage federated learning workload using cloud native technologies.
Apache License 2.0

kubefate service connection error #735

Open chenkeming111 opened 2 years ago

chenkeming111 commented 2 years ago

What deployment mode are you using? Kubernetes.

What KubeFATE and FATE versions are you using? kubefate commandLine version=v1.4.4

What OS are you using for docker-compose or Kubernetes? Please also specify the OS version. Ubuntu 20.04 LTS

While deploying kubefate I ran into a problem, and it is still there after the two changes below.

  1. I modified the hosts file as instructed:

    NAME                                 CLASS   HOSTS         ADDRESS          PORTS   AGE
    ingress.networking.k8s.io/kubefate   nginx   example.com   10.233.252.130   80      3d16h

    sudo -- sh -c "echo \"10.233.252.130 example.com\" >> /etc/hosts"

  2. I switched to the port that port 80 is mapped to:

    NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
    ingress-nginx-controller             NodePort    10.233.252.130   <none>        80:31737/TCP,443:30436/TCP   2d19h
    ingress-nginx-controller-admission   ClusterIP   10.233.5.16      <none>        443/TCP                      2d19h

Added the port number:

    root@node237:/home/yjy000/ckm/kubefate# vim config.yaml
    serviceurl: example.com:31737

But the problem is still not solved:

    (base) root@node237:/home/yjy000/ckm/kubefate# kubefate version

    (base) root@node237:/home/yjy000/ckm/kubefate# kubectl get pods --all-namespaces
    NAMESPACE       NAME                                        READY   STATUS      RESTARTS         AGE
    default         netchecker-agent-48b6w                      1/1     Running     0                3d23h
    default         netchecker-agent-6tds5                      1/1     Running     1 (2d23h ago)    3d23h
    default         netchecker-agent-hostnet-9qqx8              1/1     Running     0                3d23h
    default         netchecker-agent-hostnet-sdxbn              1/1     Running     2 (2d23h ago)    3d23h
    default         netchecker-server-589d76f698-24tl7          2/2     Running     1 (3d23h ago)    3d23h
    ingress-nginx   ingress-nginx-admission-create-tq2ld        0/1     Completed   0                2d19h
    ingress-nginx   ingress-nginx-admission-patch-f62gh         0/1     Completed   1                2d19h
    ingress-nginx   ingress-nginx-controller-5fcb64df95-znh25   1/1     Running     0                17h
    kube-fate       kubefate-56ccbdcb78-ntx2n                   1/1     Running     2 (3d16h ago)    3d16h
    kube-fate       mariadb-66bb7d68b7-xfltl                    1/1     Running     0                3d16h
    kube-system     calico-kube-controllers-75fcdd655b-9qwgv    1/1     Running     11 (2d23h ago)   3d23h
    kube-system     calico-node-pjh6w                           1/1     Running     0                3d23h
    kube-system     calico-node-tspzw                           1/1     Running     2 (2d23h ago)    3d23h
    kube-system     coredns-76b4fb4578-p6lrn                    1/1     Running     0                3d23h
    kube-system     dns-autoscaler-7874cf6bcf-hxl89             1/1     Running     0                18h
    kube-system     kube-apiserver-node237                      1/1     Running     2 (2d23h ago)    3d23h
    kube-system     kube-controller-manager-node237             1/1     Running     3 (2d23h ago)    3d23h
    kube-system     kube-proxy-7lmjb                            1/1     Running     0                3d23h
    kube-system     kube-proxy-wz28m                            1/1     Running     2 (2d23h ago)    3d23h
    kube-system     kube-scheduler-node237                      1/1     Running     3 (2d23h ago)    3d23h
    kube-system     metrics-server-5c8c77d7b8-bsrm6             1/1     Running     0                18h
    kube-system     nginx-proxy-node236                         1/1     Running     0                3d23h
    kube-system     nodelocaldns-k8ll6                          1/1     Running     0                3d23h
    kube-system     nodelocaldns-nkqg5                          1/1     Running     2 (2d23h ago)    3d23h

I have run into the following problem:

    kubefate service connection error, Post "http://example.com:31737/v1/user/login": dial tcp 172.17.0.1:31737: connect: connection refused

I have already tried the following two changes:

  1. Changed the hosts file:

    NAME                                 CLASS   HOSTS         ADDRESS          PORTS   AGE
    ingress.networking.k8s.io/kubefate   nginx   example.com   10.233.252.130   80      3d16h

    sudo -- sh -c "echo \"10.233.252.130 example.com\" >> /etc/hosts"

  2. Switched from port 80 to the port it is mapped to:

    root@node237:/home/yjy000/ckm/kubefate# kubectl -n ingress-nginx get svc
    NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
    ingress-nginx-controller             NodePort    10.233.252.130   <none>        80:31737/TCP,443:30436/TCP   2d19h
    ingress-nginx-controller-admission   ClusterIP   10.233.5.16      <none>        443/TCP                      2d19h

Then I added the port to config.yaml:

    root@node237:/home/yjy000/ckm/kubefate# vim config.yaml
    serviceurl: example.com:31737

But after that, the problem persists.

So where did I make the mistake?

asdfsx commented 2 years ago

I think this issue is a duplicate of https://github.com/FederatedAI/KubeFATE/issues/368. You may find a solution in that issue.

chenkeming111 commented 2 years ago

> I think this issue is duplicated to #368 You can find some solution in that issue.

I'll check, ty

chenkeming111 commented 2 years ago

> I think this issue is duplicated to #368 You can find some solution in that issue.

Sorry, after reading the comments there, I found that the issue isn't solved. Could you please help me? Thank you very much.

asdfsx commented 2 years ago

The easiest way should be to use a NodePort directly.

    kubectl edit svc -n kube-fate kubefate

Change type: ClusterIP to type: NodePort, then check the port assigned to the service.

In config.yaml, set serviceurl: ip:port
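A minimal sketch of the NodePort approach described above, assuming the Service is named kubefate in the kube-fate namespace (as in the kubectl edit command) and that the first port in the Service spec is the one to expose. The function names and the 192.0.2.10 address are hypothetical placeholders; kubectl patch is used here only as a non-interactive alternative to kubectl edit.

```shell
#!/bin/sh
# Sketch only. Function names are illustrative and not part of KubeFATE.

# Switch the kubefate Service to NodePort without opening an editor.
# Needs cluster access; defined here but not called.
switch_to_nodeport() {
  kubectl -n kube-fate patch svc kubefate -p '{"spec":{"type":"NodePort"}}'
}

# Print the NodePort assigned to the first port of the Service.
# Assumption: the first port entry is the one the kubefate CLI should use.
get_nodeport() {
  kubectl -n kube-fate get svc kubefate -o jsonpath='{.spec.ports[0].nodePort}'
}

# Pure helper: build the "ip:port" value for serviceurl in config.yaml.
make_serviceurl() {
  printf '%s:%s\n' "$1" "$2"
}

# Example (192.0.2.10 stands in for a real node IP):
#   switch_to_nodeport
#   make_serviceurl 192.0.2.10 "$(get_nodeport)"
```

The value printed by make_serviceurl is what would go after serviceurl: in config.yaml. The IP has to be an address of a cluster node that is reachable from the machine running the kubefate CLI.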

JingChen23 commented 2 years ago

    sudo -- sh -c "echo \"10.233.252.130 example.com\" >> /etc/hosts"

You have put this into the hosts file.

    kubefate service connection error, Post "http://example.com:31737/v1/user/login": dial tcp 172.16.3.236:31737: connect: connection refused

But here it seems that "example.com" was resolved to "172.16.3.236"?

Of 10.233.252.130 and 172.16.3.236, which one is the node IP?

Please also run kubectl get svc -A and paste the output here.
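One way to investigate the mismatch between the hosts entry and the address in the error is to list every address a hosts file maps to the name and compare that with what the system resolver returns. The sketch below assumes a Linux host with awk and getent available; hosts_lookup and resolved_ip are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only. Helper names are illustrative.

# Print every IP that a hosts-format file maps to the given name.
# $1 = path to the hosts file, $2 = hostname to look for.
hosts_lookup() {
  awk -v name="$2" '
    { sub(/#.*/, "") }   # drop comments before matching
    { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
  ' "$1"
}

# Print the first address the system resolver returns for the name.
# Defined here but not called.
resolved_ip() {
  getent hosts "$1" | awk '{ print $1; exit }'
}

# Example:
#   hosts_lookup /etc/hosts example.com
#   resolved_ip example.com
```

If hosts_lookup prints more than one address, earlier appends with >> have left several entries for the same name, and the resolver typically uses the first match rather than the most recent one.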

chenkeming111 commented 2 years ago

> The easiest way should be use nodeport directly.
>
> kubectl edit svc -n kube-fate kubefate
>
> change the type: ClusterIP to type: NodePort, then check the port used by the service.
>
> In config.yaml set serviceurl: ip:port

Thank you very much. I'll try again.

chenkeming111 commented 2 years ago

> sudo -- sh -c "echo \"10.233.252.130 example.com\" >> /etc/hosts"
>
> You have put this into the hosts file.
>
> kubefate service connection error, Post "http://example.com:31737/v1/user/login": dial tcp 172.16.3.236:31737: connect: connection refused
>
> But here, seems like "example.com" was translated to "172.16.3.236"?
>
> 10.233.252.130 and 172.16.3.236, which one is the node ip?
>
> Please also do kubectl get svc -A and paste the results here.

[image]

JingChen23 commented 2 years ago

10.233.252.130 is the ClusterIP of your ingress controller.

Try

    sudo -- sh -c "echo \"<your_real_node_ip> example.com\" >> /etc/hosts"
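Because the earlier command already appended a ClusterIP entry for example.com, appending a second line may leave the old mapping in place. A minimal sketch, assuming it is acceptable to drop any existing line that mentions the name, replaces the entry instead of appending. set_hosts_entry is a hypothetical helper, and 192.0.2.10 stands in for the real node IP, which is shown in the INTERNAL-IP column of kubectl get nodes -o wide.

```shell
#!/bin/sh
# Sketch only. set_hosts_entry is an illustrative helper name.

# Replace all entries for a hostname in a hosts-format file with one new entry.
# $1 = path to the hosts file, $2 = new IP, $3 = hostname.
# Simplification: a line listing several names is removed entirely
# if any of them equals the hostname.
set_hosts_entry() {
  tmp="$(mktemp)" || return 1
  awk -v name="$3" '
    { keep = 1; for (i = 2; i <= NF; i++) if ($i == name) keep = 0 }
    keep
  ' "$1" > "$tmp" || { rm -f "$tmp"; return 1; }
  printf '%s %s\n' "$2" "$3" >> "$tmp"
  cat "$tmp" > "$1"   # overwrite in place so ownership and mode are kept
  rm -f "$tmp"
}

# Example (run as root, since /etc/hosts is normally not user-writable):
#   set_hosts_entry /etc/hosts 192.0.2.10 example.com
```

Trying the function on a copy of /etc/hosts first is a cheap way to confirm that only the intended lines change.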

chenkeming111 commented 2 years ago

> 10.233.252.130 is the clusterIp of your ingress controller.
>
> Try
>
> sudo -- sh -c "echo \"<your_real_node_ip> example.com\" >> /etc/hosts"

OK, thank you very much. I'll try it.

wuliu1516 commented 1 week ago

change kubefate file