KeKouShi commented 2 months ago

The symptom so far is that the console reports a cluster communication exception.
rbd-api is not Ready. Its logs show: event log server rbd-eventlog:6366 connect error: context canceled. auto retry after 10 seconds " time="2024-08-20T00:48:56+08:00" level=error msg="create event client error.context canceled
Running df -h on the physical host hangs.
After manually deleting the rbd-api pod in the cluster, the new pod stays in ContainerCreating with this event: Warning FailedMount 2m10s kubelet Unable to attach or mount volumes: unmounted volumes=[grdata], unattached volumes=[grdata kube-api-access-686sb]: timed out waiting for the condition

zzzhangqi commented 2 months ago

@KeKouShi

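1. rbd-api log errors

From the logs provided, rbd-api is trying to connect to the rbd-eventlog service and cannot reach it through the service name rbd-eventlog. This is most likely because Service traffic between containers is not getting through.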
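One way to confirm this is to test the TCP connection directly. The following is a minimal sketch, not an official Rainbond procedure. `check_tcp` is a hypothetical helper name; it assumes bash and coreutils `timeout` are available where it runs, and that the rbd-eventlog Service lives in the rbd-system namespace (the namespace used for the other rbd components in this thread). The port 6366 comes from the log line above.

```shell
# Hypothetical helper: try to open a TCP connection to host:port with a 3 second limit.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are needed.
check_tcp() {
  host=$1
  port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "reachable $host:$port"
  else
    echo "unreachable $host:$port"
    return 1
  fi
}

# Usage on a node (service names normally do not resolve on the host, so use the ClusterIP):
#   ip=$(kubectl get svc rbd-eventlog -n rbd-system -o jsonpath='{.spec.clusterIP}')
#   check_tcp "$ip" 6366
# Also worth checking that the Service has backing endpoints:
#   kubectl get endpoints rbd-eventlog -n rbd-system
```

If the ClusterIP is unreachable while the endpoints list is populated, that points at Service routing on the node rather than at rbd-eventlog itself.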

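df -h hangs

df -h usually hangs because a mounted filesystem has become inaccessible. Use the mount command to see what is mounted.

In most cases the command hangs because the host cannot reach the Service IP of the nfs-provisioner service. After detaching the stale mount with umount -l xxx, df -h works again.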
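To find which mount is the stale one without hanging the shell again, each NFS mount can be probed under a timeout. This is a rough sketch under assumptions: `list_nfs_mounts` and `probe_mounts` are hypothetical helper names, the host is Linux with `/proc/mounts` and coreutils `timeout`, and mount points contain no spaces.

```shell
# Print the mount points of all NFS filesystems.
# Reads /proc/mounts by default (fields: device, mount point, fstype, ...).
list_nfs_mounts() {
  awk '$3 ~ /^nfs/ { print $2 }' "${1:-/proc/mounts}"
}

# Read mount points from stdin and stat each one with a 5 second limit,
# so a dead NFS server cannot block the check the way it blocks df -h.
probe_mounts() {
  while read -r mp; do
    if timeout 5 stat "$mp" >/dev/null 2>&1; then
      echo "ok $mp"
    else
      echo "fail $mp"
    fi
  done
}

# Usage:
#   list_nfs_mounts | probe_mounts
# For each mount reported as "fail", a lazy unmount detaches it:
#   umount -l <mountpoint>
```

A "fail" line means the probe either timed out or returned an error, so it is worth looking at the mount before detaching it.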

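rbd-api ContainerCreating

From the description and screenshot provided, the storage served by nfs-provisioner can no longer be mounted. That storage is mounted through the Service IP of nfs-provisioner.

In summary, all three problems come from Kubernetes Service IPs being unreachable. Many things can cause that; for example, iptables, kube-proxy, or NetworkManager can each break Service IP connectivity.

You will need to investigate for yourself why the Service IPs are unreachable in your environment.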
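As a possible starting point for that investigation, the sketch below checks whether kube-proxy has programmed NAT rules for a given ClusterIP. It rests on several assumptions: kube-proxy runs in iptables mode (in IPVS mode these rules are absent and `ipvsadm -Ln` is the place to look), `service_rules` is a hypothetical helper name, and the `k8s-app=kube-proxy` label shown in the usage comments is the kubeadm default, which may not match every installation.

```shell
# Hypothetical helper: filter iptables-save output (read from stdin) for
# KUBE-SERVICES rules whose destination is the given ClusterIP.
# Exits non-zero when no rule matches.
service_rules() {
  grep -F -- "-A KUBE-SERVICES" | grep -F -- "-d $1/32"
}

# Usage on the affected node (run as root):
#   ip=$(kubectl get svc nfs-provisioner -n rbd-system -o jsonpath='{.spec.clusterIP}')
#   iptables-save -t nat | service_rules "$ip" || echo "no NAT rule for $ip"
# Related checks for the other suspects named above:
#   kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide
#   systemctl status NetworkManager
```

A missing rule suggests kube-proxy is not syncing on that node; a present rule with failing connections suggests looking at other firewall rules or at whatever manages the node's network interfaces.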

lushaogen commented 2 months ago

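I ran into the same problem. Temporary workaround:

Restart NFS

kubectl delete pod -l name=nfs-provisioner -n rbd-system

Restart the API

kubectl delete pod -l name=rbd-api -n rbd-system

I still have not found out why the k8s Service IPs cannot communicate with each other. Is there a troubleshooting procedure for this?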

goodrain / rainbond

Cluster communication exception #1964
