Closed titansmc closed 4 years ago
I am having the same issue.
Docker version: 18.06.2-ce
K8s version: 1.16.3
Trident version: 19.10
Storage driver: ontap-nas
tridentctl logs
"Node info not found." node=<node_name>
"GRPC error: rpc error: code = NotFound desc = node <node_name> was not found"
kubectl describe pod that's requesting the pvc
AttachVolume.Attach failed for volume "pvc-245d157b-f450-4fed-8e0b-29affcb6d53b" : rpc error: code = NotFound desc = node <node_name> was not found
I think this may be related to an old install that was not cleaned up properly. How can we completely remove Trident to try again? I have tried clearing out the Trident entries in /var/lib/kubelet and in /var/lib/trident, but to no avail so far.
@titansmc and @kmwm3 can you share some more info on your k8s environment? Are you running vanilla k8s? What's the underlying OS on your underlying nodes?
I am using CentOS 7 deployed through kubespray. Cheers.
I am running vanilla k8s on RHEL 7.7.
I have the same issue.
According to the logs, Trident did not register all of my cluster nodes; only some of them joined.
As a result, PVCs can only be mounted on those specific nodes, and mounts fail on all the others.
time="2020-02-04T09:18:10Z" level=debug msg="Authenticated by HTTPS REST frontend." peerCert=trident-node
time="2020-02-04T09:18:10Z" level=debug msg="REST API call received." duration="1.523µs" method=PUT requestID=bosjdknr0f3d5tg4cl0g route=AddOrUpdateNode uri=/trident/v1/node/ddp-deveco-master02
time="2020-02-04T09:18:10Z" level=info msg="Added a new node." handler=AddOrUpdateNode node=ddp-deveco-master02
time="2020-02-04T09:18:10Z" level=debug msg="REST API call complete." duration=6.158862ms method=PUT requestID=bosjdknr0f3d5tg4cl0g route=AddOrUpdateNode uri=/trident/v1/node/ddp-deveco-master02
time="2020-02-04T09:18:17Z" level=debug msg="REST API call received." duration="2.491µs" method=GET requestID=bosjdmfr0f3d5tg4cl10 route=GetVersion uri=/trident/v1/version
time="2020-02-04T09:18:17Z" level=debug msg="REST API call complete." duration="161.897µs" method=GET requestID=bosjdmfr0f3d5tg4cl10 route=GetVersion uri=/trident/v1/version
time="2020-02-04T09:18:34Z" level=debug msg="Authenticated by HTTPS REST frontend." peerCert=trident-node
time="2020-02-04T09:18:34Z" level=debug msg="REST API call received." duration="1.538µs" method=PUT requestID=bosjdqnr0f3d5tg4cl1g route=AddOrUpdateNode uri=/trident/v1/node/ddp-deveco-master03
time="2020-02-04T09:18:34Z" level=info msg="Added a new node." handler=AddOrUpdateNode node=ddp-deveco-master03
time="2020-02-04T09:18:34Z" level=debug msg="REST API call complete." duration=5.725727ms method=PUT requestID=bosjdqnr0f3d5tg4cl1g route=AddOrUpdateNode uri=/trident/v1/node/ddp-deveco-master03
time="2020-02-04T09:18:58Z" level=debug msg="Storage class updated in cache." name=nfs-client parameters="map[backendType:ontap-nas snapshots:true]" provisioner=csi.trident.netapp.io
time="2020-02-04T09:19:08Z" level=debug msg="REST API call received." duration="3.05µs" method=POST requestID=bosje37r0f3d5tg4cl20 route=AddBackend uri=/trident/v1/backend
@teramucho, Kubernetes calls Trident's API to add the node once it is successfully registered. If a node in the cluster isn't added to Trident then that node may not have properly registered. Check the Trident node and driver registrar sidecar logs for errors. Also, check the kubelet logs. If this doesn't resolve your issue please contact NetApp Support.
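To see which nodes actually registered, it can help to compare the controller log against the list of nodes you expect. The following is a rough Python sketch, not a Trident tool. It assumes the log lines look like the ones pasted in this thread (`msg="Added a new node."` and `msg="Node info not found."`, each with a `node=` field); other Trident versions may word these messages differently. The names `parse_fields`, `summarize_nodes`, and `expected` are made up for illustration.

```python
import re

# Matches key=value or key="quoted value" pairs in logfmt-style lines.
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line):
    """Return the key/value pairs of one log line as a dict."""
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}


def summarize_nodes(log_lines, expected):
    """Return (registered, not_found, missing) sets of node names.

    registered: nodes with an "Added a new node." entry
    not_found:  nodes named in a "Node info not found." entry
    missing:    expected nodes that never show up as registered
    The message texts are taken from this thread (assumption).
    """
    registered, not_found = set(), set()
    for line in log_lines:
        fields = parse_fields(line)
        node = fields.get("node")
        if node is None:
            continue
        if fields.get("msg") == "Added a new node.":
            registered.add(node)
        elif fields.get("msg") == "Node info not found.":
            not_found.add(node)
    missing = set(expected) - registered
    return registered, not_found, missing
```

You would feed it the saved output of `tridentctl logs` (one log entry per line) and the node names from your cluster. Nodes that end up in `missing` are the ones whose Trident node pod and registrar logs are worth reading first.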
All, a fix was just merged to address a situation where K8S DNS is not configured properly, which can lead to the error reported in this issue. Trident patches that contain the fix will be released in the near future. Thanks for your patience.
This issue was fixed with the Trident 20.01.1 release.
@gnarl Still got the issue on one of our clusters:
$ tridentctl -n trident get backend
+------------------+----------------+--------------------------------------+--------+---------+
| NAME | STORAGE DRIVER | UUID | STATE | VOLUMES |
+------------------+----------------+--------------------------------------+--------+---------+
| <redacted> | ontap-nas | <redacted> | online | 1 |
+------------------+----------------+--------------------------------------+--------+---------+
$
$
$ tridentctl -n trident version
+----------------+----------------+
| SERVER VERSION | CLIENT VERSION |
+----------------+----------------+
| 20.01.1 | 20.01.0 |
+----------------+----------------+
Trident can't find a few of the nodes in the cluster:
time="2020-04-25T14:28:49Z" level=error msg="Node info not found." node=node020
time="2020-04-25T14:28:49Z" level=error msg="GRPC error: rpc error: code = NotFound desc = node node020 was not found"
time="2020-04-25T14:28:49Z" level=error msg="Node info not found." node=node020
time="2020-04-25T14:28:49Z" level=error msg="GRPC error: rpc error: code = NotFound desc = node node020 was not found"
time="2020-04-25T14:28:50Z" level=error msg="Node info not found." node=node018
time="2020-04-25T14:28:50Z" level=error msg="GRPC error: rpc error: code = NotFound desc = node node018 was not found"
time="2020-04-25T14:28:50Z" level=error msg="Node info not found." node=node018
time="2020-04-25T14:28:50Z" level=error msg="GRPC error: rpc error: code = NotFound desc = node node018 was not found"
Any ideas what to try to get them up and running?
These machines were correctly connected before. We have since reinstalled the cluster (as training for new ops), and now the nodes don't get added anymore.
Is there any update on this issue? Do we have a fix?
I have this problem too, running OCP4: 80% of the nodes are working and the other 20% fail.
./tridentctl version
+----------------+----------------+
| SERVER VERSION | CLIENT VERSION |
+----------------+----------------+
| 20.04.0 | 20.04.0 |
+----------------+----------------+
Server Version: 4.4.9
Kubernetes Version: v1.17.1+912792b
The node is missing because the Trident object
Hi @presidenten, @ramancde, and @bigg01 we've investigated the issue and have not been able to reproduce it. If you see the issue again please contact NetApp support and provide Trident logs so that we can determine what is causing the issue.
There are two likely scenarios why Trident does not find a Kubernetes node. It can be because of a networking issue within Kubernetes or a DNS issue. The Trident node daemonset that runs on each Kubernetes node must be able to communicate with the Trident controller to register the node with Trident. If networking changes occurred after Trident was installed this problem may only be observed with new Kubernetes nodes that are added to the cluster.
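To tell the two scenarios apart, a small check like the one below can be run from a pod on the affected node. This is only a sketch under assumptions: it needs an image that has Python, `check_controller` is a made-up name, and the controller's service name and port are parameters you would have to look up in your own cluster (nothing here is taken from Trident's code).

```python
import socket


def check_controller(host, port, timeout=3.0,
                     resolve=socket.getaddrinfo,
                     connect=socket.create_connection):
    """Classify reachability of host:port as 'dns', 'network', or 'ok'.

    resolve and connect are injectable so the logic can be tried
    without a cluster.
    """
    try:
        resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"        # the name did not resolve
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        return "network"    # resolved, but no TCP connection was made
    conn.close()
    return "ok"
```

A result of `"dns"` points at name resolution on that node, while `"network"` means the name resolved but the controller could not be reached over TCP. Comparing the result on a working node and on a failing node is usually more telling than a single run.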
This matches the kind of issue I am facing. Only newly added nodes won't register with Trident. I tried restarting the Trident pods and removing and re-adding the affected nodes, but nothing helps. There have been no networking changes on the cluster, and I don't see any networking or DNS related issues on the cluster.
Any pointers on how I can investigate this further?
The same error about not finding the node (not registered with the Trident controller) seems to happen with K8s 1.17 and Trident 20.07 when the Kubernetes autoscaler adds a node to schedule a pod. As a consequence, the PV for the pod doesn't get added and the pod stays Pending. Do nodes in the "free pool" need to be prepared with Trident somehow, so that the daemon is available when the node starts up and can register?
@khatrig and @oleimann,
As indicated above we haven't been able to reproduce this issue yet. Please open a case with NetApp support so that we can collect additional information.
To open a case with NetApp, please go to https://mysupport.netapp.com/site/.
In my case, it turned out to be an issue with DNS on some nodes. The trident-csi pod running on those nodes could not resolve the trident-csi.trident service and therefore could not register the node.
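For anyone who wants to check the same thing, here is a rough Python sketch that tries the usual Kubernetes DNS forms of a service name and reports which ones resolve. The `<service>.<namespace>.svc.<cluster-domain>` pattern is standard Kubernetes; the `cluster.local` default is an assumption that may not match your cluster, and the function names are made up for illustration. It has to run inside a pod on the affected node to be meaningful.

```python
import socket


def candidate_names(service, namespace, cluster_domain="cluster.local"):
    """Return the DNS names a pod can normally use for a Service."""
    return [
        f"{service}.{namespace}",
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.{cluster_domain}",
    ]


def resolve_all(names, resolver=socket.gethostbyname_ex):
    """Map each name to its resolved addresses, or None on failure."""
    results = {}
    for name in names:
        try:
            _, _, addresses = resolver(name)
            results[name] = addresses
        except OSError:
            # gaierror and herror are both subclasses of OSError.
            results[name] = None
    return results
```

For example, `resolve_all(candidate_names("trident-csi", "trident"))` checks the service mentioned above. If only the fully qualified name resolves, the pod's DNS search domains may be the problem; if none resolve, cluster DNS is probably not reachable from that node.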
@khatrig thanks for updating this issue.
For everyone that encountered this reported issue it was determined that either a DNS or a networking issue kept the Trident node DaemonSet from registering with the Trident controller. Commit 8e51987 improves the Info log message to help the Trident user resolve this registration issue.
Describe the bug: Following the basic example in the documentation fails to attach the volume to the Pod.
Environment: Provide accurate information about the environment to help us reproduce the issue.
Docker
k8s version
To Reproduce: Follow the basic example
Expected behavior: attach the created volume to the Pod
Additional context: I also see in the logs errors related to iSCSI, which I believe we are not using.