coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Issue with Trident CSI starting with FCOS 36.20220618.3.1 #1280

Open pmokrz opened 2 years ago

pmokrz commented 2 years ago

Describe the bug

After upgrading Fedora CoreOS to version 36.20220716.3.1, the Node-CSI driver never finishes connecting; the driver-registrar keeps logging "Still connecting".

time="2022-08-02T14:24:57Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put \"https://10.233.51.147:34571/trident/v1/node/l8101\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" increment=10.496269484s requestID=e3e2d6c4-c6d7-497a-93c4-95

I have this problem only with the newer OS versions:

Fedora CoreOS 36.20220618.3.1
Fedora CoreOS 36.20220703.3.1
Fedora CoreOS 36.20220716.3.1

Environment

$ tridentctl version
+----------------+----------------+
| SERVER VERSION | CLIENT VERSION |
+----------------+----------------+
| 22.07.0 | 22.04.0 |
+----------------+----------------+
$ k get node -o wide
NAME    STATUS   ROLES                  AGE   VERSION   OS-IMAGE                        KERNEL-VERSION            CONTAINER-RUNTIME
l8101   Ready    control-plane,master   78d   v1.23.7   Fedora CoreOS 36.20220716.3.1   5.18.11-200.fc36.x86_64   containerd://1.6.4
l8102   Ready    control-plane,master   78d   v1.23.7   Fedora CoreOS 36.20220605.3.0   5.17.12-300.fc36.x86_64   containerd://1.6.4
l8103   Ready    control-plane,master   78d   v1.23.7   Fedora CoreOS 36.20220605.3.0   5.17.12-300.fc36.x86_64   containerd://1.6.4
l8104   Ready    <none>                 78d   v1.23.7   Fedora CoreOS 36.20220605.3.0   5.17.12-300.fc36.x86_64   containerd://1.6.4

$ k get pod -o wide
NAME                                READY   STATUS    AGE     IP              NODE
trident-csi-5pplc                   2/2     Running   34m     10.226.11.133   l8103
trident-csi-6f2wr                   1/2     Running   8m58s   10.226.11.131   l8101
trident-csi-cvz7d                   2/2     Running   34m     10.226.11.132   l8102
trident-csi-vsgn8                   2/2     Running   34m     10.226.11.134   l8104
trident-operator-7c98557778-ftqq5   1/1     Running   34m     10.233.121.86   l8105
trident-csi-797999d774-rq5ch        6/6     Running   34m     10.233.80.77    l8108

Logs
$ k logs -f trident-csi-6f2wr trident-main
time="2022-08-02T14:24:27Z" level=info msg="Running Trident storage orchestrator." binary=/trident_orchestrator build_time="Fri Jul 29 11:24:51 EDT 2022" version=22.07.0
time="2022-08-02T14:24:27Z" level=info msg="Initializing plain CSI helper frontend."
time="2022-08-02T14:24:27Z" level=info msg="Added frontend." name=plain_csi_helper
time="2022-08-02T14:24:27Z" level=info msg="Initializing CSI frontend." name=l8101 version=22.07.0
time="2022-08-02T14:24:27Z" level=info msg="Enabling node service capability." capability=STAGE_UNSTAGE_VOLUME
time="2022-08-02T14:24:27Z" level=info msg="Enabling node service capability." capability=EXPAND_VOLUME
time="2022-08-02T14:24:27Z" level=info msg="Enabling node service capability." capability=GET_VOLUME_STATS
time="2022-08-02T14:24:27Z" level=info msg="Enabling volume access mode." mode=SINGLE_NODE_WRITER
time="2022-08-02T14:24:27Z" level=info msg="Enabling volume access mode." mode=SINGLE_NODE_READER_ONLY
time="2022-08-02T14:24:27Z" level=info msg="Enabling volume access mode." mode=MULTI_NODE_READER_ONLY
time="2022-08-02T14:24:27Z" level=info msg="Enabling volume access mode." mode=MULTI_NODE_SINGLE_WRITER
time="2022-08-02T14:24:27Z" level=info msg="Enabling volume access mode." mode=MULTI_NODE_MULTI_WRITER
time="2022-08-02T14:24:27Z" level=info msg="Added frontend." name=csi
time="2022-08-02T14:24:27Z" level=info msg="Initializing HTTPS REST frontend." address=":17546"
time="2022-08-02T14:24:27Z" level=info msg="Added frontend." name="HTTPS REST"
time="2022-08-02T14:24:27Z" level=info msg="Added 0 existing volume(s)" requestID=34eb6d49-a684-4f89-b6fa-6e1767e351a0 requestSource=Internal
time="2022-08-02T14:24:27Z" level=info msg="Trident bootstrapped successfully."
time="2022-08-02T14:24:27Z" level=info msg="Activating plain CSI helper frontend."
time="2022-08-02T14:24:27Z" level=info msg="Activating HTTPS REST frontend." address=":17546"
time="2022-08-02T14:24:27Z" level=info msg="Activating CSI frontend." requestID=e3e2d6c4-c6d7-497a-93c4-959fb1bf0bfe requestSource=Internal
time="2022-08-02T14:24:27Z" level=info msg="Starting periodic node access reconciliation service." requestID=dc00598e-6109-4b50-b622-517187516267 requestSource=Periodic
time="2022-08-02T14:24:27Z" level=info msg="Discovered iSCSI initiator name." IQN="iqn.1994-05.com.redhat:de886ad0c8f6" requestID=e3e2d6c4-c6d7-497a-93c4-959fb1bf0bfe requestSource=Internal
time="2022-08-02T14:24:27Z" level=info msg="Discovered IP addresses." IP Addresses="[10.226.11.131 10.233.84.0 172.17.0.1]" requestID=e3e2d6c4-c6d7-497a-93c4-959fb1bf0bfe requestSource=Internal
time="2022-08-02T14:24:57Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put \"https://10.233.51.147:34571/trident/v1/node/l8101\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" increment=10.496269484s requestID=e3e2d6c4-c6d7-497a-93c4-959fb1bf0bfe requestSource=Internal
time="2022-08-02T14:25:38Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put \"https://10.233.51.147:34571/trident/v1/node/l8101\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" increment=19.057534053s requestID=e3e2d6c4-c6d7-497a-93c4-959fb1bf0bfe requestSource=Internal
time="2022-08-02T14:26:27Z" level=warning msg="Could not update Trident controller with node registration, will retry." error="could not log into the Trident CSI Controller: error communicating with Trident CSI Controller; Put \"https://10.233.51.147:34571/trident/v1/node/l8101\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" increment=40.699091843s requestID=e3e2d6c4-c6d7-497a-93c4-959fb1bf0bfe requestSource=Internal

$ k logs -f trident-csi-6f2wr driver-registrar
I0802 14:24:27.601972 3484 main.go:166] Version: v2.5.1
I0802 14:24:27.602066 3484 main.go:167] Running node-driver-registrar in mode=registration
I0802 14:24:27.603799 3484 main.go:191] Attempting to open a gRPC connection with: "/plugin/csi.sock"
W0802 14:24:37.604947 3484 connection.go:173] Still connecting to unix:///plugin/csi.sock
W0802 14:24:47.603983 3484 connection.go:173] Still connecting to unix:///plugin/csi.sock
W0802 14:24:57.604876 3484 connection.go:173] Still connecting to unix:///plugin/csi.sock
W0802 14:25:07.604311 3484 connection.go:173] Still connecting to unix:///plugin/csi.sock
W0802 14:25:17.604232 3484 connection.go:173] Still connecting to unix:///plugin/csi.sock

dustymabe commented 2 years ago

I don't know anything about trident. Here's the package list that changed during the update from 36.20220605.3.0 to 36.20220618.3.1:

Upgraded:                                                                                                                                                                                                
  NetworkManager 1:1.38.0-1.fc36 -> 1:1.38.0-2.fc36                                                                                                                                                              
  NetworkManager-cloud-setup 1:1.38.0-1.fc36 -> 1:1.38.0-2.fc36                                                                                                                    
  NetworkManager-libnm 1:1.38.0-1.fc36 -> 1:1.38.0-2.fc36                                                                                                                          
  NetworkManager-team 1:1.38.0-1.fc36 -> 1:1.38.0-2.fc36                                                                                                                           
  NetworkManager-tui 1:1.38.0-1.fc36 -> 1:1.38.0-2.fc36                                                                                                                            
  afterburn 5.3.0-1.fc36 -> 5.3.0-2.fc36                                                                                                                                           
  afterburn-dracut 5.3.0-1.fc36 -> 5.3.0-2.fc36                                                                                                                                    
  containerd 1.6.4-1.fc36 -> 1.6.6-1.fc36                                                                                                                                          
  coreos-installer 0.14.0-1.fc36 -> 0.15.0-1.fc36                                                                                                                                  
  coreos-installer-bootinfra 0.14.0-1.fc36 -> 0.15.0-1.fc36                                                                                                                        
  crypto-policies 20220203-2.git112f859.fc36 -> 20220428-1.gitdfb10ea.fc36                                                                                                         
  fuse-sshfs 3.7.2-3.fc36 -> 3.7.3-1.fc36           
  grub2-common 1:2.06-40.fc36 -> 1:2.06-42.fc36                                                         
  grub2-efi-x64 1:2.06-40.fc36 -> 1:2.06-42.fc36                                                        
  grub2-pc 1:2.06-40.fc36 -> 1:2.06-42.fc36         
  grub2-pc-modules 1:2.06-40.fc36 -> 1:2.06-42.fc36                                                     
  grub2-tools 1:2.06-40.fc36 -> 1:2.06-42.fc36                                                          
  grub2-tools-minimal 1:2.06-40.fc36 -> 1:2.06-42.fc36                                                  
  kernel 5.17.12-300.fc36 -> 5.18.5-200.fc36                                                            
  kernel-core 5.17.12-300.fc36 -> 5.18.5-200.fc36                                                       
  kernel-modules 5.17.12-300.fc36 -> 5.18.5-200.fc36                                                    
  libipa_hbac 2.7.0-1.fc36 -> 2.7.1-2.fc36          
  libnfsidmap 1:2.6.1-2.rc5.fc36 -> 1:2.6.1-2.rc6.fc36                                                  
  libsss_certmap 2.7.0-1.fc36 -> 2.7.1-2.fc36                                                           
  libsss_idmap 2.7.0-1.fc36 -> 2.7.1-2.fc36         
  libsss_nss_idmap 2.7.0-1.fc36 -> 2.7.1-2.fc36                                                         
  libsss_sudo 2.7.0-1.fc36 -> 2.7.1-2.fc36          
  libtool-ltdl 2.4.6-50.fc36 -> 2.4.7-1.fc36                                                            
  linux-firmware 20220509-132.fc36 -> 20220610-135.fc36                                                 
  linux-firmware-whence 20220509-132.fc36 -> 20220610-135.fc36                                          
  moby-engine 20.10.16-1.fc36 -> 20.10.17-2.fc36                                                        
  mozjs91 91.9.0-1.fc36 -> 91.10.0-1.fc36           
  nfs-utils-coreos 1:2.6.1-2.rc5.fc36 -> 1:2.6.1-2.rc6.fc36                                             
  ostree 2022.3-3.fc36 -> 2022.4-2.fc36             
  ostree-libs 2022.3-3.fc36 -> 2022.4-2.fc36                                                            
  podman 3:4.1.0-1.fc36 -> 3:4.1.0-8.fc36           
  podman-plugins 3:4.1.0-1.fc36 -> 3:4.1.0-8.fc36                                                       
  rpm-ostree 2022.9-1.fc36 -> 2022.10-2.fc36                                                            
  rpm-ostree-libs 2022.9-1.fc36 -> 2022.10-2.fc36                                                       
  shim-x64 15.4-5 -> 15.6-1                         
  skopeo 1:1.7.0-1.fc36 -> 1:1.8.0-8.fc36           
  sssd-ad 2.7.0-1.fc36 -> 2.7.1-2.fc36              
  sssd-client 2.7.0-1.fc36 -> 2.7.1-2.fc36          
  sssd-common 2.7.0-1.fc36 -> 2.7.1-2.fc36          
  sssd-common-pac 2.7.0-1.fc36 -> 2.7.1-2.fc36                                                          
  sssd-ipa 2.7.0-1.fc36 -> 2.7.1-2.fc36             
  sssd-krb5 2.7.0-1.fc36 -> 2.7.1-2.fc36            
  sssd-krb5-common 2.7.0-1.fc36 -> 2.7.1-2.fc36                                                         
  sssd-ldap 2.7.0-1.fc36 -> 2.7.1-2.fc36            
  sssd-nfs-idmap 2.7.0-1.fc36 -> 2.7.1-2.fc36                                                           
  systemd 250.6-1.fc36 -> 250.7-1.fc36              
  systemd-container 250.6-1.fc36 -> 250.7-1.fc36                                                        
  systemd-libs 250.6-1.fc36 -> 250.7-1.fc36         
  systemd-pam 250.6-1.fc36 -> 250.7-1.fc36          
  systemd-resolved 250.6-1.fc36 -> 250.7-1.fc36                                                         
  systemd-udev 250.6-1.fc36 -> 250.7-1.fc36         
  vim-data 2:8.2.5052-1.fc36 -> 2:8.2.5085-1.fc36                                                       
  vim-minimal 2:8.2.5052-1.fc36 -> 2:8.2.5085-1.fc36                                                    
Removed:                                            
  sssd-idp-2.7.0-1.fc36.x86_64

The problem may be somewhere in that package set change.

Do you have any more information about what exactly is failing?
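
A package list this long may be easier to reason about once it is cut down to the components that sit on the network path. The following is only a rough sketch: `filter_pkg_diff` is a hypothetical helper name, and the prefixes to look at (for example kernel, NetworkManager, containerd) are a guess at what might matter, not a diagnosis. It assumes the diff text is in the `name old -> new` form shown above, such as the output of `rpm-ostree db diff`.

```shell
#!/bin/sh
# Hypothetical helper: read a package diff on stdin and keep only the
# "name old -> new" entries whose package name starts with one of the
# given prefixes.
#   $1: alternation of package-name prefixes, e.g. 'kernel|NetworkManager'
filter_pkg_diff() {
    awk -v pat="^($1)" '$1 ~ pat && $3 == "->" { print $1, $2, "->", $4 }'
}
```

Saving the list above to a file (pkgdiff.txt is an assumed name) and running `filter_pkg_diff 'kernel|NetworkManager|containerd' < pkgdiff.txt` would leave just the kernel, NetworkManager and containerd lines to compare against the date the problem first appeared.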

pmokrz commented 2 years ago

From what I have found, something started blocking communication between the pods in the k8s cluster after the update. Has anyone had a similar case?
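
One way to see which connections are affected is to count the timeout warnings per target URL in the node pod's log. This is a minimal sketch, assuming log lines in the format pasted earlier in this issue; `count_timeouts` is a made-up helper name, and it relies on `grep -o`, which GNU and BSD grep support.

```shell
#!/bin/sh
# Hypothetical helper: read Trident node logs on stdin and print
# "<count> <url>" for every URL that appears in a timeout warning.
count_timeouts() {
    grep 'context deadline exceeded' \
        | grep -o 'https://[^"\\ ]*' \
        | sort | uniq -c \
        | awk '{ print $1, $2 }'
}
```

For example: `kubectl logs trident-csi-6f2wr trident-main | count_timeouts`. If every timeout points at an address served from another node, that would be consistent with cross-node pod traffic being dropped, though it does not prove where the packets are lost.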

pmokrz commented 2 years ago

My solution was to run the following on every node:

sudo ethtool --offload vxlan.calico rx off tx off
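
For anyone applying the same workaround, here is a small sketch that turns the offloads off and then prints the resulting state, so it is easy to confirm the change took effect on each node. The function names are made up for illustration, and the parsing assumes the usual `rx-checksumming: on|off` lines printed by `ethtool --show-offload`. Settings made with ethtool are not stored anywhere, so they would need to be reapplied after a reboot or whenever the interface is recreated.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool --show-offload <dev>` output on stdin
# and print the top-level rx/tx checksum offload state as name=value.
offload_state() {
    awk -F': *' '/^(rx|tx)-checksumming:/ { split($2, a, " "); print $1 "=" a[1] }'
}

# Hypothetical wrapper around the workaround from this thread.
# Needs root and the ethtool binary; the interface defaults to vxlan.calico.
disable_offload() {
    dev="${1:-vxlan.calico}"
    ethtool --offload "$dev" rx off tx off \
        && ethtool --show-offload "$dev" | offload_state
}
```

Run as root, `disable_offload` (or `disable_offload <interface>` for a differently named device) should report both values as off if the change was accepted.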