bryopsida / wireguard-chart

A helm chart for wireguard

modprobe: can't change directory to '/lib/modules': No such file or directory #39

Closed lorenzomonta closed 8 months ago

lorenzomonta commented 9 months ago

Hi @bryopsida ,

Thanks for sharing this chart with the community!

I found this issue:

Installation on OKE (Oracle Kubernetes Engine)
Node OS: Oracle Linux Server release 8.8 (aarch64)
Kubernetes version: 1.28.2
wireguard-chart version: 0.18.0

Pod logs:

[#] ip link add wg0 type wireguard
[#] wg setconf wg0 /dev/fd/63
[#] ip -4 address add X.X.X.X/24 dev wg0
[#] ip link set mtu 8920 up dev wg0
[#] wg set wg0 private-key /etc/wireguard/privatekey && iptables -t nat -A POSTROUTING -s X.X.X.X/24 -o eth0 -j MASQUERADE
modprobe: can't change directory to '/lib/modules': No such file or directory
modprobe: can't change directory to '/lib/modules': No such file or directory
iptables v1.8.9 (legacy): can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
[#] ip link delete dev wg0
Public key 'xxxxxx'
wg-quick: `wg0' is not a WireGuard interface
Stream closed EOF for wireguard/wg-vpn-1-wireguard-1234 (wireguard)

Can you kindly verify what the problem is? Thank you

lorenzomonta commented 9 months ago

Last question/advice: OCI doesn't support a UDP-type Load Balancer, so I was thinking of using a UDP-type Network Load Balancer and forwarding the traffic to a NodePort service (instead of LoadBalancer). Can this work? Thanks

bryopsida commented 9 months ago

iptables v1.8.9 (legacy): can't initialize iptables table `nat': Table does not exist (do you need to insmod?) Perhaps iptables or your kernel needs to be upgraded.

I'm seeing references to this error occurring when a kernel is updated/patched but the system hasn't been rebooted yet to use the new kernel? Any chance that's happening here?

If not can you share what image/tag version you are using and any modifications that may have been made to the default securityContext properties?

Last question/advice: OCI doesn't support a UDP-type Load Balancer, so I was thinking of using a UDP-type Network Load Balancer and forwarding the traffic to a NodePort service (instead of LoadBalancer). Can this work? Thanks

Yes that should work.
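Not something the chart generates, but as a rough sketch, a NodePort Service for the WireGuard UDP port could look like the fragment below (the Service name and the nodePort value are hypothetical; the nodePort must fall within the cluster's NodePort range):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: wg-vpn-1-wireguard-nodeport   # hypothetical name
  namespace: wireguard
spec:
  type: NodePort
  selector:
    app: wg-vpn-1-wireguard
    role: vpn
  ports:
    - name: wireguard
      protocol: UDP
      port: 51820
      targetPort: 51820
      nodePort: 31820   # example value
```

The external UDP Network Load Balancer would then forward traffic to port 31820 on the node IPs.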

lorenzomonta commented 9 months ago

Thank you for your reply!

The helm command for installation is as follows:

helm upgrade --install wg-vpn-1 wireguard/wireguard --version 0.18.0 --namespace wireguard --set metrics.enabled=true

The manifest is:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    checksum/config: xxxxx
    kubectl.kubernetes.io/restartedAt: "2024-01-26T22:50:15+01:00"
  creationTimestamp: "2024-01-26T21:51:56Z"
  generateName: wg-vpn-1-wireguard-xxxx
  labels:
    app: wg-vpn-1-wireguard
    pod-template-hash: xxxxxxxxx
    role: vpn
  name: wg-vpn-1-wireguard-xxxxxxxxx-xxx
  namespace: wireguard
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: wg-vpn-1-wireguard-xxxxxxxxx
    uid: xxxx
  resourceVersion: "xxxx"
  uid: xxxx
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: wg-vpn-1-wireguard
            role: vpn
        topologyKey: kubernetes.io/hostname
  automountServiceAccountToken: false
  containers:
  - env:
    - name: LOG_LEVEL
      value: info
    image: ghcr.io/bryopsida/wireguard:main
    imagePullPolicy: Always
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -c
        - ip link show dev wg0 | grep -s up
      failureThreshold: 3
      initialDelaySeconds: 20
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: wireguard
    ports:
    - containerPort: 51820
      name: wireguard
      protocol: UDP
    readinessProbe:
      exec:
        command:
        - /bin/sh
        - -c
        - ip link show dev wg0 | grep -s up
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: 100m
        memory: 256Mi
      requests:
        cpu: 100m
        memory: 256Mi
    securityContext:
      allowPrivilegeEscalation: true
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
        - SETUID
        - SETGID
        drop:
        - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 1000
    startupProbe:
      exec:
        command:
        - /bin/sh
        - -c
        - ip link show dev wg0 | grep -s up
      failureThreshold: 15
      periodSeconds: 2
      successThreshold: 1
      timeoutSeconds: 1
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /run
      name: run
    - mountPath: /etc/wireguard/wg0.conf
      name: config
      subPath: wg0.conf
    - mountPath: /etc/wireguard/privatekey
      name: privatekey
      subPath: privatekey
  - args:
    - -a
    - "true"
    env:
    - name: EXPORT_LATEST_HANDSHAKE_DELAY
      value: "true"
    - name: PROMETHEUS_WIREGUARD_EXPORTER_ADDRESS
      value: 0.0.0.0
    - name: PROMETHEUS_WIREGUARD_EXPORTER_CONFIG_FILE_NAMES
      value: /etc/wireguard/wg0.conf
    - name: PROMETHEUS_WIREGUARD_EXPORTER_EXPORT_REMOTE_IP_AND_PORT_ENABLED
      value: "true"
    - name: PROMETHEUS_WIREGUARD_EXPORTER_INTERFACES
      value: all
    - name: PROMETHEUS_WIREGUARD_EXPORTER_PREPEND_SUDO_ENABLED
      value: "false"
    - name: PROMETHEUS_WIREGUARD_EXPORTER_SEPARATE_ALLOWED_IPS_ENABLED
      value: "true"
    - name: PROMETHEUS_WIREGUARD_EXPORTER_VERBOSE_ENABLED
      value: "false"
    - name: PROMETHEUS_WIREGUARD_EXPORTER_PORT
      value: "9586"
    image: docker.io/mindflavor/prometheus-wireguard-exporter:3.6.6
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /metrics
        port: 9586
        scheme: HTTP
      initialDelaySeconds: 20
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: wireguard-exporter
    ports:
    - containerPort: 9586
      name: exporter
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /metrics
        port: 9586
        scheme: HTTP
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: 100m
        memory: 256Mi
      requests:
        cpu: 100m
        memory: 256Mi
    securityContext:
      allowPrivilegeEscalation: true
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
        - SETUID
        - SETGID
        drop:
        - ALL
      privileged: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 1000
    startupProbe:
      failureThreshold: 15
      httpGet:
        path: /metrics
        port: 9586
        scheme: HTTP
      periodSeconds: 2
      successThreshold: 1
      timeoutSeconds: 1
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /run
      name: run
    - mountPath: /etc/wireguard/wg0.conf
      name: config
      subPath: wg0.conf
    - mountPath: /etc/wireguard/privatekey
      name: privatekey
      subPath: privatekey
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - command:
    - sh
    - -c
    - sysctl -w net.ipv4.ip_forward=1 && sysctl -w net.ipv4.conf.all.forwarding=1
    image: busybox:stable
    imagePullPolicy: IfNotPresent
    name: sysctls
    resources:
      limits:
        cpu: 100m
        memory: 64Mi
      requests:
        cpu: 100m
        memory: 64Mi
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
        - SETUID
        - SETGID
        drop:
        - ALL
      privileged: true
      runAsNonRoot: false
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  nodeName: x.x.x.x
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000
    fsGroupChangePolicy: OnRootMismatch
    runAsNonRoot: true
  serviceAccount: wg-vpn-1-sa
  serviceAccountName: wg-vpn-1-sa
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app: wg-vpn-1-wireguard
    matchLabelKeys:
    - pod-template-hash
    maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
  volumes:
  - emptyDir: {}
    name: run
  - name: config
    secret:
      defaultMode: 420
      secretName: wg-vpn-1-wg-config
  - name: privatekey
    secret:
      defaultMode: 420
      secretName: wg-vpn-1-wg-generated
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-01-27T14:54:39Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-01-27T14:54:35Z"
    message: 'containers with unready status: [wireguard]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-01-27T14:54:35Z"
    message: 'containers with unready status: [wireguard]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-01-27T14:54:35Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://xxxx
    image: ghcr.io/bryopsida/wireguard:main
    imageID: xxxx
    lastState:
      terminated:
        containerID: cri-o://xxxxx
        exitCode: 0
        finishedAt: "2024-01-28T10:10:18Z"
        reason: Completed
        startedAt: "2024-01-28T10:09:47Z"
    name: wireguard
    ready: false
    restartCount: 383
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=wireguard pod=wg-vpn-1-wireguard-xxxxxxxxx-xxx_wireguard(xxxx)
        reason: CrashLoopBackOff
  - containerID: cri-o://xxxxx
    image: docker.io/mindflavor/prometheus-wireguard-exporter:3.6.6
    imageID: xxxxxx
    lastState: {}
    name: wireguard-exporter
    ready: true
    restartCount: 1
    started: true
    state:
      running:
        startedAt: "2024-01-28T09:36:24Z"
  hostIP: x.x.x.x
  initContainerStatuses:
  - containerID: cri-o://xxxxx
    image: docker.io/library/busybox:stable
    imageID: xxxxxx
    lastState: {}
    name: sysctls
    ready: true
    restartCount: 1
    started: false
    state:
      terminated:
        containerID: cri-o://xxxxxx
        exitCode: 0
        finishedAt: "2024-01-28T09:36:23Z"
        reason: Completed
        startedAt: "2024-01-28T09:36:22Z"
  phase: Running
  podIP: x.x.x.x
  podIPs:
  - ip: x.x.x.x
  qosClass: Guaranteed
  startTime: "2024-01-27T14:54:35Z"

The pod describe is:

Name:             wg-vpn-1-wireguard-xxxxxxxxx
Namespace:        wireguard
Priority:         0
Service Account:  wg-vpn-1-sa
Node:             x.x.x.x/x.x.x.x
Start Time:       Sat, 27 Jan 2024 15:54:35 +0100
Labels:           app=wg-vpn-1-wireguard
                  pod-template-hash=xxxxxxx
                  role=vpn
Annotations:      checksum/config: xxxxxxx
                  kubectl.kubernetes.io/restartedAt: 2024-01-26T22:50:15+01:00
Status:           Running
IP:               x.x.x.x
IPs:
  IP:           x.x.x.x
Controlled By:  ReplicaSet/wg-vpn-1-wireguard-xxxxxxxxx
Init Containers:
  sysctls:
    Container ID:  cri-o://xxxxxxx
    Image:         busybox:stable
    Image ID:      xxxxxxx
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      sysctl -w net.ipv4.ip_forward=1 && sysctl -w net.ipv4.conf.all.forwarding=1
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 28 Jan 2024 10:36:22 +0100
      Finished:     Sun, 28 Jan 2024 10:36:23 +0100
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     100m
      memory:  64Mi
    Requests:
      cpu:        100m
      memory:     64Mi
    Environment:  <none>
    Mounts:       <none>
Containers:
  wireguard:
    Container ID:   cri-o://xxxxxxxx
    Image:          ghcr.io/bryopsida/wireguard:main
    Image ID:       xxxxxxx
    Port:           51820/UDP
    Host Port:      0/UDP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 28 Jan 2024 10:57:14 +0100
      Finished:     Sun, 28 Jan 2024 10:57:44 +0100
    Ready:          False
    Restart Count:  379
    Limits:
      cpu:     100m
      memory:  256Mi
    Requests:
      cpu:      100m
      memory:   256Mi
    Liveness:   exec [/bin/sh -c ip link show dev wg0 | grep -s up] delay=20s timeout=1s period=10s #success=1 #failure=3
    Readiness:  exec [/bin/sh -c ip link show dev wg0 | grep -s up] delay=5s timeout=1s period=10s #success=1 #failure=3
    Startup:    exec [/bin/sh -c ip link show dev wg0 | grep -s up] delay=0s timeout=1s period=2s #success=1 #failure=15
    Environment:
      LOG_LEVEL:  info
    Mounts:
      /etc/wireguard/privatekey from privatekey (rw,path="privatekey")
      /etc/wireguard/wg0.conf from config (rw,path="wg0.conf")
      /run from run (rw)
  wireguard-exporter:
    Container ID:  cri-o://xxxxxx
    Image:         docker.io/mindflavor/prometheus-wireguard-exporter:3.6.6
    Image ID:      xxxxxx
    Port:          9586/TCP
    Host Port:     0/TCP
    Args:
      -a
      true
    State:          Running
      Started:      Sun, 28 Jan 2024 10:36:24 +0100
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     100m
      memory:  256Mi
    Requests:
      cpu:      100m
      memory:   256Mi
    Liveness:   http-get http://:9586/metrics delay=20s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:9586/metrics delay=5s timeout=1s period=10s #success=1 #failure=3
    Startup:    http-get http://:9586/metrics delay=0s timeout=1s period=2s #success=1 #failure=15
    Environment:
      EXPORT_LATEST_HANDSHAKE_DELAY:                                    true
      PROMETHEUS_WIREGUARD_EXPORTER_ADDRESS:                            0.0.0.0
      PROMETHEUS_WIREGUARD_EXPORTER_CONFIG_FILE_NAMES:                  /etc/wireguard/wg0.conf
      PROMETHEUS_WIREGUARD_EXPORTER_EXPORT_REMOTE_IP_AND_PORT_ENABLED:  true
      PROMETHEUS_WIREGUARD_EXPORTER_INTERFACES:                         all
      PROMETHEUS_WIREGUARD_EXPORTER_PREPEND_SUDO_ENABLED:               false
      PROMETHEUS_WIREGUARD_EXPORTER_SEPARATE_ALLOWED_IPS_ENABLED:       true
      PROMETHEUS_WIREGUARD_EXPORTER_VERBOSE_ENABLED:                    false
      PROMETHEUS_WIREGUARD_EXPORTER_PORT:                               9586
    Mounts:
      /etc/wireguard/privatekey from privatekey (rw,path="privatekey")
      /etc/wireguard/wg0.conf from config (rw,path="wg0.conf")
      /run from run (rw)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  run:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  wg-vpn-1-wg-config
    Optional:    false
  privatekey:
    Type:                     Secret (a volume populated by a Secret)
    SecretName:               wg-vpn-1-wg-generated
    Optional:                 false
QoS Class:                    Guaranteed
Node-Selectors:               <none>
Tolerations:                  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  kubernetes.io/hostname:ScheduleAnyway when max skew 1 is exceeded for selector app=wg-vpn-1-wireguard
Events:
  Type     Reason                  Age                   From     Message
  ----     ------                  ----                  ----     -------
  Warning  Unhealthy               46m (x5267 over 19h)  kubelet  Startup probe failed: Device "wg0" does not exist.
  Warning  BackOff                 31m (x4539 over 19h)  kubelet  Back-off restarting failed container wireguard in pod wg-vpn-1-wireguard-xxxxxxxxx-xx_wireguard(xxxxxxxx)
  Warning  FailedCreatePodSandBox  25m                   kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_wg-vpn-1-wireguard-xxxxxxxxx-xxx_wireguard_xxxx(xxxxx): error adding pod wireguard_wg-vpn-1-wireguard-xxxxxxxxx to CNI network "oci": plugin type="oci-ipvlan" failed (add): unable to allocate IP address
  Warning  FailedCreatePodSandBox  25m                   kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_wg-vpn-1-wireguard-xxxxxxxxx-xxxx_wireguard_xxxx(xxxx): error adding pod wireguard_wg-vpn-1-wireguard-xxxxxxxxx-xx to CNI network "oci": plugin type="oci-ipvlan" failed (add): unable to allocate IP address
  Normal   Pulled                  25m                   kubelet  Container image "busybox:stable" already present on machine
  Normal   Created                 25m                   kubelet  Created container sysctls
  Normal   Started                 25m                   kubelet  Started container sysctls
  Normal   Pulling                 25m                   kubelet  Pulling image "ghcr.io/bryopsida/wireguard:main"
  Normal   Pulled                  25m                   kubelet  Successfully pulled image "ghcr.io/bryopsida/wireguard:main" in 565ms (565ms including waiting)
  Normal   Created                 25m                   kubelet  Created container wireguard
  Normal   Started                 25m                   kubelet  Started container wireguard
  Normal   Pulled                  25m                   kubelet  Container image "docker.io/mindflavor/prometheus-wireguard-exporter:3.6.6" already present on machine
  Normal   Created                 25m                   kubelet  Created container wireguard-exporter
  Normal   Started                 25m                   kubelet  Started container wireguard-exporter
  Warning  Unhealthy               10m (x133 over 24m)   kubelet  Startup probe failed: Device "wg0" does not exist.
  Warning  BackOff                 25s (x97 over 22m)    kubelet  Back-off restarting failed container wireguard in pod wg-vpn-1-wireguard-xxxxxxxxx-xxxx_wireguard(xxx)

I rebooted the node, and I also checked the installed kernel versions: I am running the latest of the installed versions.

bryopsida commented 9 months ago

My suspicion is that the image used for the Kubernetes nodes may not have the WireGuard module loaded/available, and that's why the wg0 interface isn't being created.

I do not have access to an Oracle Kubernetes environment, but I did spin up a standard headless aarch64 Oracle Linux 8.8 VM, loaded k3s 1.28 onto it, and was able to successfully deploy and get a healthy pod.

FWIW, some versions/details from that setup.

$ uname -a
Linux localhost.localdomain 5.15.0-101.103.2.1.el8uek.aarch64 #2 SMP Mon May 1 19:47:28 PDT 2023 aarch64 aarch64 aarch64 GNU/Linux

$ cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="8.8"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="8.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Oracle Linux Server 8.8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:8:8:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://bugzilla.oracle.com/"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 8"
ORACLE_BUGZILLA_PRODUCT_VERSION=8.8
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=8.8

$ sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33

Image info from running pod:

---
    image: ghcr.io/bryopsida/wireguard:main
    imagePullPolicy: Always
---
  containerStatuses:
  - containerID: containerd://d1647d3b59f41ccacd41c417d6c01f7750db7a5027f9fda31db7ff1d5d6dadc5
    image: ghcr.io/bryopsida/wireguard:main
    imageID: ghcr.io/bryopsida/wireguard@sha256:07e00d0cad1a6b32cd4aa49f023a60922ae138af873b40b4af3d8b7f0c81766d
    lastState:
      terminated:
        containerID: containerd://5a2ceb57f30f7c803cdba3b7af225af5d83a6bc2d52864de421d9a01ae8bfd47
        exitCode: 255
        finishedAt: "2024-01-28T16:40:33Z"
        reason: Unknown
        startedAt: "2024-01-28T15:47:52Z"
    name: wireguard
    ready: true
    restartCount: 1
    started: true
    state:
      running:
        startedAt: "2024-01-28T16:40:41Z"
  hostIP: 10.211.55.9
  initContainerStatuses:
  - containerID: containerd://c9b7a3cfe8c791efec38691ab4184bf141f96f6d3807f1c95f76e9ff4a6b9000
    image: docker.io/library/busybox:stable
    imageID: docker.io/library/busybox@sha256:6d9ac9237a84afe1516540f40a0fafdc86859b2141954b4d643af7066d598b74
    lastState: {}
    name: sysctls
    ready: true
    restartCount: 1
    started: false
    state:
      terminated:
        containerID: containerd://c9b7a3cfe8c791efec38691ab4184bf141f96f6d3807f1c95f76e9ff4a6b9000
        exitCode: 0
        finishedAt: "2024-01-28T16:40:37Z"
        reason: Completed
        startedAt: "2024-01-28T16:40:37Z"

Node info:

$ kubectl get node localhost.localdomain -o yaml
...
  nodeInfo:
    architecture: arm64
    bootID: 65bbc199-8ef8-4222-a91e-82148d283057
    containerRuntimeVersion: containerd://1.7.11-k3s2
    kernelVersion: 5.15.0-101.103.2.1.el8uek.aarch64
    kubeProxyVersion: v1.28.5+k3s1
    kubeletVersion: v1.28.5+k3s1
    machineID: 27e1a42afb8a40dd963beba83ed2ab40
    operatingSystem: linux
    osImage: Oracle Linux Server 8.8
    systemUUID: db7c154b-40bb-4bc8-a786-aeb275ad15e7

/proc/modules from the container:

$ cat /proc/modules | grep wireguard
wireguard 118784 0 - Live 0x0000000000000000
...

/proc/modules from the node:

$ cat /proc/modules | grep wireguard
wireguard 118784 0 - Live 0x0000000000000000
...

Can you confirm the wireguard kernel module is available on your nodes?
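As a sketch of how one might check this on a node, the snippet below greps /proc/modules for the relevant modules (the helper name and module list are illustrative, not part of the chart):

```shell
#!/bin/sh
# Minimal sketch: report whether each kernel module appears as loaded.
# The optional second argument lets the helper be pointed at a file other
# than /proc/modules (handy for testing the helper itself).
module_loaded() {
  grep -q "^$1 " "${2:-/proc/modules}"
}

for mod in wireguard iptable_nat ip6table_nat; do
  if module_loaded "$mod"; then
    echo "$mod: loaded"
  else
    echo "$mod: not loaded (try: sudo modprobe $mod)"
  fi
done
```

`modinfo <module>` on the node can additionally confirm whether a module is at least available on disk, even if not currently loaded.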

lorenzomonta commented 9 months ago

I really appreciate the tests you did. Here are my results (similar/equal to yours). In the meantime, I have opened a ticket with Oracle support.

$ uname -a
Linux oke-xxxx-yyyy 5.15.0-105.125.6.2.1.el8uek.aarch64 #2 SMP Thu Sep 14 22:13:10 PDT 2023 aarch64 aarch64 aarch64 GNU/Linux

$cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="8.8"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="8.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Oracle Linux Server 8.8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:8:8:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://github.com/oracle/oracle-linux"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 8"
ORACLE_BUGZILLA_PRODUCT_VERSION=8.8
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=8.8

$sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   permissive
Mode from config file:          permissive
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33

$cat /proc/modules | grep wireg
wireguard 118784 0 - Live 0x0000000000000000
libchacha20poly1305 20480 1 wireguard, Live 0x0000000000000000
libcurve25519_generic 40960 1 wireguard, Live 0x0000000000000000

nodeInfo:
    architecture: arm64
    bootID: xxxx-yyyy
    containerRuntimeVersion: cri-o://1.28.2-169.el8
    kernelVersion: 5.15.0-105.125.6.2.1.el8uek.aarch64
    kubeProxyVersion: v1.28.2
    kubeletVersion: v1.28.2
    machineID: xxxx-yyyy
    operatingSystem: linux
    osImage: Oracle Linux Server 8.8

bryopsida commented 9 months ago

FWIW, some additional details from the test VM.

WireGuard kernel message from the node:

$ dmesg | grep wireguard
[   47.109252] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[   47.109254] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.

iptables version from the WireGuard container:

$ iptables --version
iptables v1.8.9 (legacy)

lorenzomonta commented 9 months ago

$ dmesg | grep wireguard
[  123.254595] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[  123.258710] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
$ iptables --version
iptables v1.8.4 (nf_tables)

lorenzomonta commented 9 months ago

I opened a ticket with Oracle support, but they haven't responded yet. Once they respond, if it is resolved, I will post the solution here so that anyone who hits this problem in the future can find it.

bryopsida commented 9 months ago

Awesome, thank you! I suspect https://github.com/bryopsida/wireguard-chart/issues/40 may offer a workaround in this case, but I haven't had time to dig into it yet.

lorenzomonta commented 9 months ago

Thank you! In my opinion the problem is the iptables version, 1.8.4 instead of 1.8.9; let's see what technical support says. Also, connecting to the node, I see that the latest iptables version available for update is 1.8.5-10.0.1.el8_9, which is not 1.8.9, so maybe the operating system needs to be updated as well.

lorenzomonta commented 9 months ago

@bryopsida in the test you did, you used Oracle Linux 8.8; how did you get iptables version 1.8.9 on it, when even Oracle Linux 9.3 has at most version 1.8.8?

bryopsida commented 9 months ago

@bryopsida in the test you did, you used Oracle Linux 8.8; how did you get iptables version 1.8.9 on it, when even Oracle Linux 9.3 has at most version 1.8.8?

That was from inside the container, using kubectl exec -it <pod name> -- /bin/sh to get an interactive shell inside the container.

lorenzomonta commented 9 months ago

I can't, because the pod is in CrashLoopBackOff state. I mean: what is the iptables version on the Oracle Linux node? Thanks

bryopsida commented 9 months ago

Version on the node was v1.8.4 (nf_tables).

lorenzomonta commented 8 months ago

@bryopsida I found this: https://docs.linuxserver.io/images/docker-wireguard/#application-setup, in particular: "Note on iptables:

Some hosts may not load the iptables kernel modules by default. In order for the container to be able to load them, you need to assign the SYS_MODULE capability and add the optional /lib/modules volume mount. Alternatively you can modprobe them from the host before starting the container."

lorenzomonta commented 8 months ago

Solved by running the commands directly on the OKE node:

sudo modprobe iptable_nat
sudo modprobe ip6table_nat

lorenzomonta commented 8 months ago

But if I reboot the node, the issue comes back.

lorenzomonta commented 8 months ago

@bryopsida the other possibility is to mount the "/lib/modules" volume in the pod before it starts and add the SYS_MODULE capability; how can this be achieved?

bryopsida commented 8 months ago

@bryopsida the other possibility is to mount the "/lib/modules" volume in the pod before it starts and add the SYS_MODULE capability; how can this be achieved?

You could patch the Deployment object created by the Helm release to add the SYS_MODULE capability to these blocks:

          add:
            - NET_ADMIN
            - NET_RAW
            - SETUID
            - SETGID

And modify the volume + volumeMounts to provide a hostpath volume. https://kubernetes.io/docs/concepts/storage/volumes/#hostpath
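A sketch of what the patched container spec might look like (illustrative fragments only, based on the manifest shown earlier; SYS_MODULE and the hostPath mount are the additions):

```yaml
# Fragment of the pod spec, not a complete manifest.
spec:
  containers:
    - name: wireguard
      securityContext:
        capabilities:
          add:
            - NET_ADMIN
            - NET_RAW
            - SETUID
            - SETGID
            - SYS_MODULE        # allows the container to load kernel modules
          drop:
            - ALL
      volumeMounts:
        - mountPath: /lib/modules
          name: modules
          readOnly: true
  volumes:
    - name: modules
      hostPath:
        path: /lib/modules
        type: Directory
```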

But instead of giving the container more access to the node you may prefer to use an init script on the node.

I'm not familiar with OKE, but in AWS you can attach scripts/cloud-init directives to the auto scaling groups for nodes so they run at node startup. Is this something you can do in OKE?
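For providers that accept cloud-init user data, a sketch of such a startup directive might look like this (assuming the standard cloud-config format; file path is illustrative):

```yaml
#cloud-config
# Load the NAT iptables modules now and make the loading persist across boots.
write_files:
  - path: /etc/modules-load.d/iptables-nat.conf
    content: |
      iptable_nat
      ip6table_nat
runcmd:
  - modprobe iptable_nat
  - modprobe ip6table_nat
```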

I can add a toggle in a new chart release/version to enable that behavior, for those who want to opt into allowing the container to load kernel modules.

lorenzomonta commented 8 months ago

In the end, the solution I adopted is as follows:

How to load the kernel modules iptable_nat and ip6table_nat at boot:

Create the file iptable_nat.modules under the path /etc/sysconfig/modules/ and make it executable with chmod +x iptable_nat.modules:

$ cat /etc/sysconfig/modules/iptable_nat.modules
modprobe iptable_nat
modprobe ip6table_nat

Restart the OKE node and verify that the modules are loaded correctly:

$ lsmod | grep iptable_nat
iptable_nat 16384 0
ip_tables 40960 1 iptable_nat
nf_nat 61440 5 ip6table_nat,xt_nat,nft_chain_nat,iptable_nat,xt_MASQUERADE
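The steps above can be scripted as a small helper (the function name is illustrative; the path and module names come from the comment):

```shell
#!/bin/sh
# Sketch: create an executable modules file that loads the NAT iptables
# modules at boot (Oracle Linux / RHEL-style /etc/sysconfig/modules
# mechanism). Pass the target directory explicitly, e.g.:
#   install_nat_modules_file /etc/sysconfig/modules
install_nat_modules_file() {
  dir="$1"
  mkdir -p "$dir"
  cat > "$dir/iptable_nat.modules" <<'EOF'
modprobe iptable_nat
modprobe ip6table_nat
EOF
  chmod +x "$dir/iptable_nat.modules"
}
```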

@bryopsida thank you so much for your support and patience!

lorenzomonta commented 8 months ago

@bryopsida last question: how should I configure the health check for the network load balancer? Is it possible to use the HTTP protocol, with one port and one URL? Thanks

bryopsida commented 8 months ago

@bryopsida last question: how should I configure the health check for the network load balancer? Is it possible to use the HTTP protocol, with one port and one URL? Thanks

This would be the health check from the oracle UDP load balancer?

There really isn't anything exposed at the external network level for that currently. The pod itself uses an exec probe to check the wg status.

I've created: https://github.com/bryopsida/wireguard-chart/issues/43 to add a sidecar for this.

lorenzomonta commented 8 months ago

This would be the health check from the oracle UDP load balancer?

Yes, its name is Network Load Balancer (NLB), because a UDP Load Balancer doesn't exist on Oracle Cloud.