JamesTurland / JimsGarage

Homelab Goodies

K3S-Deploy - Suggestions for enhancement #68

Open nmehran opened 7 months ago

nmehran commented 7 months ago

Below are some simple enhancements that should improve the robustness of the k3s.sh script.

End script immediately on error

Note: The script stops at the first failing command, so the log ends at the cause of the error instead of burying it under follow-on failures.

Insert at the top of the script:

set -e  # Exit immediately if a command exits with a non-zero status.
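
As a possible extension, here is a minimal sketch that pairs set -e with an ERR trap so the log also names the failing command and line. The on_error function name is my own illustration and is not part of k3s.sh; set -E and pipefail are Bash-specific, so this assumes the script runs under Bash.

```shell
#!/usr/bin/env bash
# -e: exit on error, -E: let functions and subshells inherit the ERR trap,
# pipefail: a pipeline fails if any command in it fails.
set -eE -o pipefail

# Hypothetical helper (not in k3s.sh): report what failed and where.
on_error() {
    local exit_code=$1 line_no=$2 cmd=$3
    echo "Error: '$cmd' exited with status $exit_code on line $line_no" >&2
}

# Bash runs this trap just before set -e terminates the script.
trap 'on_error "$?" "$LINENO" "$BASH_COMMAND"' ERR
```

With this in place, a failing command should leave a single line on stderr identifying itself before the script exits.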

Update and upgrade system packages, retrying when apt is locked by another process.

Add:

# Update and upgrade system packages, with lock mitigation support
attempt_limit=10
attempt_delay_seconds=3
for ((attempt=1; attempt<=attempt_limit; attempt++)); do
    if sudo apt-get update && sudo apt-get upgrade -y; then
        echo "Package list updated and packages upgraded successfully."
        break # Success
    elif ((attempt == attempt_limit)); then
        echo "Failed to update and upgrade packages within $attempt_limit attempts."
        exit 1 # Failure after all attempts
    else
        echo "Attempt $attempt of $attempt_limit failed. Retrying in $attempt_delay_seconds seconds..."
        sleep $attempt_delay_seconds
    fi
done
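
The same retry pattern could be factored into a reusable function, so other flaky steps in the script can share it. This is only a sketch: the retry name and its argument order are my own assumptions, not something k3s.sh defines.

```shell
#!/usr/bin/env bash

# Hypothetical helper: retry <attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempt limit is reached.
retry() {
    local attempt_limit=$1 delay_seconds=$2
    shift 2
    local attempt
    for ((attempt = 1; attempt <= attempt_limit; attempt++)); do
        if "$@"; then
            return 0 # Success
        fi
        if ((attempt < attempt_limit)); then
            echo "Attempt $attempt of $attempt_limit failed. Retrying in $delay_seconds seconds..." >&2
            sleep "$delay_seconds"
        fi
    done
    echo "Command failed after $attempt_limit attempts: $*" >&2
    return 1 # Failure after all attempts
}

# Possible usage in the script (not executed here):
#   retry 10 3 sudo apt-get update
#   retry 10 3 sudo apt-get upgrade -y
```

Because the command runs as the condition of an if, a failed attempt does not trip set -e; only the final return 1 does.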

Restart NTP synchronization on each node so that the clocks agree

Note: This matters because k3sup and other downloads may fail when a node's clock is out of sync, for example after a VM is restored from a snapshot.

Insert:

# Install policycoreutils for each node
for newnode in "${all[@]}"; do
  ssh $user@$newnode -i ~/.ssh/$certName sudo su <<EOF
  sudo timedatectl set-ntp off  # ***** This has been inserted *****
  sudo timedatectl set-ntp on  # ***** This has been inserted *****
  NEEDRESTART_MODE=a apt install policycoreutils -y
  exit
EOF
  echo -e " \033[32;5mPolicyCoreUtils installed!\033[0m"
done
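
To confirm that the toggle actually took effect, a node could poll systemd until it reports the clock as synchronized. The sketch below assumes a systemd version that provides timedatectl show (239 or later, as far as I know); the wait_for_time_sync name and its default timeout are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: wait_for_time_sync [timeout_seconds]
# Intended to run on a node after 'timedatectl set-ntp on'.
wait_for_time_sync() {
    local timeout_seconds=${1:-60} waited=0
    # NTPSynchronized is "yes" once the clock has been synchronized.
    until [ "$(timedatectl show --property=NTPSynchronized --value)" = "yes" ]; do
        if ((waited >= timeout_seconds)); then
            echo "Clock not synchronized after $timeout_seconds seconds." >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

Returning non-zero on timeout lets the calling script decide whether to abort or continue with a warning.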

Add a robust wait to the "Install Metallb" step

Append:

# Step 8: Install Metallb
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.12.1/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.12/config/manifests/metallb-native.yaml
kubectl wait --for=condition=ready pod -l app=metallb --namespace=metallb-system --timeout=300s   # ***** This has been appended *****
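
One caveat: kubectl wait can fail right away with "no matching resources found" if the pods have not been created yet. A sketch of a guard that first polls until at least one matching pod exists is below; the wait_for_pods name and the 2-second polling interval are my own assumptions.

```shell
#!/usr/bin/env bash

# Hypothetical helper: wait_for_pods <namespace> <selector> [timeout_seconds]
wait_for_pods() {
    local namespace=$1 selector=$2 timeout_seconds=${3:-300} waited=0
    # Poll until at least one pod matches the selector.
    until [ -n "$(kubectl get pods --namespace "$namespace" --selector "$selector" --output name 2>/dev/null)" ]; do
        if ((waited >= timeout_seconds)); then
            echo "No pods matching '$selector' appeared in $namespace." >&2
            return 1
        fi
        sleep 2
        waited=$((waited + 2))
    done
    # Now that pods exist, block until they report Ready.
    kubectl wait --for=condition=ready pod --selector "$selector" \
        --namespace "$namespace" --timeout="${timeout_seconds}s"
}

# Possible usage (not executed here):
#   wait_for_pods metallb-system app=metallb 300
```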

JamesTurland commented 6 months ago

Thank you, I will test all of these when I can.

nmehran commented 5 months ago

After some further testing, your original time synchronization method seems to be more robust!

sudo timedatectl set-ntp off
sudo timedatectl set-ntp on

I think synchronizing the time on each node, inside the per-node loop (for newnode in "${all[@]}"; do), is the most important improvement we could make: in my tests, the nodes failed to install the k3sup dependencies without it.

I've gone ahead and edited the above post to reflect these changes.