Here is what we found so far from our testing.
We will continue testing further and will report new findings.
@jigar-arc10 - thank you for the additional testing.
Thoughts on some of the points raised above:
These scripts required root access to the server.
Current Akash Provider documentation and install process assumes install is being run as root as stated here:
As this is part of pre-existing methodologies, we do not view this as an issue - but please let us know if you feel otherwise and/or if it will cause issues in Praetor use.
The recommended OS is Ubuntu. It failed for Debian at the ingress-nginx installation stage.
Current Akash Provider > Helm install based instructions recommend/assume Ubuntu use as stated here:
Based on this being part of the pre-existing standard, we do not believe this is an issue, but please let us know if you feel otherwise and/or if this may cause issues for Praetor users.
While scaling down a node, we tried to use the draining method, but akash-services/operator-inventory-hardware-discovery causes an issue as it is not a DaemonSet. We should look into it. Force draining worked.
Will look into this issue further. Initial testing of scaling down procedure only tested the ability to scale down K3s nodes. Have not yet tested scaling down with Akash provider and related operators installed. Will test those scenarios ASAP.
@chainzero - Thanks for the response.
As this is part of pre-existing methodologies, we do not view this as an issue - but please let us know if you feel otherwise and/or if it will cause issues in Praetor use.
After deep consideration, we agree that root user access should be required as it also helps with GPU driver installation steps.
Based on this being part of the pre-existing standard, we do not believe this is an issue, but please let us know if you feel otherwise and/or if this may cause issues for Praetor users.
It's a non-issue.
Will look into this issue further. Initial testing of scaling down procedure only tested the ability to scale down K3s nodes. Have not yet tested scaling down with Akash provider and related operators installed. Will test those scenarios ASAP.
After many iterations of testing node removal with the updated scripts, the issue with operator-inventory-hardware-discovery is gone, and the node was successfully removed.
Here are the considerations to be addressed while using k3s instead of k8s.
CNI plugins/calico: Consider installation scenario where one would want to specify K8s internal networking as well, primarily for the performance sake (for internal K8s services/apps communication, including Rook-Ceph persistent storage which can be really heavy on the traffic if it is not done via the internal networking which will lead to significant performance lag and bill if provider's traffic is metered)
In the K3s setup, we use the Calico CNI plugin to ensure high performance for internal networking. This configuration is essential for optimizing communication between Kubernetes services and applications, especially for high-traffic services like Rook-Ceph, to prevent significant performance lag and avoid metered external traffic costs.
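For reference, a minimal sketch of installing k3s with Calico instead of the bundled flannel CNI (the exact flags and Calico manifest version used by the provider build scripts may differ):
# install k3s without flannel so Calico can manage pod networking
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--flannel-backend=none --disable-network-policy" sh -
# install Calico (manifest version shown is illustrative)
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/calico.yaml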
We verify that Calico is installed and running in our k3s cluster.
root@node1:~# kubectl get pods -n kube-system -l k8s-app=calico-node
NAME READY STATUS RESTARTS AGE
calico-node-plt4k 1/1 Running 0 4h57m
To define an IP pool for internal networking and ensure efficient internal communication, we use the following configuration:
root@node1:~# kubectl get ippool
NAME AGE
default-ipv4-ippool 9h
root@node1:~# kubectl describe ippool default-ipv4-ippool
Name: default-ipv4-ippool
Namespace:
Labels: <none>
Annotations: projectcalico.org/metadata: {"uid":"cf9f2f1f-c77e-463e-8574-d9b6ea72d055","creationTimestamp":"2024-07-16T16:56:14Z"}
API Version: crd.projectcalico.org/v1
Kind: IPPool
Metadata:
Creation Timestamp: 2024-07-16T16:56:14Z
Generation: 1
Resource Version: 712
UID: b3def60d-9f8b-46d8-9ff8-42c1de61412a
Spec:
Allowed Uses:
Workload
Tunnel
Block Size: 26
Cidr: 192.168.0.0/16
Ipip Mode: Always
Nat Outgoing: true
Node Selector: all()
Vxlan Mode: Never
Events: <none>
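For reference, the same pool expressed as a manifest (a sketch reconstructed from the output above), should it ever need to be recreated or tuned:
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  blockSize: 26
  ipipMode: Always
  vxlanMode: Never
  natOutgoing: true
  nodeSelector: all()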
Define Network Policies (if needed)
We create network policies to manage traffic flow and ensure internal communication is optimized for performance.
kubectl apply -f - <<EOF
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-rook-ceph
  namespace: rook-ceph
spec:
  selector: all()
  ingress:
    - action: Allow
      source:
        namespaceSelector: has(role)
        selector: app == 'rook-ceph'
  egress:
    - action: Allow
      destination:
        namespaceSelector: has(role)
        selector: app == 'rook-ceph'
EOF
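If calicoctl is available on the node (an assumption; it is not installed by default), the applied policy can be inspected with:
calicoctl get networkpolicy allow-rook-ceph -n rook-ceph -o yaml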
customize nodefs & imagefs locations: similarly to how it's described here
To manage storage effectively, we can customize the locations for nodefs and imagefs in k3s. This involves setting custom data directories and configuring containerd, the container runtime used by k3s.
At this point, assume we have created a RAID0 array over two NVMe drives using the following commands:
root@node1:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 63.9M 1 loop /snap/core20/2318
loop1 7:1 0 25.2M 1 loop /snap/amazon-ssm-agent/7993
loop2 7:2 0 87M 1 loop /snap/lxd/28373
loop3 7:3 0 55.7M 1 loop /snap/core18/2829
loop4 7:4 0 38.8M 1 loop /snap/snapd/21759
nvme0n1 259:0 0 80G 0 disk
├─nvme0n1p1 259:1 0 79.9G 0 part /
├─nvme0n1p14 259:2 0 4M 0 part
└─nvme0n1p15 259:3 0 106M 0 part /boot/efi
nvme1n1 259:4 0 100G 0 disk
nvme2n1 259:5 0 100G 0 disk
root@node1:~# mdadm --create /dev/md0 --level=raid0 --metadata=1.2 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1
mdadm: array /dev/md0 started.
root@node1:~# cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 nvme2n1[1] nvme1n1[0]
209582080 blocks super 1.2 512k chunks
unused devices: <none>
root@node1:~# mkfs.ext4 /dev/md0
mke2fs 1.46.5 (30-Dec-2021)
Creating filesystem with 52395520 4k blocks and 13099008 inodes
Filesystem UUID: b1ea6725-0d38-42d2-a9c8-3071d8c7c5de
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
root@node1:~# cp -p /etc/fstab /etc/fstab.1
root@node1:~# cat >> /etc/fstab << EOF
UUID="$(blkid /dev/md0 -s UUID -o value)" /data ext4 defaults,discard 0 0
EOF
root@node1:~# diff -Nur /etc/fstab.1 /etc/fstab
--- /etc/fstab.1 2024-07-01 15:42:56.210521795 +0000
+++ /etc/fstab 2024-07-17 04:07:18.985153190 +0000
@@ -1,2 +1,3 @@
LABEL=cloudimg-rootfs / ext4 discard,errors=remount-ro 0 1
LABEL=UEFI /boot/efi vfat umask=0077 0 1
+UUID="28b606d9-6e43-4a0b-be60-c7cda95b71e4" /data ext4 defaults,discard 0 0
root@node1:~# mkdir /data
root@node1:~# mount /data
root@node1:~# df -Ph /data
Filesystem Size Used Avail Use% Mounted on
/dev/md0 196G 28K 186G 1% /data
root@node1:~# /usr/share/mdadm/mkconf > /etc/mdadm/mdadm.conf
root@node1:~# cat /etc/mdadm/mdadm.conf | grep -v ^\#
HOMEHOST <system>
MAILADDR root
ARRAY /dev/md/0 metadata=1.2 UUID=1e921d7f:4b06d544:42f0e25f:a252e4e1 name=ip-172-31-47-75:0
root@node1:~# update-initramfs -c -k all
update-initramfs: Generating /boot/initrd.img-6.5.0-1022
Setting up k3s with a custom data location using the k3s --data-dir option. This ensures that all k3s-related data, including nodefs and imagefs, are stored in the specified directory.
curl -sfL https://get.k3s.io | sh -s - --data-dir /custom/path/to/k3s/data
sudo vi /etc/rancher/k3s/config.toml
Add the following configuration:
[plugins."io.containerd.grpc.v1.cri".containerd]
root = "/custom/path/to/containerd/root"
state = "/custom/path/to/containerd/state"
sudo systemctl restart k3s
Verifying the Configuration:
ls /custom/path/to/containerd/root
sudo ctr -n k8s.io containers list
Check that nodefs and imagefs are being utilized as expected:
df -h /custom/path/to/containerd/root
Moving running k3s to a new mounted volume
To move your existing k3s setup to use a new mounted volume at /data, follow these steps:
sudo systemctl stop k3s
sudo rsync -a /var/lib/rancher/k3s/ /data/k3s/
sudo vi /etc/systemd/system/k3s.service
Update the ExecStart line:
ExecStart=/usr/local/bin/k3s server --data-dir /data/k3s
Reload the systemd configuration:
sudo systemctl daemon-reload
Ensure that the kubelet is also using the new paths. In k3s, the kubelet root directory follows the k3s data directory, so it is enough to confirm the data directory in the k3s configuration file, updating it if the file exists.
sudo vi /etc/rancher/k3s/config.yaml
Confirm or add the following configuration, if necessary:
data-dir: /data/k3s
sudo systemctl start k3s
kubectl describe node node1
ls /data/k3s
ls /data/k3s/agent/containerd
consider etcd backup & restore procedure (kubespray does this automatically each time you run it against your K8s cluster)
The way K3s is backed up and restored depends on the type of datastore being used. Below are the procedures for backing up and restoring K3s with SQLite, an external datastore, and embedded etcd.
Backup and Restore with SQLite
Backup
No special commands are required to back up the SQLite datastore; simply take a copy of the following directory:
/var/lib/rancher/k3s/server/db/
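For illustration, a copy-based backup might look like the following (the /backup destination is an assumption; stopping k3s is not strictly required but gives a consistent copy):
systemctl stop k3s
tar czf /backup/k3s-sqlite-$(date +%F).tar.gz /var/lib/rancher/k3s/server/db/ /var/lib/rancher/k3s/server/token
systemctl start k3s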
Restore
To restore the SQLite datastore, restore the contents of the directory mentioned above, and also restore the server token file:
/var/lib/rancher/k3s/server/token
The token file must be restored, or its value must be passed via the --token option when restoring from backup. If you do not use the same token value when restoring, the snapshot will be unusable, as the token is used to encrypt confidential data within the datastore itself.
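A matching restore sketch on a fresh node (the file name follows the backup example above; paths are the k3s defaults):
systemctl stop k3s
tar xzf /backup/k3s-sqlite-2024-07-17.tar.gz -C /
systemctl start k3s
# alternatively, start the server passing the original token explicitly:
# k3s server --token <original-token>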
Backup and Restore with Embedded etcd Datastore
K3s offers a robust mechanism for backing up and restoring the embedded etcd datastore.
Automated Snapshot Creation
By default, k3s is configured to automatically create snapshots of the etcd datastore twice daily, at 00:00 and 12:00 system time, so we always have recent backups of the cluster state. The snapshots are retained in the ${data-dir}/server/db/snapshots directory, which defaults to /var/lib/rancher/k3s/server/db/snapshots. K3s retains the five most recent snapshots, but configuration options can adjust this number.
We can customize the snapshot frequency and retention using the following options:
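For example, the schedule and retention can be set via the corresponding k3s server flags or in the k3s config file (the values below are illustrative):
# /etc/rancher/k3s/config.yaml
etcd-snapshot-schedule-cron: "0 */6 * * *"   # snapshot every 6 hours
etcd-snapshot-retention: 10                  # keep the 10 most recent snapshots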
For embedded etcd, we can use the k3s etcd-snapshot command for backup operations, and the server's cluster-reset options for restores.
Backup
To perform an on-demand snapshot of the etcd datastore, we use the following command:
k3s etcd-snapshot save
This command will create a snapshot and save it to the default location /var/lib/rancher/k3s/server/db/snapshots/. We can specify a custom directory and name for the snapshot as well:
k3s etcd-snapshot save --name my-snapshot --dir /path/to/backup/
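Existing snapshots can then be listed to confirm the save succeeded (both on-demand and scheduled snapshots are shown):
k3s etcd-snapshot ls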
Restore
To restore from a snapshot, follow these steps:
Stop the k3s server:
systemctl stop k3s
Restore the snapshot (the server performs the restore via a cluster reset):
k3s server --cluster-reset --cluster-reset-restore-path=/path/to/backup/snapshot-<timestamp>
Start the k3s server:
systemctl start k3s
consider etcd performance - AFAIK, k3s uses sqlite3 DB for the etcd; so there should be some quick perf test for it such as etcdctl check perf we have here
root@node1:~# export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS="https://127.0.0.1:2379"
export ETCDCTL_CACERT="/data/k3s/server/tls/etcd/server-ca.crt"
export ETCDCTL_CERT="/data/k3s/server/tls/etcd/server-client.crt"
export ETCDCTL_KEY="/data/k3s/server/tls/etcd/server-client.key"
root@node1:~# etcdctl -w table member list
+------------------+---------+-------------------------+--------------------------+--------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+------------------+---------+-------------------------+--------------------------+--------------------------+
| 34c66c9fb119f95a | started | ip-172-31-39-9-c9a36ec6 | https://172.31.39.9:2380 | https://172.31.39.9:2379 |
+------------------+---------+-------------------------+--------------------------+--------------------------+
root@ip-172-31-39-9:~# etcdctl endpoint health --cluster -w table
+--------------------------+--------+------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+--------------------------+--------+------------+-------+
| https://172.31.39.9:2379 | true | 1.858019ms | |
+--------------------------+--------+------------+-------+
root@node1:~# etcdctl endpoint status --cluster -w table
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://172.31.39.9:2379 | 34c66c9fb119f95a | 3.5.13 | 4.4 MB | true | 2 | 17248 |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
root@node1:~# etcdctl check perf
59 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooom ! 98.33%
PASS: Throughput is 151 writes/s
PASS: Slowest request took 0.021458s
PASS: Stddev is 0.001235s
PASS
custom K8s configs for the nodefs & imagefs thresholds (ref)
To customize disk usage thresholds for nodefs and imagefs, we can modify the kubelet configuration. The kubelet has parameters that allow us to specify eviction thresholds based on filesystem usage.
Example Configuration
Here’s an example of how to configure custom thresholds in the kubelet configuration file:
Edit the Kubelet Configuration File:
Open the kubelet configuration file in your preferred text editor and add the custom thresholds:
sudo vi /var/lib/kubelet/config.yaml
Add the configuration as shown
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  nodefs.available: "10%"
  imagefs.available: "15%"
  nodefs.inodesFree: "5%"
  imagefs.inodesFree: "10%"
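If gentler eviction behaviour is desired, soft thresholds with grace periods can be added alongside the hard thresholds in the same file (the values below are illustrative assumptions, not part of the tested setup):
evictionSoft:
  nodefs.available: "15%"
  imagefs.available: "20%"
evictionSoftGracePeriod:
  nodefs.available: "2m"
  imagefs.available: "2m"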
Restart the k3s service:
After modifying the configuration file, restart the k3s service to apply the changes:
sudo systemctl restart k3s
Monitor Node Conditions:
Use kubectl to monitor the node conditions and ensure that the eviction thresholds are being respected:
root@node1:~# kubectl describe node
Name: ip-172-31-47-75
Roles: control-plane,etcd,master
Labels: akash.network=true
....
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Tue, 16 Jul 2024 16:56:14 +0000 Tue, 16 Jul 2024 16:56:14 +0000 CalicoIsUp Calico is running on this node
EtcdIsVoter True Wed, 17 Jul 2024 03:35:23 +0000 Tue, 16 Jul 2024 16:55:19 +0000 MemberNotLearner Node is a voting member of the etcd cluster
MemoryPressure False Wed, 17 Jul 2024 03:35:58 +0000 Tue, 16 Jul 2024 16:55:04 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 17 Jul 2024 03:35:58 +0000 Tue, 16 Jul 2024 16:55:04 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 17 Jul 2024 03:35:58 +0000 Tue, 16 Jul 2024 16:55:04 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 17 Jul 2024 03:35:58 +0000 Tue, 16 Jul 2024 21:39:21 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
custom K8s configs for the max. number of container log files that can be present for a container kubelet_logfiles_max_nr, as well as the max.size of the container log file before it is rotated kubelet_logfiles_max_size (ref)
We can manage custom Kubernetes configurations for the maximum number of container log files and the maximum size of a container log file before it is rotated by configuring the kubelet parameters. These settings help control the disk usage on nodes by limiting the number of log files and their sizes.
Customizing Kubelet Configuration in k3s
To set kubelet_logfiles_max_nr (maximum number of log files) and kubelet_logfiles_max_size (maximum size of log files), we follow these steps:
Create a Kubelet Configuration File:
Create a configuration file for the kubelet if it doesn't already exist.
sudo mkdir -p /etc/rancher/k3s
sudo touch /etc/rancher/k3s/config.yaml
Edit the Kubelet Configuration File:
Add the following configuration to set the maximum number of log files and the maximum size of log files.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxFiles: 5      # kubelet_logfiles_max_nr
containerLogMaxSize: "10Mi" # kubelet_logfiles_max_size
This configuration sets the maximum number of log files per container to 5 and the maximum size of each log file to 10MiB.
Configure k3s to Use the Custom Kubelet Configuration:
Modify the k3s service file to point to the custom kubelet configuration file. This file is typically located at /etc/systemd/system/k3s.service, or /etc/systemd/system/k3s-agent.service for k3s agents.
Edit the service file to include the custom kubelet configuration.
sudo vi /etc/systemd/system/k3s.service
Add the following line to the ExecStart section to use the custom kubelet configuration:
ExecStart=/usr/local/bin/k3s server --kubelet-arg=config=/etc/rancher/k3s/config.yaml
For k3s agents, it would look like:
ExecStart=/usr/local/bin/k3s agent --kubelet-arg=config=/etc/rancher/k3s/config.yaml
Reload and Restart the k3s Service:
Reload the systemd configuration and restart the k3s service to apply the changes.
sudo systemctl daemon-reload
sudo systemctl restart k3s
Verify the Configuration:
After restarting the k3s service, verify that the kubelet is using the new configuration.
root@node1:~# kubectl describe node ip-172-31-47-75
Name: ip-172-31-47-75
Roles: control-plane,etcd,master
Labels: akash.network=true
beta.kubernetes.io/arch=amd64
...
Annotations: alpha.kubernetes.io/provided-node-ip: 172.31.47.75
k3s.io/node-args:
["server","--apiVersion","kubelet.config.k8s.io/v1beta1","--kind","KubeletConfiguration","--maxContainerLogFiles","5","--containerLogMaxSize","10Mi"]
projectcalico.org/IPv4Address: 172.31.47.75/20
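The kubelet's effective configuration can also be queried directly through the API server to confirm the log-rotation settings (a sketch; assumes jq is installed and the node name matches):
kubectl get --raw "/api/v1/nodes/ip-172-31-47-75/proxy/configz" | jq '.kubeletconfig | {containerLogMaxFiles, containerLogMaxSize}'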
Great job @devalpatel67 @jigar-arc10 and @chainzero !
FWIW, k3s upgrades seem to be straightforward: https://docs.k3s.io/upgrades/manual
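For example, per those docs, a single-server install can be upgraded in place by re-running the install script with a pinned channel or version (the version shown is illustrative):
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=stable sh -
# or pin an exact release
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.30.2+k3s1 sh -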
@chainzero created the k3s method of provider installation, described here https://akashengineers.xyz/provider-build-scripts
Before getting this to production use, the following points must be considered and addressed/verified to be supported with the k3s K8s cluster deployment method:
- etcd can be scaled (to avoid SPOF)
- control-plane can be scaled
- nodes can be added (whether with an etcd instance or/and control-plane);
- nodes can be removed (whether with an etcd instance or/and control-plane);
- customize nodefs & imagefs locations: similarly to how it's described here
- consider etcd backup & restore procedure (kubespray does this automatically each time you run it against your K8s cluster)
- consider etcd performance - AFAIK, k3s uses sqlite3 DB for the etcd; so there should be some quick perf test for it such as etcdctl check perf we have here
Additionally/Ideally:
- custom K8s configs for the nodefs & imagefs thresholds (ref)
- custom K8s configs for the max. number of container log files that can be present for a container kubelet_logfiles_max_nr, as well as the max. size of the container log file before it is rotated kubelet_logfiles_max_size (ref)