andy108369 opened 2 months ago
Here is what we found so far from our testing.
We will continue testing further and will report new findings.
@jigar-arc10 - thank you for the additional testing.
Thoughts on some of the points raised above:
> These scripts required root access to the server.
Current Akash Provider documentation and the install process assume the install is being run as root, as stated here:
As this is part of pre-existing methodology, we do not view this as an issue, but please let us know if you feel otherwise and/or if it will cause issues in Praetor use.
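For context, a guard like the following (a minimal sketch, not copied from the actual provider-build-scripts; the function name and messages are ours) is all it takes for such a script to fail fast when not run as root:

```shell
#!/usr/bin/env bash
# Minimal root check for scripts that assume root access.
is_root() {
    [ "$(id -u)" -eq 0 ]
}

if is_root; then
    echo "running as root"
else
    # A real install script would likely exit 1 here instead of warning.
    echo "warning: not running as root; the install steps would fail" >&2
fi
```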
> The recommended OS is Ubuntu. It failed for Debian at the ingress-nginx installation stage.
Current Akash Provider > Helm-install-based instructions recommend/assume Ubuntu, as stated here:
Based on this being part of the pre-existing standard, we do not believe this is an issue, but please let us know if you feel otherwise and/or if this may cause issues for Praetor users.
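Since Debian is known to fail at the ingress-nginx stage, an OS guard could catch this up front. A sketch, assuming the standard `/etc/os-release` format (the helper name and the sample file are ours, for illustration):

```shell
#!/usr/bin/env bash
# Print the ID field of an os-release style file ($1).
os_id() {
    . "$1" && echo "$ID"
}

# Self-contained example against a fake os-release file:
cat > /tmp/os-release.sample <<'EOF'
ID=ubuntu
VERSION_ID="22.04"
EOF

# On a real host this would read /etc/os-release instead.
if [ "$(os_id /tmp/os-release.sample)" = "ubuntu" ]; then
    echo "OS check passed"
else
    echo "warning: unsupported OS; ingress-nginx install may fail" >&2
fi
```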
> While scaling down a node, we tried to use the draining method, but akash-services/operator-inventory-hardware-discovery causes an issue because it is not a DaemonSet. We should look into it. Force draining worked.
We will look into this issue further. Initial testing of the scale-down procedure only covered scaling down K3s nodes; we have not yet tested scaling down with the Akash provider and related operators installed. We will test those scenarios ASAP.
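The two drain variants discussed above can be sketched as follows (`worker-1` is a placeholder node name; commands are echoed rather than executed so the sketch is self-contained). A plain `kubectl drain` refuses to evict pods that are not managed by a controller, and since the operator-inventory-hardware-discovery pods are plain pods rather than a DaemonSet, only the forced variant succeeds:

```shell
#!/usr/bin/env bash
NODE="worker-1"   # placeholder

# Plain drain: skips DaemonSet-managed pods, but aborts on unmanaged pods.
drain_cmd="kubectl drain $NODE --ignore-daemonsets --delete-emptydir-data"

# Force drain: also evicts pods without a controller (this is what
# "force draining worked" refers to).
force_cmd="$drain_cmd --force"

# Drop the echo to run these against a real cluster.
echo "$drain_cmd"
echo "$force_cmd"
```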
@chainzero - Thanks for the response.
> As this is part of pre-existing methodology, we do not view this as an issue, but please let us know if you feel otherwise and/or if it will cause issues in Praetor use.
After deep consideration, we agree that root user access should be required, as it also helps with the GPU driver installation steps.
> Based on this being part of the pre-existing standard, we do not believe this is an issue, but please let us know if you feel otherwise and/or if this may cause issues for Praetor users.
It's a non-issue.
> We will look into this issue further. Initial testing of the scale-down procedure only covered scaling down K3s nodes; we have not yet tested scaling down with the Akash provider and related operators installed. We will test those scenarios ASAP.
After many iterations of testing node removal with the updated scripts, the operator-inventory-hardware issue is gone and the node was successfully removed.
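For reference, a full node-removal sequence against a k3s cluster typically looks like the following sketch (`worker-1` is a placeholder, and the steps reflect our understanding, not the scripts' exact contents; commands are echoed rather than executed). The last step uses the uninstall script k3s ships on agent nodes:

```shell
#!/usr/bin/env bash
NODE="worker-1"   # placeholder

steps=(
  # 1. Evict all workloads, including unmanaged pods:
  "kubectl drain $NODE --ignore-daemonsets --delete-emptydir-data --force"
  # 2. Remove the node object from the cluster:
  "kubectl delete node $NODE"
  # 3. On the node itself, tear down the k3s agent:
  "ssh $NODE /usr/local/bin/k3s-agent-uninstall.sh"
)

for step in "${steps[@]}"; do
  echo "$step"
done
```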
@chainzero created the k3s method of provider installation, described here: https://akashengineers.xyz/provider-build-scripts
Before getting this to production use, the following points must be considered and addressed/verified to be supported with the k3s K8s cluster deployment method:
- `etcd` can be scaled (to avoid SPOF)
- `control-plane` can be scaled
- a node can be added (`etcd` instance or/and `control-plane`);
- a node can be removed (`etcd` instance or/and `control-plane`);
- `nodefs` & `imagefs` locations can be changed: similarly to how it's described here
- `etcd` backup & restore procedure (kubespray does this automatically each time you run it against your K8s cluster)
- `etcd` performance - AFAIK, k3s uses a sqlite3 DB instead of etcd by default; so there should be some quick perf test for it such as `etcdctl check perf` we have here

Additionally/Ideally:

- `nodefs` & `imagefs` thresholds (ref)
- `kubelet_logfiles_max_nr`, as well as the max size of the container log file before it is rotated: `kubelet_logfiles_max_size` (ref)
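One way the `nodefs`/`imagefs` thresholds and the log-rotation limits could be expressed under k3s is via `kubelet-arg` in `/etc/rancher/k3s/config.yaml`, mapping to the same kubelet flags kubespray's `kubelet_logfiles_max_nr`/`kubelet_logfiles_max_size` variables set. A sketch with illustrative values, not recommendations:

```yaml
# /etc/rancher/k3s/config.yaml -- illustrative values only
kubelet-arg:
  - "eviction-hard=nodefs.available<10%,imagefs.available<15%"
  - "container-log-max-files=5"     # analogous to kubelet_logfiles_max_nr
  - "container-log-max-size=10Mi"   # analogous to kubelet_logfiles_max_size
```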
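The backup and perf checks above can be sketched as follows, assuming the cluster runs k3s with embedded etcd (by default k3s uses the sqlite backend, where neither command applies). Commands are echoed rather than executed, and the snapshot name is a placeholder:

```shell
#!/usr/bin/env bash
cmds=(
  "k3s etcd-snapshot save --name pre-upgrade"   # one-off on-demand snapshot
  "k3s etcd-snapshot ls"                        # list existing snapshots
  "etcdctl check perf"                          # quick etcd performance check
)

# Drop the echo to run these on a k3s server node with embedded etcd.
for c in "${cmds[@]}"; do echo "$c"; done
```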