akash-network / support

Akash Support and Issue Tracking

k3s production use considerations (and validation) #217

Open andy108369 opened 2 months ago

andy108369 commented 2 months ago

@chainzero created the k3s method of provider installation, described here: https://akashengineers.xyz/provider-build-scripts

Before this method goes into production use, the following points must be considered and either addressed or verified as supported by the k3s Kubernetes cluster deployment method:

Additionally/Ideally

jigar-arc10 commented 1 month ago

Here is what we found so far from our testing.

We will continue testing further and will report new findings.

chainzero commented 1 month ago

@jigar-arc10 - thank you for the additional testing.

Thoughts on some of the points raised above:

The current Akash Provider documentation and install process assume the install is run as root, as stated here:

https://akash.network/docs/providers/build-a-cloud-provider/kubernetes-cluster-for-akash-providers/kubernetes-cluster-for-akash-providers/#step-2---install-ansible

As this is part of pre-existing methodologies - do not view this as an issue - but please let us know if you feel otherwise and/or if it will provoke issues in Praetor use.

The current Akash Provider Helm-based install instructions recommend/assume Ubuntu, as stated here:

https://akash.network/docs/providers/build-a-cloud-provider/kubernetes-cluster-for-akash-providers/kubernetes-cluster-for-akash-providers/#kubernetes-cluster-softwarehardware-requirements-and-recommendations

Based on this being part of the pre-existing standard - do not believe this is an issue but please let us know if you feel otherwise and/or if this may cause issues for Praetor users.

Will look into this issue further. Initial testing of scaling down procedure only tested the ability to scale down K3s nodes. Have not yet tested scaling down with Akash provider and related operators installed. Will test those scenarios ASAP.

jigar-arc10 commented 1 month ago

@chainzero - Thanks for the response.

"As this is part of pre-existing methodologies - do not view this as an issue - but please let us know if you feel otherwise and/or if it will provoke issues in Praetor use."

After careful consideration, we agree that root user access should be required, as it also helps with the GPU driver installation steps.

"Based on this being part of the pre-existing standard - do not believe this is an issue but please let us know if you feel otherwise and/or if this may cause issues for Praetor users."

It's a non-issue.

"Will look into this issue further. Initial testing of scaling down procedure only tested the ability to scale down K3s nodes. Have not yet tested scaling down with Akash provider and related operators installed. Will test those scenarios ASAP."

After many iterations of testing node removal with the updated scripts, the operator-inventory-hardware issue no longer occurs, and the node was removed successfully.
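
For anyone reproducing the scale-down test, here is a minimal sketch of a generic Kubernetes drain-then-delete sequence for one node. It is not the logic of the provider-build-scripts, which this thread does not show. The function names and the node name `node3` are placeholders made up for illustration, and the sketch assumes `kubectl` is on the PATH and already pointed at the k3s cluster. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def node_removal_commands(node_name: str) -> list[list[str]]:
    """Return the kubectl commands for a generic drain-then-delete of one node."""
    return [
        # Evict workloads; DaemonSet-managed pods are skipped, emptyDir data is discarded.
        ["kubectl", "drain", node_name, "--ignore-daemonsets", "--delete-emptydir-data"],
        # Remove the node object from the cluster.
        ["kubectl", "delete", "node", node_name],
    ]


def remove_node(node_name: str, dry_run: bool = True) -> list[str]:
    """Print (dry run) or execute the removal commands; return them as strings."""
    rendered = []
    for cmd in node_removal_commands(node_name):
        rendered.append(shlex.join(cmd))
        if dry_run:
            print("DRY RUN:", rendered[-1])
        else:
            # check=True stops the sequence if a step fails, e.g. a drain that cannot evict a pod.
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    remove_node("node3")  # "node3" is a hypothetical node name
```

Calling `remove_node("node3", dry_run=False)` would run the commands for real. Once the node is out of the cluster, the K3s install script leaves `/usr/local/bin/k3s-agent-uninstall.sh` on agent hosts for cleaning up the machine itself. Whether the Akash operators, such as operator-inventory, need extra steps after removal is not covered by this sketch.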