Closed KavyaPuranik closed 2 years ago
Hi @KavyaPuranik,
May I get some details on your use-case?
I am assuming you are using https://github.com/NVIDIA/ais-k8s/blob/master/terraform/deploy.sh script to bring up a K8s cluster on GCP, followed by AIS deployment. This script is a bit outdated and uses deprecated helm charts to deploy AIStore. Preferred way to deploy AIStore is to use the operator approach https://github.com/NVIDIA/ais-k8s/tree/master/operator. AIStore v3.4 and v3.8 are both deprecated and we don't support these versions anymore. We suggest you use release v3.10.
If you would like to deploy AIStore v3.10 version on GCP, you can try running:
$ ./deploy.sh k8s YOUR-ARGS
$ kubectl create clusterrolebinding cluster-admin-binding \
--clusterrole=cluster-admin \
--user=$(gcloud config get-value core/account)
$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.2/cert-manager.yaml
# Verify cert-manager pods are in Running state. Ref: https://cert-manager.io/docs/installation/kubectl/#prerequisites for more information.
$ kubectl get pods --namespace cert-manager
# Deploy operator
$ kubectl apply -f https://github.com/NVIDIA/ais-k8s/releases/download/v0.92/ais-operator.yaml
# Ensure ais operator is running
$ kubectl get pods --namespace ais-operator-system
Once you have your operator pod running, you can apply the AIStore cluster manifest. You may use sample manifest https://github.com/NVIDIA/ais-k8s/blob/master/operator/config/samples/ais_v1beta1_aistore.yaml as reference and update the cluster name, size and any other specs as applicable.
apiVersion: ais.nvidia.com/v1beta1
kind: AIStore
metadata:
name: <NAME>
spec:
# Add fields here
size: <NUMBER-OF-TARGETS>
...
targetSpec:
…
mounts:
- path: "/ais1"
size: <SIZE OF VOLUME>
<LIST OF MORE VOLUMES>
Please let me know if the operator approach and deploying v3.10 works for you. We will update the docs and scripts to use the operator approach.
Hi @saiprashanth173,
I was able to successfully deploy the ais-node 3.10v using AIS-operators. Initially I wasn't aware that helm charts were outdated. So deployed the default version(3.4) and then tried to upgrade it, which caused the above mentioned issue.
Thanks for checking @KavyaPuranik.
Closing this ticket. The docs in https://github.com/NVIDIA/ais-k8s repo will be updated to reflect latest releases.
Hi,
I was following https://github.com/NVIDIA/ais-k8s terraform style of deploying the ais application on to k8s cluster. I initially deployed the cluster with 3.4 version, which was successful & then when I tried to upgrade it to 3.8 version(using the same script), the pods went to crashloopbackoff status. I had to manually clean the state & config files to make it work for 3.8 version. Any inputs on how to resolve this issue without having to clean the files manually?