Hastyrush closed this issue 11 months ago
Thanks for posting the issue!
The problem you're seeing is due to the empty values in the config for:
"port_intra_control": "",
"port_intra_data": ""
In a typical deployment these are set via environment variables and fall back to defaults when those variables are not set:
"host_net": {
    "hostname": "${HOSTNAME_LIST}",
    "hostname_intra_control": "${HOSTNAME_LIST_INTRA_CONTROL}",
    "hostname_intra_data": "${HOSTNAME_LIST_INTRA_DATA}",
    "port": "${PORT:-8080}",
    "port_intra_control": "${PORT_INTRA_CONTROL:-9080}",
    "port_intra_data": "${PORT_INTRA_DATA:-10080}"
},
However, our Terraform scripts still use the somewhat outdated Helm scripts at https://github.com/NVIDIA/ais-k8s/tree/master/helm/ais, which don't set those values when generating the config, hence the parsing error.
Since you've already got the k8s cluster running, I would suggest trying to deploy directly with the k8s operator https://github.com/NVIDIA/ais-k8s/blob/master/operator/README.md as that is fully compatible with more recent versions.
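To illustrate why the ports end up empty, here is a minimal Python sketch of shell-style `${VAR:-default}` expansion applied to the port entries of the template above. The `expand` helper is hypothetical and written only for this example; it is not how the deployment tooling performs the substitution, only an emulation of the same fallback rule (use the default when the variable is unset or empty).

```python
import re

# Matches ${NAME} and ${NAME:-default}, as used in the host_net template.
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")


def expand(template: str, env: dict) -> str:
    """Hypothetical helper: emulate shell ${VAR:-default} expansion."""

    def repl(match):
        name, default = match.group(1), match.group(2)
        value = env.get(name, "")
        # Like the shell, fall back to the default when unset OR empty.
        if value == "" and default is not None:
            return default
        return value

    return _PLACEHOLDER.sub(repl, template)


host_net = {
    "port": "${PORT:-8080}",
    "port_intra_control": "${PORT_INTRA_CONTROL:-9080}",
    "port_intra_data": "${PORT_INTRA_DATA:-10080}",
}

# Only PORT is set here, so the two intra-cluster ports take their defaults.
resolved = {key: expand(value, {"PORT": "51080"}) for key, value in host_net.items()}
print(resolved)
```

With a template that has no `:-default` part, an unset variable expands to an empty string, which is the kind of value the config parser then rejects.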
Hello,
Thanks for the clarification.
I went ahead and tried to deploy the Kubernetes operator after using the Terraform script to deploy Kubernetes (./deploy.sh k8s).
I used the command IMG=aistore/ais-operator:latest make deploy from the documentation and managed to deploy cert-manager as well as the ais-operator-system namespace. However, when running kubectl apply -f config/samples/ais_v1beta1aistore.yaml -n ais-operator-system, the containers are unable to start. I will paste the errors encountered below.
'/var/ais_config/ais.json' -> '/etc/ais/ais.json'
'/var/ais_config/ais_local.json' -> '/etc/ais/ais_local.json'
'/var/statsd_config/statsd.json' -> '/opt/statsd/statsd.conf'
/ais_docker_start.sh: line 13: node: command not found
No cached .ais.smap
aisnode args: -config=/etc/ais/ais.json -local_config=/etc/ais/ais_local.json -role=proxy -allow_shared_no_disks=false -ntargets=1
E 08:02:51.405993 daemon:151 FATAL ERROR: failed to load initial global config "/etc/ais/ais.json": cmn.ClusterConfig.Mirror: cmn.MirrorConf.ReadObject: found unknown field: util_thresh, error found in #10 byte of ...|ilthresh":0,"burst|..., bigger context ...|{"backend":null,"mirror":{"copies":2,"util_thresh":0,"burst_buffer":512,"optimize_put":false,"enable|...
FATAL ERROR: failed to load initial global config "/etc/ais/ais.json": cmn.ClusterConfig.Mirror: cmn.MirrorConf.ReadObject: found unknown field: util_thresh, error found in #10 byte of ...|ilthresh":0,"burst|..., bigger context ...|{"backend":null,"mirror":{"copies":2,"util_thresh":0,"burst_buffer":512,"optimize_put":false,"enable|...
cat: /var/log/ais/aisnode.INFO: No such file or directory
cat: /var/log/ais/aisnode.ERROR: No such file or directory
cat: /var/log/ais/aisnode.WARNING: No such file or directory
The same error occurs for the ais-sample-target-0 pod as well. It seems the 'util_thresh' field in the JSON config file is not recognized. Am I using the wrong image? The ais_v1beta1aistore.yaml is as follows:
Thanks a lot for the help!
Hello, thanks for the reply.
I tried pulling the latest aistorenode and operator versions, but I'm still getting the same error: 'unknown field: util_thresh'.
This is the config file that I used
This util_thresh knob was removed almost 18 months ago:
global config & BMD v2: simplify mirroring configuration
By that time, we had already stopped using Helm and transitioned to the Kubernetes operator. I've just (belatedly!) added some text and a warning at https://github.com/NVIDIA/ais-k8s.
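For anyone who wants to check a generated global config before starting aisnode, the following is a minimal Python sketch that reports fields known to have been removed. The `REMOVED_FIELDS` table and the `stale_fields` helper are hypothetical names for this example; the only entry listed is `mirror.util_thresh`, the one field this thread confirms was removed. The sample JSON is trimmed from the error context in the log above and is not a complete config.

```python
import json

# Assumption: a hand-maintained map of config section -> removed field names.
# Only mirror.util_thresh is confirmed by this thread.
REMOVED_FIELDS = {"mirror": ("util_thresh",)}


def stale_fields(config_text: str) -> list:
    """Return 'section.field' for every removed field still in the config."""
    config = json.loads(config_text)
    found = []
    for section, names in REMOVED_FIELDS.items():
        body = config.get(section) or {}  # tolerate a missing or null section
        found += [f"{section}.{name}" for name in names if name in body]
    return found


sample = (
    '{"backend": null, "mirror": {"copies": 2, "util_thresh": 0, '
    '"burst_buffer": 512, "optimize_put": false}}'
)
print(stale_fields(sample))
```

A non-empty result suggests the config was generated by tooling older than the aisnode image, which matches the Helm-versus-operator mismatch described here.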
Got it, thanks!
Hello,
I was experimenting with the ais-k8s repository and encountered an issue when trying to deploy AIS as pods in Kubernetes on GCP. Since the ais-k8s repository does not have issue tracking, I am posting it here.
After successfully deploying k8s using the script deploy.sh k8s, I proceeded to deploy the AIS pods with the same script: deploy.sh ais --expose-external --aisnode-image=aistorage/aisnode:latest --admin-image=aistorage/admin:latest
The admin pod was deployed successfully, but the aisnode pod shows a status of CrashLoopBackOff.
By inspecting the logs of the container using kubectl logs demo-ais-proxy-0, I get the following errors:
Defaulted container "ais" out of: ais, populate-env (init)
aisnode proxy container startup at Fri Aug 25 09:32:36 UTC 2023
'/var/ais_config/ais.json' -> '/etc/ais/ais.json'
'/var/ais_config/ais_local.json' -> '/etc/ais/ais_local.json'
'/var/statsd_config/statsd.json' -> '/opt/statsd/statsd.conf'
/ais_docker_start.sh: line 13: node: command not found
No cached .ais.smap
aisnode args: -config=/etc/ais/ais.json -local_config=/etc/ais/ais_local.json -role=proxy -allow_shared_no_disks=false -ntargets=2
E 09:32:38.777314 daemon:151 FATAL ERROR: failed to load plain-text local config "/etc/ais/ais_local.json": cmn.LocalConfig.HostNet: cmn.LocalNetConfig.PortIntraData: PortIntraControl: readUint64: unexpected character: , error found in #10 byte of ...|rol": "", "p|..., bigger context ...| "51080", "port_intra_control": "", "port_intra_data": "" } }|...
FATAL ERROR: failed to load plain-text local config "/etc/ais/ais_local.json": cmn.LocalConfig.HostNet: cmn.LocalNetConfig.PortIntraData: PortIntraControl: readUint64: unexpected character: , error found in #10 byte of ...|rol": "", "p|..., bigger context ...| "51080", "port_intra_control": "", "port_intra_data": "" } }|...
cat: /var/log/ais/aisnode.INFO: No such file or directory
cat: /var/log/ais/aisnode.ERROR: No such file or directory
cat: /var/log/ais/aisnode.WARNING: No such file or directory
With the default admin and aisnode images (3.4), the pods run successfully. Is there a mistake in my deployment, or are the new images not yet compatible with k8s? Thanks in advance!
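The readUint64 failure in the log above comes from port fields that are empty strings. A minimal Python sketch of a pre-flight check follows. It assumes `host_net` is a top-level key of ais_local.json (as the `cmn.LocalConfig.HostNet` path in the error suggests) and that ports are stored as numeric strings; the `invalid_ports` helper is hypothetical, and the sample only mirrors the fragment visible in the error context.

```python
import json

PORT_KEYS = ("port", "port_intra_control", "port_intra_data")


def invalid_ports(local_config_text: str) -> list:
    """Return the host_net port keys whose values are not plain digits."""
    host_net = json.loads(local_config_text).get("host_net", {})
    return [
        key
        for key in PORT_KEYS
        # Missing, empty, or non-numeric values all count as invalid.
        if not str(host_net.get(key, "")).strip().isdigit()
    ]


# Assumption: reconstructed from the log's error context, not a full config.
sample = (
    '{"host_net": {"port": "51080", '
    '"port_intra_control": "", "port_intra_data": ""}}'
)
print(invalid_ports(sample))
```

Running such a check against the rendered /etc/ais/ais_local.json would flag the two intra-cluster ports before aisnode attempts to parse them.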