Hi, seems like Alaz could not find a CRI socket to connect to on your nodes.
What is the underlying OS on your nodes? Linux or Windows? We only support Linux machines. If Linux, what container runtime do you use? If you could specify the socket path of the CRI, it'd be great.
This link may help with CRIs (container runtime interfaces).
I'm using Ubuntu 22.04.4 LTS, but RKE2 runs with k3s and containerd, so the CRI socket path is /run/k3s/containerd/containerd.sock.
The Helm chart I'm using (https://github.com/getanteon/anteon-helm-charts/blob/master/charts/alaz/templates/daemonset.yaml) does not seem to support specifying the CRI socket path.
Also, looking further into https://github.com/getanteon/alaz/blob/master/cri/cri.go#L24C5-L24C28, it actually seems to be hard-coded there?
I could open a PR to extend the list as a quick fix, but it's probably a good idea to make it configurable via ENV vars as well. Or do you have a better solution?
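A minimal sketch of the ENV-var idea, assuming the variable is named CRI_RUNTIME_ENDPOINT (the name used later in this thread) and that an explicitly set value should replace the built-in list rather than be appended to it. The function resolveEndpoints and the defaultEndpoints list are hypothetical; the paths shown are common CRI socket defaults, not necessarily the ones hard-coded in cri.go.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultEndpoints is an illustrative fallback list (hypothetical); the real
// list lives in alaz's cri/cri.go and may differ.
var defaultEndpoints = []string{
	"unix:///run/containerd/containerd.sock",
	"unix:///run/crio/crio.sock",
	"unix:///var/run/cri-dockerd.sock",
}

// resolveEndpoints returns the CRI endpoints to try, in order. If
// CRI_RUNTIME_ENDPOINT is set to a non-blank value, only that endpoint is
// used; otherwise the default list is returned. getenv is injected so the
// lookup can be tested without touching the real environment.
func resolveEndpoints(getenv func(string) string) []string {
	if ep := strings.TrimSpace(getenv("CRI_RUNTIME_ENDPOINT")); ep != "" {
		return []string{ep}
	}
	return defaultEndpoints
}

func main() {
	for _, ep := range resolveEndpoints(os.Getenv) {
		fmt.Println(ep)
	}
}
```

Making the override exclusive means a misconfigured value fails loudly on the configured path instead of silently falling back to a default socket, which makes problems like the one reported further down easier to spot.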
Like you said, managing this through ENV vars would be more flexible when the hard-coded paths don't match the CRI socket path on the node. If you could send a PR, we can quickly review it and release a new version. By the way, you can check out a new branch from the develop branch.
@kenanfarukcakir PR is in: https://github.com/getanteon/alaz/pull/164
Once it's merged and released, please also merge the PR for the Helm chart: https://github.com/getanteon/anteon-helm-charts/pull/10
Hi all,
unfortunately this does not seem to be working for me with alaz 0.12.0 (installed via the chart).
The daemonset contains the CRI_RUNTIME_ENDPOINT environment variable:
$ k get ds alaz-daemonset -o yaml|grep -A1 CRI
- name: CRI_RUNTIME_ENDPOINT
value: unix:///run/k3s/containerd/containerd.sock
$
But the pod nevertheless crashes:
$ k logs alaz-daemonset-khmwf
{"level":"info","tag":"v0.12.0","time":1726058274,"message":"alaz tag"}
{"level":"info","time":1726058274,"message":"k8sCollector initializing..."}
{"level":"error","error":"validate service connection: validate CRI v1 runtime API for endpoint \"unix:///proc/1/root/var/run/cri-dockerd.sock\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial unix /proc/1/root/var/run/cri-dockerd.sock: connect: no such file or directory\"","time":1726058274,"message":"failed to create cri tool"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x18106fe]
The endpoint still seems to be the default one, not the one set via the environment variable.
Is there just not yet a release of the chart that contains this fix?
Kind Regards, Johannes
I am getting the same error.
Try setting unix:///proc/1/root/run/k3s/containerd/containerd.sock instead of unix:///run/k3s/containerd/containerd.sock.
Alaz needs the /proc/1/root prefix to access the CRI endpoint on the host. A PR that automates this would be very welcome.
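A minimal sketch of what automating the prefix could look like, assuming the Alaz DaemonSet runs with hostPID so that PID 1 is the host's init and /proc/1/root resolves to the host filesystem. withHostRoot is a hypothetical helper, not an existing Alaz function; it leaves non-unix endpoints, relative paths, and already-prefixed paths untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// hostRootPrefix points at PID 1's root filesystem. This is the host's root
// only if the pod shares the host PID namespace (assumption: hostPID: true).
const hostRootPrefix = "/proc/1/root"

// withHostRoot (hypothetical) rewrites a unix:// CRI endpoint so that its
// socket path is resolved through hostRootPrefix. Endpoints that are not
// unix sockets, use relative paths, or already carry the prefix are
// returned unchanged, so applying it twice is harmless.
func withHostRoot(endpoint string) string {
	const scheme = "unix://"
	if !strings.HasPrefix(endpoint, scheme) {
		return endpoint
	}
	path := strings.TrimPrefix(endpoint, scheme)
	if !strings.HasPrefix(path, "/") {
		return endpoint
	}
	if path == hostRootPrefix || strings.HasPrefix(path, hostRootPrefix+"/") {
		return endpoint
	}
	return scheme + hostRootPrefix + path
}

func main() {
	fmt.Println(withHostRoot("unix:///run/k3s/containerd/containerd.sock"))
	// prints unix:///proc/1/root/run/k3s/containerd/containerd.sock
}
```

With a helper like this, users could set the host path they already know (for example /run/k3s/containerd/containerd.sock) and the prefix would be added internally.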
I tried that as well, but the result is the same.
I removed the k3s installation and tried Kubespray (a production-ready cluster) locally. There, Alaz runs with the default configuration.
In my comment I pointed out that a different endpoint was specified in the daemonset, but the error only mentions the default one.
It looks to me like whatever I set is actually not being respected.
I have a self-hosted Kubernetes cluster via Rancher (RKE2), version v1.27.12+rke2r1.
Anteon itself works fine after deployment, but Alaz crashes with the following output on each individual pod:
I used the following Helm chart to install Alaz:
$(MONITORING_ID) and $(BACKEND_HOST) are set properly. Any suggestions/hints for this?