valentin-nasta opened this issue 1 month ago

I was checking the possibility of enabling the Monitoring application in an air-gapped environment according to this documentation: https://ranchermanager.docs.rancher.com/how-to-guides/advanced-user-guides/monitoring-alerting-guides/enable-monitoring

Is this a separate Helm chart, or is it part of the existing stack? How would this setup look when integrated into the rke_airgap_install script? Any tips or guidance on configuring this in an air-gapped environment would be greatly appreciated.

Thank you!
By default or by script? I think the images may already be there. I need to test a cluster tonight/tomorrow. Actually, it should be fairly easy to add to the script.
So, good news: all the images are included already. I was able to go into Rancher and use the catalog for Monitoring, and everything worked.
From https://github.com/clemenko/rke_airgap_install/blob/main/hauler_all_the_things.sh#L489, the flag --set useBundledSystemChart=true tells Rancher to use the charts bundled locally. And since all the images are already stored in Hauler, everything works.
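For context, a minimal sketch of how that flag is typically passed when installing Rancher into an air gap; the hostname and registry address below are placeholders, not values taken from the script:

# Sketch of an air-gapped Rancher install. Hostname and private registry
# address are illustrative; useBundledSystemChart tells Rancher to use the
# system charts baked into its image instead of fetching them.
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm upgrade -i rancher rancher-latest/rancher \
  --namespace cattle-system --create-namespace \
  --set hostname=rancher.example.com \
  --set systemDefaultRegistry=registry.example.com:5000 \
  --set useBundledSystemChart=true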
Is there something more that you are looking for?
Thank you for the quick reply.
> By default or script?

By default would be nice, if there is some kind of Rancher-side activation of monitoring, similar to the govmessage. Otherwise, adding it to the script would also work fine. The scenario is to have the system already prepared and delivered to the customer, without needing to fiddle with the setup afterward.
I also discovered which Helm chart is actually being used by inspecting the UI (rancher-monitoring-103.1.1-up45.31.1.tgz). Initially, I thought it was this one: kube-prometheus-stack.
I tried installing it "manually," but it fails. Do you have any idea why this might happen?
helm upgrade --install=true --namespace=cattle-monitoring-system --timeout=10m0s --values=/home/shell/helm/values-rancher-monitoring-103.1.1-up45.31.1.yaml --version=103.1.1+up45.31.1 --wait=true rancher-monitoring /home/shell/helm/rancher-monitoring-103.1.1-up45.31.1.tgz
Release "rancher-monitoring" does not exist. Installing it now.
Starting delete for "rancher-monitoring-admission" ServiceAccount
Ignoring delete failure for "rancher-monitoring-admission" /v1, Kind=ServiceAccount: serviceaccounts "rancher-monitoring-admission" not found
creating 1 resource(s)
Starting delete for "rancher-monitoring-admission" ClusterRole
Ignoring delete failure for "rancher-monitoring-admission" rbac.authorization.k8s.io/v1, Kind=ClusterRole: clusterroles.rbac.authorization.k8s.io "rancher-monitoring-admission" not found
creating 1 resource(s)
Starting delete for "rancher-monitoring-admission" ClusterRoleBinding
Ignoring delete failure for "rancher-monitoring-admission" rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io "rancher-monitoring-admission" not found
creating 1 resource(s)
Starting delete for "rancher-monitoring-admission" Role
Ignoring delete failure for "rancher-monitoring-admission" rbac.authorization.k8s.io/v1, Kind=Role: roles.rbac.authorization.k8s.io "rancher-monitoring-admission" not found
creating 1 resource(s)
Starting delete for "rancher-monitoring-admission" RoleBinding
Ignoring delete failure for "rancher-monitoring-admission" rbac.authorization.k8s.io/v1, Kind=RoleBinding: rolebindings.rbac.authorization.k8s.io "rancher-monitoring-admission" not found
creating 1 resource(s)
Starting delete for "rancher-monitoring-admission-create" Job
Ignoring delete failure for "rancher-monitoring-admission-create" batch/v1, Kind=Job: jobs.batch "rancher-monitoring-admission-create" not found
creating 1 resource(s)
Watching for changes to Job rancher-monitoring-admission-create with timeout of 10m0s
Add/Modify event for rancher-monitoring-admission-create: ADDED
rancher-monitoring-admission-create: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
Add/Modify event for rancher-monitoring-admission-create: MODIFIED
rancher-monitoring-admission-create: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: failed pre-install: 1 error occurred:
* timed out waiting for the condition
full log: helm-operation-v68w4_undefined.log
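As an aside, the bundled chart itself can be pulled and inspected outside of Rancher. This is a hedged sketch, assuming Rancher's public chart repo at https://charts.rancher.io; in an air gap that repo would have to be mirrored first:

# Pull the same chart version for local inspection (repo URL assumed,
# not confirmed by this thread; requires internet access or a mirror).
helm repo add rancher-charts https://charts.rancher.io
helm pull rancher-charts/rancher-monitoring --version 103.1.1+up45.31.1
tar -tzf rancher-monitoring-*.tgz | head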
I think I found the root cause of the error:
kubectl -n cattle-monitoring-system get job rancher-monitoring-admission-create
NAME                                  COMPLETIONS   DURATION   AGE
rancher-monitoring-admission-create   0/1           93m        93m
kubectl -n cattle-monitoring-system get pod --selector=job-name=rancher-monitoring-admission-create
NAME                                        READY   STATUS             RESTARTS   AGE
rancher-monitoring-admission-create-snvlv   0/1     ImagePullBackOff   0          91m
kubectl -n cattle-monitoring-system get pod --selector=job-name=rancher-monitoring-admission-create -oyaml | grep image
image: 192.168.100.107:5000/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6
imagePullPolicy: IfNotPresent
- image: 192.168.100.107:5000/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6
imageID: ""
message: Back-off pulling image "192.168.100.107:5000/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6"
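A quick way to confirm whether that tag actually exists in the private registry is to query the standard Docker Registry v2 API (registry address copied from the error above):

# List the tags the registry holds for this repository; if the tag from the
# error message is missing, the ImagePullBackOff is explained.
curl -s http://192.168.100.107:5000/v2/rancher/mirrored-ingress-nginx-kube-webhook-certgen/tags/list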
If you want https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack, you will have to add the images yourself. Right now, out of the box, all the images you need are there for the Rancher Monitoring app. You can deploy Rancher and install it from the catalog. I will look into adding it from a curl shortly.
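If you do go the kube-prometheus-stack route, a rough sketch of adding the chart and its images to the Hauler store might look like this (the chart reference is real; the image and its tag are illustrative, and the chart references many more images than shown):

# Add the upstream chart and an example image to the local hauler store;
# every image the chart references would need to be added the same way.
hauler store add chart kube-prometheus-stack --repo https://prometheus-community.github.io/helm-charts
hauler store add image quay.io/prometheus/prometheus:v2.45.0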
After looking into this, there is no easy way to do it. The charts Rancher uses are baked in, and the chart versions are also hard-coded. The simplest way is to use the GUI for deploying it.
Thank you for taking a look at it! Even using the GUI, it fell short with the error from the previous comment. I need to troubleshoot it and make sure to load the images beforehand.
I was not able to reproduce the error. Did you deploy Rancher with the script?
Yes, I deployed Rancher with the script, with these versions:
export RKE_VERSION=1.28.12
export CERT_VERSION=v1.15.3
export RANCHER_VERSION=v2.8.5
export LONGHORN_VERSION=v1.7.0
export NEU_VERSION=2.7.7
I think I am getting closer; there is a version mismatch somewhere:
hauler store info | grep mirrored-ingress-nginx-kube
| rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20230312-helm-chart-4.5.2-28-g66a760794 | image | linux/amd64 | 2 | 20.1 MB |
vs
message: Back-off pulling image "192.168.100.107:5000/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6"
I think I know what is going on. Updating the script now for it.
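Presumably the fix is to make sure the exact tag the chart references ends up in the Hauler store. A hedged one-liner, with the tag copied from the error message above:

# Add the image under the exact tag the admission job tries to pull.
hauler store add image rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20221220-controller-v1.5.1-58-g787ea74b6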