Open ngoeddel-openi opened 4 months ago
Hi @ngoeddel-openi,
Could you please run the following command and share its output?
kubectl get pods -o 'custom-columns=OWNER:.metadata.ownerReferences[0].kind' -A
This will list all pod owner types. I'm quite sure our database schema is too strict and missing a possible type.
Best regards, Eric
Sure, I also added a | sort -u to make the list shorter:
$ kubectl get pods -o 'custom-columns=OWNER:.metadata.ownerReferences[0].kind' --no-headers -A | sort -u
Cluster
DaemonSet
InstanceManager
Job
Node
<none>
ReplicaSet
ShareManager
StatefulSet
Nice, thanks for the quick reply.
Cluster, InstanceManager and ShareManager look like custom resource definitions to me. Can you confirm that? You may also run and share kubectl get crds.
Anyway, you may fix that by executing the following statement in the Icinga for Kubernetes database:
ALTER TABLE pod_owner MODIFY kind varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL;
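For context, the ALTER TABLE above helps because kind in pod_owner is presumably declared with a type that only admits the built-in owner kinds, so pods owned by custom resources cannot be stored. Below is a minimal sketch of what such a restrictive definition might look like; the ENUM values, the table name pod_owner_sketch and the other columns are assumptions for illustration, not the project's actual DDL:
-- Hypothetical sketch of a too-strict owner table: the ENUM only covers
-- built-in kinds, so an owner kind like Cluster, InstanceManager or
-- ShareManager (backed by CRDs) cannot be inserted and the sync fails.
CREATE TABLE pod_owner_sketch (
  pod_uuid binary(16) NOT NULL,
  kind enum('DaemonSet', 'Job', 'Node', 'ReplicaSet', 'StatefulSet')
    COLLATE utf8mb4_unicode_ci NOT NULL,
  name varchar(253) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (pod_uuid, kind, name)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Widening kind to varchar(255), as in the statement above, lifts that restriction for any owner kind, including CRD-backed ones.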
> Cluster, InstanceManager and ShareManager look like custom resource definitions to me. Can you confirm that? You may also run and share kubectl get crds.
Exactly.
Cluster is from https://cloudnative-pg.io/
InstanceManager and ShareManager are from https://longhorn.io/
And we definitely have more custom resources in other Kubernetes clusters. Currently I am only testing against our DEV cluster.
Here are all the CRDs we have here right now:
$ kubectl get crds
NAME CREATED AT
addons.k3s.cattle.io 2024-03-04T15:26:22Z
alertmanagerconfigs.monitoring.coreos.com 2024-03-08T10:08:41Z
alertmanagers.monitoring.coreos.com 2024-03-08T10:08:42Z
alerts.notification.toolkit.fluxcd.io 2024-03-06T10:24:09Z
apiservers.operator.tigera.io 2024-03-04T15:26:57Z
backingimagedatasources.longhorn.io 2024-03-08T10:25:13Z
backingimagemanagers.longhorn.io 2024-03-08T10:25:13Z
backingimages.longhorn.io 2024-03-08T10:25:13Z
backupbackingimages.longhorn.io 2024-03-20T11:53:50Z
backups.longhorn.io 2024-03-08T10:25:13Z
backups.postgresql.cnpg.io 2024-03-08T09:10:15Z
backuptargets.longhorn.io 2024-03-08T10:25:13Z
backupvolumes.longhorn.io 2024-03-08T10:25:13Z
bgpconfigurations.crd.projectcalico.org 2024-03-04T15:26:57Z
bgpfilters.crd.projectcalico.org 2024-03-07T14:04:57Z
bgppeers.crd.projectcalico.org 2024-03-04T15:26:57Z
blockaffinities.crd.projectcalico.org 2024-03-04T15:26:57Z
buckets.source.toolkit.fluxcd.io 2024-03-06T10:24:09Z
caliconodestatuses.crd.projectcalico.org 2024-03-04T15:26:57Z
certificaterequests.cert-manager.io 2024-03-08T09:10:21Z
certificates.cert-manager.io 2024-03-08T09:10:21Z
challenges.acme.cert-manager.io 2024-03-08T09:10:21Z
clusterinformations.crd.projectcalico.org 2024-03-04T15:26:57Z
clusterissuers.cert-manager.io 2024-03-08T09:10:21Z
clusters.postgresql.cnpg.io 2024-03-08T09:10:15Z
engineimages.longhorn.io 2024-03-08T10:25:13Z
engines.longhorn.io 2024-03-08T10:25:13Z
etcdsnapshotfiles.k3s.cattle.io 2024-03-07T14:02:08Z
felixconfigurations.crd.projectcalico.org 2024-03-04T15:26:57Z
gitrepositories.source.toolkit.fluxcd.io 2024-03-06T10:24:09Z
globalnetworkpolicies.crd.projectcalico.org 2024-03-04T15:26:57Z
globalnetworksets.crd.projectcalico.org 2024-03-04T15:26:57Z
helmchartconfigs.helm.cattle.io 2024-03-04T15:26:22Z
helmcharts.helm.cattle.io 2024-03-04T15:26:22Z
helmcharts.source.toolkit.fluxcd.io 2024-03-06T10:24:09Z
helmreleases.helm.toolkit.fluxcd.io 2024-03-06T10:24:09Z
helmrepositories.source.toolkit.fluxcd.io 2024-03-06T10:24:09Z
hostendpoints.crd.projectcalico.org 2024-03-04T15:26:57Z
imagepolicies.image.toolkit.fluxcd.io 2024-03-06T10:24:09Z
imagerepositories.image.toolkit.fluxcd.io 2024-03-06T10:24:09Z
imagesets.operator.tigera.io 2024-03-04T15:26:57Z
imageupdateautomations.image.toolkit.fluxcd.io 2024-03-06T10:24:09Z
installations.operator.tigera.io 2024-03-04T15:26:58Z
instancemanagers.longhorn.io 2024-03-08T10:25:13Z
ipamblocks.crd.projectcalico.org 2024-03-04T15:26:57Z
ipamconfigs.crd.projectcalico.org 2024-03-04T15:26:57Z
ipamhandles.crd.projectcalico.org 2024-03-04T15:26:57Z
ippools.crd.projectcalico.org 2024-03-04T15:26:57Z
ipreservations.crd.projectcalico.org 2024-03-04T15:26:57Z
issuers.cert-manager.io 2024-03-08T09:10:21Z
kubecontrollersconfigurations.crd.projectcalico.org 2024-03-04T15:26:57Z
kustomizations.kustomize.toolkit.fluxcd.io 2024-03-06T10:24:09Z
networkpolicies.crd.projectcalico.org 2024-03-04T15:26:57Z
networksets.crd.projectcalico.org 2024-03-04T15:26:57Z
nodes.longhorn.io 2024-03-08T10:25:13Z
nvadmissioncontrolsecurityrules.neuvector.com 2024-05-07T08:06:37Z
nvclustersecurityrules.neuvector.com 2024-05-07T08:06:37Z
nvcomplianceprofiles.neuvector.com 2024-05-07T08:06:37Z
nvdlpsecurityrules.neuvector.com 2024-05-07T08:06:37Z
nvsecurityrules.neuvector.com 2024-05-07T08:06:37Z
nvvulnerabilityprofiles.neuvector.com 2024-05-07T08:06:37Z
nvwafsecurityrules.neuvector.com 2024-05-07T08:06:37Z
ocirepositories.source.toolkit.fluxcd.io 2024-03-06T10:24:09Z
opensearchclusters.opensearch.opster.io 2024-03-08T09:10:14Z
opensearchroles.opensearch.opster.io 2024-03-08T09:10:14Z
opensearchuserrolebindings.opensearch.opster.io 2024-03-08T09:10:14Z
opensearchusers.opensearch.opster.io 2024-03-08T09:10:14Z
orders.acme.cert-manager.io 2024-03-08T09:10:21Z
orphans.longhorn.io 2024-03-08T10:25:13Z
podmonitors.monitoring.coreos.com 2024-03-08T10:08:42Z
poolers.postgresql.cnpg.io 2024-03-08T09:10:15Z
probes.monitoring.coreos.com 2024-03-08T10:08:42Z
prometheuses.monitoring.coreos.com 2024-03-08T10:08:42Z
prometheusrules.monitoring.coreos.com 2024-03-08T10:08:42Z
providers.notification.toolkit.fluxcd.io 2024-03-06T10:24:09Z
receivers.notification.toolkit.fluxcd.io 2024-03-06T10:24:09Z
recurringjobs.longhorn.io 2024-03-08T10:25:13Z
replicas.longhorn.io 2024-03-08T10:25:13Z
scheduledbackups.postgresql.cnpg.io 2024-03-08T09:10:15Z
servicemonitors.monitoring.coreos.com 2024-03-08T10:08:42Z
settings.longhorn.io 2024-03-08T10:25:13Z
sharemanagers.longhorn.io 2024-03-08T10:25:13Z
snapshots.longhorn.io 2024-03-08T10:25:13Z
supportbundles.longhorn.io 2024-03-08T10:25:13Z
systembackups.longhorn.io 2024-03-08T10:25:13Z
systemrestores.longhorn.io 2024-03-08T10:25:13Z
thanosrulers.monitoring.coreos.com 2024-03-08T10:08:43Z
tigerastatuses.operator.tigera.io 2024-03-04T15:26:57Z
volumeattachments.longhorn.io 2024-03-08T10:25:13Z
volumes.longhorn.io 2024-03-08T10:25:13Z
volumesnapshotclasses.snapshot.storage.k8s.io 2024-03-07T14:05:11Z
volumesnapshotcontents.snapshot.storage.k8s.io 2024-03-07T14:05:11Z
volumesnapshots.snapshot.storage.k8s.io 2024-03-07T14:05:11Z
And I will soon run the ALTER TABLE command and write you back.
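As a quick sanity check after running it (assuming direct access to the Icinga for Kubernetes database), the owner kinds that have actually been stored can be listed; the custom resource kinds should show up there once the daemon has resynced:
-- List every distinct owner kind the sync has written so far; the exact
-- spelling/casing depends on how the daemon stores the kind values.
SELECT DISTINCT kind FROM pod_owner ORDER BY kind;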
After the ALTER TABLE statement the pod seems to work fine for a while, but after a few minutes I got this:
I0626 09:06:05.647119 1 database.go:285] "Connecting to database" logger="database"
I0626 09:06:07.883124 1 request.go:697] Waited for 1.126928059s due to client-side throttling, not priority and fairness, request: GET:https://10.43.0.1:443/api/v1/namespaces/cattle-monitoring-system/pods/pushprox-kube-controller-manager-client-qtg8v/log?container=pushprox-client
I0626 09:06:17.883325 1 request.go:697] Waited for 10.135795173s due to client-side throttling, not priority and fairness, request: GET:https://10.43.0.1:443/api/v1/namespaces/kube-system/pods/etcd-elefant-d-kubm02p/log?container=etcd
E0626 09:07:15.600518 1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*abi.Type)(0x18681e0), concrete:(*abi.Type)(0x1785860), asserted:(*abi.Type)(0x1a59d40), missingMethod:""} (interface conversion: interface {} is []uint8, not types.UUID)
goroutine 756 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x18cb900, 0xc005851b60})
/go/pkg/mod/k8s.io/apimachinery@v0.30.1/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x1c35998?})
/go/pkg/mod/k8s.io/apimachinery@v0.30.1/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x18cb900?, 0xc005851b60?})
/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/icinga/icinga-kubernetes/pkg/schema/v1.SyncContainers.func2()
/build/pkg/schema/v1/container.go:432 +0x796
golang.org/x/sync/errgroup.(*Group).Go.func1()
/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 68
/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96
panic: interface conversion: interface {} is []uint8, not types.UUID [recovered]
panic: interface conversion: interface {} is []uint8, not types.UUID
goroutine 756 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x1c35998?})
/go/pkg/mod/k8s.io/apimachinery@v0.30.1/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x18cb900?, 0xc005851b60?})
/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/icinga/icinga-kubernetes/pkg/schema/v1.SyncContainers.func2()
/build/pkg/schema/v1/container.go:432 +0x796
golang.org/x/sync/errgroup.(*Group).Go.func1()
/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 68
/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:75 +0x96
It looks like a completely different problem though.
> It looks like a completely different problem though.
Yes, I'm working on it. Thanks for testing!
I pushed some fixes. Could you please pull the image and try again?
I finally got the time to work on this again. After deleting the existing database and its persistent volume and restarting the icinga-kubernetes deployment, it seems to be working. I can see this in the Pod log:
I0712 12:38:20.322918 1 database.go:285] "Connecting to database" logger="database"
I0712 12:38:20.328481 1 driver.go:43] "Can't connect to database. Retrying" logger="database" error="dial tcp 10.43.111.110:3306: connect: connection refused"
I0712 12:39:55.569544 1 driver.go:48] "Reconnected to database" logger="database"
I0712 12:39:55.572964 1 main.go:75] "Importing schema" logger="database"
I0712 12:40:03.681443 1 request.go:697] Waited for 1.005133945s due to client-side throttling, not priority and fairness, request: GET:https://10.43.0.1:443/api/v1/namespaces/cattle-monitoring-system/pods/pushprox-kube-controller-manager-client-qtg8v/log?container=pushprox-client
I0712 12:40:13.877462 1 request.go:697] Waited for 8.859478226s due to client-side throttling, not priority and fairness, request: GET:https://10.43.0.1:443/api/v1/namespaces/longhorn-system/pods/engine-image-ei-5cefaf2b-j57fs/log?container=engine-image-ei-5cefaf2b
I0712 12:40:23.877508 1 request.go:697] Waited for 10.741546478s due to client-side throttling, not priority and fairness, request: GET:https://10.43.0.1:443/api/v1/namespaces/cattle-monitoring-system/pods/pushprox-kube-proxy-client-rf565/log?container=pushprox-client
<...>
From time to time a new log line like the last few is appended to the log, and that's it. However, in IcingaWeb I get this error when I try to use the Kubernetes module:
SQLSTATE[42S22]: Column not found: 1054 Unknown column 'node.id' in 'field list'
#0 /usr/share/icinga-php/ipl/vendor/ipl/sql/src/Connection.php(401): PDO->prepare()
#1 /usr/share/icinga-php/ipl/vendor/ipl/sql/src/Connection.php(418): ipl\Sql\Connection->prepexec()
#2 /usr/share/icinga-php/ipl/vendor/ipl/orm/src/Query.php(699): ipl\Sql\Connection->select()
#3 /usr/share/icinga-php/ipl/vendor/ipl/orm/src/ResultSet.php(142): ipl\Orm\Query->yieldResults()
#4 [internal function]: ipl\Orm\ResultSet->yieldTraversable()
#5 /usr/share/icinga-php/ipl/vendor/ipl/orm/src/ResultSet.php(122): Generator->valid()
#6 /usr/share/icinga-php/ipl/vendor/ipl/orm/src/ResultSet.php(114): ipl\Orm\ResultSet->advance()
#7 /usr/share/icingaweb2/modules/kubernetes/library/Kubernetes/Common/BaseItemList.php(63): ipl\Orm\ResultSet->rewind()
#8 /usr/share/icinga-php/ipl/vendor/ipl/html/src/HtmlDocument.php(344): Icinga\Module\Kubernetes\Common\BaseItemList->assemble()
#9 /usr/share/icinga-php/ipl/vendor/ipl/html/src/HtmlDocument.php(566): ipl\Html\HtmlDocument->ensureAssembled()
#10 /usr/share/icinga-php/ipl/vendor/ipl/html/src/HtmlDocument.php(390): ipl\Html\HtmlDocument->render()
#11 /usr/share/icinga-php/ipl/vendor/ipl/html/src/BaseHtmlElement.php(297): ipl\Html\HtmlDocument->renderUnwrapped()
#12 /usr/share/icinga-php/ipl/vendor/ipl/html/src/BaseHtmlElement.php(365): ipl\Html\BaseHtmlElement->renderContent()
#13 /usr/share/icinga-php/ipl/vendor/ipl/html/src/HtmlDocument.php(568): ipl\Html\BaseHtmlElement->renderUnwrapped()
#14 /usr/share/icinga-php/ipl/vendor/ipl/html/src/HtmlDocument.php(390): ipl\Html\HtmlDocument->render()
#15 /usr/share/icinga-php/ipl/vendor/ipl/html/src/HtmlDocument.php(568): ipl\Html\HtmlDocument->renderUnwrapped()
#16 /usr/share/icinga-php/ipl/vendor/ipl/web/src/Compat/ViewRenderer.php(56): ipl\Html\HtmlDocument->render()
#17 /usr/share/icinga-php/vendor/vendor/shardj/zf1-future/library/Zend/Controller/Action/Helper/ViewRenderer.php(970): ipl\Web\Compat\ViewRenderer->render()
#18 /usr/share/icinga-php/vendor/vendor/shardj/zf1-future/library/Zend/Controller/Action/HelperBroker.php(277): Zend_Controller_Action_Helper_ViewRenderer->postDispatch()
#19 /usr/share/icinga-php/vendor/vendor/shardj/zf1-future/library/Zend/Controller/Action.php(527): Zend_Controller_Action_HelperBroker->notifyPostDispatch()
#20 /usr/share/icingaweb2/library/Icinga/Web/Controller/Dispatcher.php(76): Zend_Controller_Action->dispatch()
#21 /usr/share/icinga-php/vendor/vendor/shardj/zf1-future/library/Zend/Controller/Front.php(954): Icinga\Web\Controller\Dispatcher->dispatch()
#22 /usr/share/icingaweb2/library/Icinga/Application/Web.php(294): Zend_Controller_Front->dispatch()
#23 /usr/share/icingaweb2/library/Icinga/Application/webrouter.php(105): Icinga\Application\Web->dispatch()
#24 /usr/share/icingaweb2/public/index.php(4): require_once(String)
#25 {main}
I don't know if it is related to the Helm chart or if I did something wrong.
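One way to narrow this down, assuming a MySQL/MariaDB backend and access to the Icinga for Kubernetes database, is to check whether the node table actually contains the column the web module selects; if it does not, the web module and the daemon's schema are most likely at incompatible versions rather than anything being wrong with the Helm chart:
-- Show the columns the node table really has in the running database and
-- compare them with the schema shipped for the installed daemon/module versions.
SELECT column_name, column_type
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND table_name = 'node'
ORDER BY ordinal_position;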
Affected Chart
icinga-stack
Which version of the app contains the bug?
0.3.0
Please describe your problem
Actually I am using my fork here: https://github.com/open-i-gmbh/icinga-helm-charts
But at the moment there are only minor changes locally on my machine because I am trying to get HA working and parent zones and satellites and all that good stuff.
Anyway, the bug I encountered comes from the icinga-kubernetes subchart. It deploys fine but the Pod is not getting healthy. This is what the Pod shows:
And it also shows this a lot of times:
And the database pod shows this:
On the other hand it seemed to be able to create all the necessary tables in the database, and even in pod_owner there are a lot of entries. But the Pod still does not get healthy and keeps restarting.
This is my values.yaml:
Just ignore the config for icinga2 because I changed a lot there.