apache / solr-operator

Official Kubernetes operator for Apache Solr
https://solr.apache.org/operator
Apache License 2.0

Operator crashes on GCS backup start, and the backup then never ends #564

Closed MikeMichel closed 1 year ago

MikeMichel commented 1 year ago

helm:

        backupRepositories:
        - name: "gcs-backup"
          gcs:
            bucket: "roller-backups"
            gcsCredentialSecret: 
              name: "gcs-sa"
              key: "service-account-key.json"
            baseLocation: "solr"

backup.yaml

apiVersion: solr.apache.org/v1beta1
kind: SolrBackup
metadata:
  name: solr-backup-dev2
  namespace: solr
spec:
  repositoryName: "gcs-backup"
  solrCloud: apache-solr
  collections:
    - master_roller_Product_default

Operator logs:

2023-05-10T17:42:34.099Z    INFO    controller-runtime.manager.controller.solrbackup    Calling to start collection backup  {"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "solr-backup-dev2", "namespace": "solr", "solrCloud": "apache-solr", "collection": "master_roller_Product_default"}
2023-05-10T17:42:34.501Z    INFO    controller-runtime.manager.controller.solrbackup    Updating status for solr-backup {"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "solr-backup-dev2", "namespace": "solr", "newStatus": {"solrVersion":"8.11.1-release-5-0-9-0","startTimestamp":"2023-05-10T17:42:34Z","collectionBackupStatuses":[{"collection":"master_roller_Product_default","backupName":"solr-backup-dev2-master_roller_Product_default","inProgress":true,"startTimestamp":"2023-05-10T17:42:34Z"}]}, "oldStatus": {"startTimestamp":null}}
2023-05-10T17:42:34.513Z    INFO    controller-runtime.manager.controller.solrbackup    Calling to check on collection backup   {"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "solr-backup-dev2", "namespace": "solr", "solrCloud": "apache-solr", "collection": "master_roller_Product_default"}
2023-05-10T17:42:34.516Z    INFO    controller-runtime.manager.controller.solrbackup    Updating status for solr-backup {"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "solr-backup-dev2", "namespace": "solr", "newStatus": {"solrVersion":"8.11.1-release-5-0-9-0","startTimestamp":"2023-05-10T17:42:34Z","collectionBackupStatuses":[{"collection":"master_roller_Product_default","backupName":"solr-backup-dev2-master_roller_Product_default","inProgress":true,"startTimestamp":"2023-05-10T17:42:34Z","asyncBackupStatus":"running"}]}, "oldStatus": {"solrVersion":"8.11.1-release-5-0-9-0","startTimestamp":"2023-05-10T17:42:34Z","collectionBackupStatuses":[{"collection":"master_roller_Product_default","backupName":"solr-backup-dev2-master_roller_Product_default","inProgress":true,"startTimestamp":"2023-05-10T17:42:34Z"}]}}
2023-05-10T17:42:34.524Z    INFO    controller-runtime.manager.controller.solrbackup    Calling to check on collection backup   {"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "solr-backup-dev2", "namespace": "solr", "solrCloud": "apache-solr", "collection": "master_roller_Product_default"}
2023-05-10T17:42:39.514Z    INFO    controller-runtime.manager.controller.solrbackup    Calling to check on collection backup   {"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "solr-backup-dev2", "namespace": "solr", "solrCloud": "apache-solr", "collection": "master_roller_Product_default"}
2023-05-10T17:42:44.516Z    INFO    controller-runtime.manager.controller.solrbackup    Calling to check on collection backup   {"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "solr-backup-dev2", "namespace": "solr", "solrCloud": "apache-solr", "collection": "master_roller_Product_default"}
2023-05-10T17:42:49.520Z    INFO    controller-runtime.manager.controller.solrbackup    Calling to check on collection backup   {"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "solr-backup-dev2", "namespace": "solr", "solrCloud": "apache-solr", "collection": "master_roller_Product_default"}
2023-05-10T17:42:54.523Z    INFO    controller-runtime.manager.controller.solrbackup    Calling to check on collection backup   {"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "solr-backup-dev2", "namespace": "solr", "solrCloud": "apache-solr", "collection": "master_roller_Product_default"}
2023-05-10T17:42:59.526Z    INFO    controller-runtime.manager.controller.solrbackup    Calling to check on collection backup   {"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "solr-backup-dev2", "namespace": "solr", "solrCloud": "apache-solr", "collection": "master_roller_Product_default"}
2023-05-10T17:42:59.529Z    INFO    controller-runtime.manager.controller.solrbackup    Calling to delete async info for backup command.    {"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "solr-backup-dev2", "namespace": "solr", "solrCloud": "apache-solr", "collection": "master_roller_Product_default"}
E0510 17:42:59.535938       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 345 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x13dd140, 0x2249580})
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/runtime/runtime.go:74 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0003dea80})
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/runtime/runtime.go:48 +0x75
panic({0x13dd140, 0x2249580})
    /usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/apache/solr-operator/controllers.(*SolrBackupReconciler).Reconcile(0xc000537460, {0x176d458, 0xc0002ed3b0}, {{{0xc0013c350c, 0x144a920}, {0xc0013c34f0, 0xc000833380}}})
    /workspace/controllers/solrbackup_controller.go:150 +0x8ca
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0005a5ea0, {0x176d3b0, 0xc00052a740}, {0x1427120, 0xc0003dea80})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:298 +0x303
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0005a5ea0, {0x176d3b0, 0xc00052a740})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:253 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2({0x176d3b0, 0xc00052a740})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:216 +0x46
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f4a210f8d30)
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x1, {0x1746200, 0xc0007da0c0}, 0x1, 0xc0003b2000)
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00091c120, 0x3b9aca00, 0x0, 0xe8, 0x0)
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x176d3b0, 0xc00052a740}, 0xc000626050, 0xc000250fa0, 0xf18001, 0x0)
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185 +0x99
k8s.io/apimachinery/pkg/util/wait.UntilWithContext({0x176d3b0, 0xc00052a740}, 0xc0003b2000, 0x0)
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99 +0x2b
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:213 +0x356
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x12c406a]

goroutine 345 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0003dea80})
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/runtime/runtime.go:55 +0xd8
panic({0x13dd140, 0x2249580})
    /usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/apache/solr-operator/controllers.(*SolrBackupReconciler).Reconcile(0xc000537460, {0x176d458, 0xc0002ed3b0}, {{{0xc0013c350c, 0x144a920}, {0xc0013c34f0, 0xc000833380}}})
    /workspace/controllers/solrbackup_controller.go:150 +0x8ca
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0005a5ea0, {0x176d3b0, 0xc00052a740}, {0x1427120, 0xc0003dea80})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:298 +0x303
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0005a5ea0, {0x176d3b0, 0xc00052a740})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:253 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2({0x176d3b0, 0xc00052a740})
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:216 +0x46
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185 +0x25
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f4a210f8d30)
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x1, {0x1746200, 0xc0007da0c0}, 0x1, 0xc0003b2000)
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00091c120, 0x3b9aca00, 0x0, 0xe8, 0x0)
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x176d3b0, 0xc00052a740}, 0xc000626050, 0xc000250fa0, 0xf18001, 0x0)
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185 +0x99
k8s.io/apimachinery/pkg/util/wait.UntilWithContext({0x176d3b0, 0xc00052a740}, 0xc0003b2000, 0x0)
    /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99 +0x2b
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:213 +0x356
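
The trace above reports a nil pointer dereference inside `(*SolrBackupReconciler).Reconcile` at `solrbackup_controller.go:150`. The following is a minimal sketch of how this class of panic arises in Go and how a nil guard turns it into an ordinary error. All types and function names here (`Backup`, `BackupStatus`, `markFinishedUnsafe`, `markFinishedSafe`, `didPanic`) are hypothetical and chosen only for illustration; this is not the operator's actual code, and it makes no claim about what line 150 contains or how later releases changed it.

```go
package main

import (
	"errors"
	"fmt"
)

// BackupStatus is a hypothetical stand-in for a per-collection backup status.
type BackupStatus struct {
	Collection string
	InProgress bool
	Finished   bool
}

// Backup is a hypothetical resource whose status may not be populated yet.
type Backup struct {
	Name   string
	Status *BackupStatus // nil until the first status update
}

// markFinishedUnsafe dereferences Status without checking it, so it
// panics with a nil pointer dereference when Status is nil.
func markFinishedUnsafe(b *Backup) {
	b.Status.InProgress = false
	b.Status.Finished = true
}

// markFinishedSafe returns an error instead of panicking when Status is nil.
func markFinishedSafe(b *Backup) error {
	if b == nil || b.Status == nil {
		return errors.New("backup status not initialized")
	}
	b.Status.InProgress = false
	b.Status.Finished = true
	return nil
}

// didPanic reports whether fn panicked.
func didPanic(fn func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	fn()
	return false
}

func main() {
	b := &Backup{Name: "solr-backup-dev2"} // Status left nil on purpose

	fmt.Println("unsafe panicked:", didPanic(func() { markFinishedUnsafe(b) }))
	fmt.Println("safe error:", markFinishedSafe(b))

	b.Status = &BackupStatus{Collection: "master_roller_Product_default", InProgress: true}
	fmt.Println("safe error after init:", markFinishedSafe(b))
	fmt.Println("finished:", b.Status.Finished)
}
```

Returning an error rather than panicking lets a reconcile loop requeue the request and try again, instead of taking down the whole process.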

Bucket path is roller-backups/solr

The backup then starts anyway and writes data to the bucket, but the backup job never ends, and the operator keeps logging:

2023-05-10T17:52:25.032Z INFO controller-runtime.manager.controller.solrbackup Calling to check on collection backup {"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "solr-backup-dev2", "namespace": "solr", "solrCloud": "apache-solr", "collection": "master_roller_Product_default"}

When I start a backup without defining collections, the operator logs:

2023-05-10T18:02:10.289Z INFO controller-runtime.manager.controller.solrbackup Updating status for solr-backup {"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "solr-backup-dev2", "namespace": "solr", "newStatus": {"solrVersion":"8.11.1-release-5-0-9-0","startTimestamp":"2023-05-10T18:02:10Z"}, "oldStatus": {"startTimestamp":null}}

In that case it does not write any data to the bucket, and the backup also runs forever.

Running on GKE

Operator 0.6.0, Solr 8.11.1

HoustonPutman commented 1 year ago

Thanks for reporting this, @MikeMichel. Can you try running v0.7.0 of the Solr Operator and see if that fixes your problem?

I've looked at the code and this should be fixed in v0.7.0.

MikeMichel commented 1 year ago

I can confirm that it works in 0.7.0. Thanks, @HoustonPutman.