Open pbackup12345 opened 1 year ago
Thanks for providing so much information!
Does this happen after a few backups complete, or does it happen from the first backup?
It happens right after the first installation, when I created a collection and added very few records. And it happens repeatedly, i.e. I started from scratch several times and ran into the same problem with a default Autopilot GKE cluster.
So at this line, it should have finished almost all of the backup logic, and the files should exist in S3/GCS. The line after that will delete any old incremental backups if necessary (which it shouldn't be, as it's the first backup...).
2023-04-08 14:42:16.585 INFO (OverseerThreadFactory-29-thread-5-processing-n:explore-solrcloud-1.explore-solrcloud-headless.sop030:8983_solr) [c:dsearch ] o.a.s.c.a.c.BackupCmd Completed backing up ZK data for backupName=local-backup14-dsearch
Also before that error can happen, it should save the state of the completed task. So whenever the requeststatus request is sent, it should have information to populate the result.
From there the log only has endless calls to check the requeststatus with no results:
Can you show me what this means? What does the "no results" json look like?
Have you tried running an async backup yourself?
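For anyone who wants to try the async flow by hand, here is a minimal sketch of the two Collections API calls involved (BACKUP with an `async` id, then REQUESTSTATUS with that id). The base URL, the function names and the request id are assumptions for illustration only; this is not how the operator itself issues the calls. The state values checked are the ones the Solr reference guide documents for REQUESTSTATUS (`completed`, `failed`, `running`, `submitted`, `notfound`).

```python
from urllib.parse import urlencode

# Assumption: Solr is reachable here, e.g. through a port-forward to a pod.
SOLR_BASE = "http://localhost:8983/solr"


def backup_url(collection, name, location, repository, request_id):
    """Build a Collections API URL that starts an asynchronous BACKUP."""
    params = {
        "action": "BACKUP",
        "collection": collection,
        "name": name,
        "location": location,
        "repository": repository,
        "async": request_id,  # makes Solr return immediately and track the task
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def request_status_url(request_id):
    """Build the REQUESTSTATUS URL used to poll the async task."""
    params = {"action": "REQUESTSTATUS", "requestid": request_id}
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def is_finished(status_response):
    """Return True once the task has reached a terminal state."""
    state = status_response.get("status", {}).get("state")
    return state in ("completed", "failed")
```

Fetching `request_status_url(...)` repeatedly and passing the decoded JSON to `is_finished` shows whether the task ever leaves `running` or `submitted`, or whether Solr answers `notfound`, which may be what the "no results" case looks like.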
Hello all. I am observing exactly the same symptoms described above.
Moreover, I see something that might be of interest for the investigation.
My bucket is named mybucket. After trying to run a backup, the files are created in the bucket as described by the OP. My SolrCloud is named mycloud-solr and my SolrBackup is named mycloud-backup, and I configured it with location: mycloud-solr (the location matches the SolrCloud name; this is just a preference). So it's defined like:
apiVersion: solr.apache.org/v1beta1
kind: SolrBackup
metadata:
  name: mycloud-backup
  namespace: search
spec:
  repositoryName: "gcs-backups"
  solrCloud: mycloud-solr
  location: "mycloud-solr"
  collections:
    - articles
For the collection articles, the files end up under mycloud-solr/mycloud-backup-articles/articles (so the path seems to be {location}/{solrBackupName}-{collection}/{collection}).
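The layout observed above can be written down as a small sketch. The function names are hypothetical, and the pattern is only what was seen in this one bucket, not a documented guarantee of the operator.

```python
def solr_backup_name(solr_backup_name_, collection):
    """Name Solr sees for the backup: {solrBackupName}-{collection}."""
    return f"{solr_backup_name_}-{collection}"


def backup_path(location, solr_backup_name_, collection):
    """Observed path: {location}/{solrBackupName}-{collection}/{collection}."""
    name = solr_backup_name(solr_backup_name_, collection)
    return f"{location}/{name}/{collection}"
```

With the values from the SolrBackup above, `backup_path("mycloud-solr", "mycloud-backup", "articles")` gives the path found in the bucket.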
When I call http://localhost:8983/solr/admin/collections?action=LISTBACKUP&name=mycloud-backup-articles&location=mycloud-solr&repository=gcs-backups&gcsBucket=mybucket, I do get the backups I created listed there, even though the SolrBackup CRD still reports itself as "in progress". Here name=mycloud-backup-articles corresponds to the "directory" directly under the location:
{
  "responseHeader": {
    "status": 0,
    "QTime": 347
  },
  "collection": "articles",
  "backups": [
    {
      "indexFileCount": 122,
      "indexSizeMB": 0.748,
      "shardBackupIds": {
        "shard1": "md_shard1_0.json"
      },
      "collection.configName": "articles",
      "backupId": 0,
      "collectionAlias": "articles",
      "startTime": "2023-05-11T10:22:25.339652045Z",
      "indexVersion": "9.4.2",
      "endTime": "2023-05-11T10:22:42.946120826Z"
    },
    {
      "indexFileCount": 122,
      "indexSizeMB": 0.748,
      "shardBackupIds": {
        "shard1": "md_shard1_1.json"
      },
      "collection.configName": "articles",
      "backupId": 1,
      "collectionAlias": "articles",
      "startTime": "2023-05-11T10:35:46.024110473Z",
      "indexVersion": "9.4.2",
      "endTime": "2023-05-11T10:35:50.499108324Z"
    }
  ]
}
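To check that each listed backup really completed, a response like the one above can be summarized with a short script. This is a sketch only: the helper names are made up, and it assumes every entry carries `backupId`, `startTime` and `endTime` as in this output.

```python
from datetime import datetime, timezone


def parse_solr_time(value):
    """Parse a timestamp such as 2023-05-11T10:22:25.339652045Z."""
    # Solr reports nanoseconds; datetime keeps microseconds, so trim the fraction.
    head, _, frac = value.rstrip("Z").partition(".")
    micros = (frac + "000000")[:6]
    parsed = datetime.strptime(f"{head}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def backup_durations(list_backup_response):
    """Return (backupId, duration in seconds) for every listed backup."""
    result = []
    for entry in list_backup_response.get("backups", []):
        start = parse_solr_time(entry["startTime"])
        end = parse_solr_time(entry["endTime"])
        result.append((entry["backupId"], (end - start).total_seconds()))
    return result
```

For the two entries above this yields roughly 17.6 s for backupId 0 and 4.5 s for backupId 1, so both backups have an end time on the Solr side even while the SolrBackup resource still shows "in progress".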
I have a GKE Autopilot cluster with a relatively vanilla setup, with tlsTermination at Ingress and 3 pods. Everything works except for the backup feature.
I use Solr-operator 0.6.0, Solr 8.11.0, and Zookeeper 0.2.14.
Following are the relevant parts of my setup:
Main yaml on S3 location:
My backup yaml:
The backup actually starts and both locations (S3 and GCS) receive files as well, but after a while the backup process stops. There are no Solr error messages, but this is the relevant portion of the logs of the pod which does the backup:
And this is where it seems to die:
From there the log only has endless calls to check the requeststatus with no results:
Additionally, through normal API calls the backup and restore functions work perfectly. Specifically, the following runs without a hitch:
And my main yaml in full: