freedev / solrcloud-zookeeper-kubernetes

Run Solrcloud and Zookeeper in a Kubernetes environment
Apache License 2.0

Backup/Restore SolrCloud #12

Open · nahshal opened this issue 3 years ago

nahshal commented 3 years ago

Hello Vincenzo,

Thank you for this example; it helps me a lot. I still have a question, though. I deployed SolrCloud with ZooKeeper using dynamically provisioned EBS volumes, and now I need to make sure my data is safe. Of course, one could argue that AWS volumes are already durable: if the Kubernetes cluster goes down, you can redeploy it and migrate the data from the old AWS volumes to the newly created ones. For extra safety, however, I would also like to back up the Solr data to, e.g., an NFS share on a private server. To do that, I can use the Solr Collections API to back up my collection and restore it later when needed.

My problem is this: when I go to one of the nodes and run the backup into the mountPath (the shared file system), the backup is created only on that node and is not replicated to the other nodes. As a result, the restore fails because the backup exists on one node only. I have to copy the backup manually to all nodes, and only then does the restore work. I expected a backup taken on one node to be enough and to be available to all nodes.
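A minimal sketch of the BACKUP and RESTORE calls described above, built with the Python standard library. The base URL, collection name, and the `/mnt/solr-backups` path are hypothetical. According to the Solr Reference Guide's backup documentation, when backing up to a local file system in SolrCloud the `location` must be a shared directory that every Solr node sees at the same path (for example an NFS or EFS volume mounted into all pods). A per-pod EBS volume is visible only to its own pod, which would explain a backup that appears on one node only. Recent Solr versions may also require the path to be allowed via `solr.allowPaths`.

```python
from urllib.parse import urlencode

# Assumption: Solr is reachable at this base URL from where the script runs
# (e.g. from inside a Solr pod, as in this thread). Adjust for your cluster.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action: str, **params: str) -> str:
    """Build a Collections API URL such as action=BACKUP or action=RESTORE."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


def backup_url(backup_name: str, collection: str, location: str) -> str:
    # `location` should be a directory mounted at the same path on every
    # Solr node (a shared volume), not a volume private to one pod.
    return collections_api_url("BACKUP", name=backup_name,
                               collection=collection, location=location)


def restore_url(backup_name: str, collection: str, location: str) -> str:
    # RESTORE reads from the same shared location that BACKUP wrote to.
    return collections_api_url("RESTORE", name=backup_name,
                               collection=collection, location=location)


if __name__ == "__main__":
    shared = "/mnt/solr-backups"  # hypothetical shared mount path
    print(backup_url("nightly", "mycollection", shared))
    print(restore_url("nightly", "mycollection", shared))
```

Both URLs point at the same `location`; the restore only works if every node that receives a restored replica can read that directory.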

Have you already faced this problem, or do you know another way to back up and restore SolrCloud?

Best, Yahya

freedev commented 3 years ago

Hi @nahshal, it would be easier for me to understand your problem if you added a minimal reproducible example of what you're experiencing. In other words, please also provide the code, so I can easily use it to reproduce the problem.

nahshal commented 3 years ago

Hi @freedev, thank you for your quick reply. No problem: you can easily reproduce it by applying your example. Following your example, you would have 2 Solr replicas. Assuming you deployed it on AWS EKS, you would see the following:

```
default pod/solr-cluster-sts-0 1/1 Running 0 20h
default pod/solr-cluster-sts-1 1/1 Running 0 20h
default pod/zookeeper-sts-0 1/1 Running 0 20h
default pod/zookeeper-sts-1 1/1 Running 0 20h
```

Now you have your Solr and ZooKeeper pods running, and you have added some documents to Solr. Suppose you now need to back up your Solr data. Using, e.g., the Solr Collections API, you would do the following:

1. SSH to one of your nodes and call the API:

   ```
   curl -i 'http://localhost:8983/solr/admin/collections?action=BACKUP&name=<BackupName>&collection=<CollectionName>&location=/path/to/shared/file-system'
   ```

   This creates a backup of your collection in the given path, which is the path declared under `volumeMounts` in the YAML files, right?

   ```
   volumeMounts:
   ```

2. Later, to restore the backup, you would call the API again:

   ```
   curl -i 'http://localhost:8983/solr/admin/collections?action=RESTORE&name=<BackupName>&collection=<CollectionName>&location=/path/to/shared/file-system'
   ```

I would expect this API to restore the backup across all the nodes. What actually happens is that the BACKUP command wrote the backup only on the node where it was run, and no copy was created on the other nodes, so the RESTORE call failed. I had to copy the backup manually to the same path on the other node, and then it worked.
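For large collections, the synchronous calls above can hit the Collections API's request timeout while the work continues on the server. The Collections API also accepts an `async` request id, whose progress can then be polled with `action=REQUESTSTATUS`. The sketch below uses only the Python standard library; the base URL and the polling limits are assumptions, and `fetch` is injectable so the logic can be exercised without a live cluster.

```python
import json
import time
from typing import Callable, Dict
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # assumption: adjust to your cluster
TERMINAL_STATES = ("completed", "failed", "notfound")


def http_get_json(url: str) -> Dict:
    """GET a URL and decode the JSON response body."""
    with urlopen(url) as resp:
        return json.load(resp)


def run_async(action: str, request_id: str, params: Dict[str, str],
              fetch: Callable[[str], Dict] = http_get_json,
              poll_seconds: float = 2.0, max_polls: int = 300) -> str:
    """Submit a Collections API action with async=<request_id>, then poll
    REQUESTSTATUS until it reaches a terminal state; return that state."""
    submit = {"action": action, "async": request_id, "wt": "json", **params}
    fetch(f"{SOLR_BASE}/admin/collections?{urlencode(submit)}")

    status_query = urlencode({"action": "REQUESTSTATUS",
                              "requestid": request_id, "wt": "json"})
    for _ in range(max_polls):
        status = fetch(f"{SOLR_BASE}/admin/collections?{status_query}")
        state = status["status"]["state"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)  # still "submitted" or "running"
    raise TimeoutError(f"request {request_id} not finished after {max_polls} polls")
```

Usage (hypothetical names): `run_async("BACKUP", "backup-001", {"name": "nightly", "collection": "mycollection", "location": "/mnt/solr-backups"})`. Solr stores the status of finished async requests and typically rejects a duplicate id while that status exists; `action=DELETESTATUS&requestid=...` clears it.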

Himanshusoni9 commented 11 months ago

One more issue: backing up Solr collection data based on a condition/data filter. (There is no provision for that.)

The Solr BACKUP API with a query does not work:

`http://localhost:8983/solr/admin/collections?action=RESTORE&name=myBackupName&location=C:\Users\DELL\Downloads\SOLR_BACKUP&collection=myCondCollection&query=text:cellphone`

I am trying to back up our Solr data based on a particular condition.

To provide some context, let's say a Solr collection consists of 100 records: 70 of them contain the text "mobile" and the remaining 30 contain the text "cellphone". My objective is to take a backup of the collection that contains only the records with the text "cellphone". Essentially, we want a backup file that reflects only these 30 specific records.
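As documented in the Solr Reference Guide, the Collections API BACKUP action copies whole shard indexes and takes no query or filter parameter, so a `query=` argument is not applied. One possible workaround, sketched here under stated assumptions, is to export only the matching documents with a `/select` query paginated via `cursorMark` and save them as a JSON array, which can later be re-indexed by POSTing it to `/solr/<collection>/update`. Another option is to re-index the matching documents into a separate collection and run BACKUP on that. The base URL, the `id` unique key, and the file name are assumptions. Only fields Solr can return (stored fields, or docValues returned as stored) end up in the file, and `_version_` is dropped so re-indexing does not trigger optimistic-concurrency conflicts.

```python
import json
from typing import Callable, Dict, Iterator, List
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # assumption: adjust to your cluster


def http_get_json(url: str) -> Dict:
    """GET a URL and decode the JSON response body."""
    with urlopen(url) as resp:
        return json.load(resp)


def export_matching(collection: str, query: str, unique_key: str = "id",
                    rows: int = 500,
                    fetch: Callable[[str], Dict] = http_get_json) -> Iterator[Dict]:
    """Yield every returnable document matching `query`, paging with cursorMark."""
    cursor = "*"
    while True:
        # cursorMark requires the sort to include the collection's uniqueKey field.
        params = {"q": query, "sort": f"{unique_key} asc", "rows": rows,
                  "cursorMark": cursor, "wt": "json"}
        data = fetch(f"{SOLR_BASE}/{collection}/select?{urlencode(params)}")
        for doc in data["response"]["docs"]:
            doc.pop("_version_", None)  # avoid version conflicts when re-indexing
            yield doc
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # cursor did not advance: all results consumed
            break
        cursor = next_cursor


def write_backup(path: str, docs: List[Dict]) -> None:
    """Write documents as a JSON array that can later be POSTed to /update."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(docs, fh)
```

Usage (hypothetical file name): `write_backup("cellphone_backup.json", list(export_matching("myCondCollection", "text:cellphone")))`. Field types that are neither stored nor docValues, and copyField targets, are not exported; copyField targets are rebuilt when the documents are re-indexed.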

I would greatly appreciate it if you could share insights on the best practices or methods to achieve this selective backup based on a condition. If there are specific parameters or commands we should be utilizing, kindly provide the necessary guidance. Additionally, any documentation or references you could point us to would be immensely helpful.

Thank you in advance for your time and assistance. We value your expertise and look forward to implementing an efficient solution based on your recommendations.