apache / solr-operator

Official Kubernetes operator for Apache Solr
https://solr.apache.org/operator
Apache License 2.0

[Backup/Restore] - How to increase backup/restore throughput/speed? #595

Closed anilkhichar closed 1 year ago

anilkhichar commented 1 year ago

We have benchmarked a throughput of 2GiB/minute during SolrBackup with ~4TB of raw index data. Backup/restore duration is critical during a disaster event, so we are looking for increased speed.

  - backupName: solr-backup-XYZ
    collection: <collection>
    finishTimestamp: "2023-07-28T00:16:42Z"
    finished: true
    startTimestamp: "2023-07-27T23:41:36Z"
    successful: true
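For reference, the duration of this backup can be worked out from the two timestamps in the status. The sketch below does that and, assuming the quoted 2GiB/minute rate held for the whole window (an assumption, the status itself reports no byte counts), estimates how much data that window corresponds to.

```python
from datetime import datetime

# Timestamps copied from the SolrBackup status above ("Z" written as +00:00
# so that datetime.fromisoformat accepts it on older Python versions).
start = datetime.fromisoformat("2023-07-27T23:41:36+00:00")
finish = datetime.fromisoformat("2023-07-28T00:16:42+00:00")

elapsed = finish - start
elapsed_minutes = elapsed.total_seconds() / 60

# Assumption: the benchmarked 2 GiB/minute rate applied to this entire run.
rate_gib_per_min = 2
approx_gib = elapsed_minutes * rate_gib_per_min

print(f"elapsed: {elapsed}")
print(f"approx. data moved at 2 GiB/min: {approx_gib:.1f} GiB")
```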

Right now, the only performance option we have found is the endpoint URL, which keeps the traffic private between the VPC and S3, but it does not scale the throughput significantly. S3 supports throughput up to 100Gbps and our selected EC2 instance supports 10Gbps.
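To put these numbers side by side, here is a rough back-of-the-envelope conversion. It assumes "~4TB" means 4 decimal terabytes and treats the 10Gbps EC2 figure as an ideal upper bound with no protocol or disk overhead, so the results are only indicative.

```python
GIB = 2**30  # bytes in one GiB

# Figures quoted in the comment above.
observed_gib_per_min = 2
ec2_link_gbps = 10
index_size_bytes = 4 * 10**12  # "~4TB", read as decimal terabytes (assumption)

# Observed backup rate expressed in Gbps.
observed_gbps = observed_gib_per_min * GIB * 8 / 60 / 1e9

# Fraction of the instance's nominal bandwidth actually used.
utilisation = observed_gbps / ec2_link_gbps

# Time to move the whole index at the observed rate vs. at ideal line rate.
hours_at_observed = index_size_bytes / (observed_gib_per_min * GIB) / 60
hours_at_line_rate = index_size_bytes * 8 / (ec2_link_gbps * 1e9) / 3600

print(f"observed rate: {observed_gbps:.3f} Gbps ({utilisation:.1%} of the link)")
print(f"full index at observed rate: {hours_at_observed:.1f} h")
print(f"full index at ideal line rate: {hours_at_line_rate:.2f} h")
```

Under these assumptions the observed rate is well under a tenth of the instance's nominal bandwidth, which is why the network itself does not look like the bottleneck.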

Can we configure S3 multipart upload to speed up the backup, and similarly, how can we boost the restore speed?

https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/examples-s3-objects.html#list-objects
https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance-design-patterns.html#optimizing-performance-parallelization

Maybe we could expose one more option in the backup configuration that supports horizontal scaling and parallel reads/writes.
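To illustrate the kind of parallelism meant here, the following is a minimal, self-contained sketch of splitting a payload into numbered parts and transferring them concurrently. It is not how the Solr S3 repository or the AWS SDK is implemented: `split_into_parts`, `upload_part` and `parallel_upload` are hypothetical names, and `upload_part` is an in-memory stand-in for a real network call.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(total_size, part_size):
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    parts = []
    offset = 0
    number = 1  # S3 multipart part numbers start at 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def upload_part(data, part):
    """Hypothetical stand-in for a real part upload; returns what it would send."""
    number, offset, length = part
    return number, data[offset:offset + length]


def parallel_upload(data, part_size, workers=4):
    """Send all parts through a thread pool, then reassemble by part number."""
    parts = split_into_parts(len(data), part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: upload_part(data, p), parts))
    # Mimics the final "complete" step, which stitches parts together in order.
    return b"".join(chunk for _, chunk in sorted(results))


payload = bytes(range(256)) * 10
assert parallel_upload(payload, part_size=100) == payload
```

In a real setup the part size and worker count would be the tunable options, and the gain would depend on whether disk, CPU or the network is the limiting factor.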

@HoustonPutman: Any plans or suggestions for this optimization?

HoustonPutman commented 1 year ago

Sorry, meant to comment as well.

The Solr Operator just uses the built-in Solr S3-Repository Module, so there's nothing that the Solr Operator can do to speed this up.

Instead, would you mind posting this same information as a new JIRA issue at https://issues.apache.org/jira/projects/SOLR/summary? That way optimizations can be tracked and discussed in the right place.