Closed anilkhichar closed 1 year ago
Sorry, meant to comment as well.
The Solr Operator just uses the built-in Solr S3-Repository Module, so there's nothing that the Solr Operator can do to speed this up.
Instead, would you mind posting this same information as a new JIRA issue at https://issues.apache.org/jira/projects/SOLR/summary? That way optimizations can be tracked and discussed in the right place.
We benchmarked about 2 GiB/minute of throughput during a SolrBackup of ~4 TB of raw index data. Backup and restore duration is critical during a disaster event, so we are looking for ways to increase the speed.
Right now, the only performance option we have found is the `endpoint` URL, which keeps traffic private between the VPC and S3, but it does not scale the throughput significantly. S3 supports throughput of up to 100 Gbps, and our selected EC2 instance supports 10 Gbps. Can we configure S3 multipart upload to speed up the backup, and is there a similar way to speed up the restore?
https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/examples-s3-objects.html#list-objects
https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance-design-patterns.html#optimizing-performance-parallelization
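To put the numbers above in perspective, here is a rough back-of-the-envelope calculation. It assumes "~4TB" means 4 TiB and that the 2 GiB/minute rate is sustained for the whole backup; both are simplifications, and the function names are only for illustration.

```python
GIB = 2**30
TIB = 2**40


def backup_hours(index_bytes, bytes_per_minute):
    """Hours needed to move index_bytes at a sustained per-minute rate."""
    return index_bytes / bytes_per_minute / 60


def gbps(bytes_per_minute):
    """Convert a bytes-per-minute rate to gigabits per second (decimal giga)."""
    return bytes_per_minute * 8 / 60 / 1e9


observed_rate = 2 * GIB   # benchmarked throughput, bytes per minute
index_size = 4 * TIB      # assumption: "~4TB" taken as 4 TiB

print(f"full backup: {backup_hours(index_size, observed_rate):.1f} hours")
print(f"observed rate: {gbps(observed_rate):.2f} Gbps")
```

Under these assumptions a full backup takes roughly 34 hours, and the observed rate is about 0.29 Gbps, which is around 3% of the 10 Gbps the instance supports.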
Maybe we could expose one more option in the backup configuration that supports horizontal scaling and parallel reads/writes.
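As a minimal sketch of the idea (not how the Solr S3-Repository Module works today), parallel multipart transfer splits a file into numbered parts, sends the parts concurrently, and reassembles them in order. Everything here is hypothetical: `upload_part` is an in-memory stand-in for a real S3 part upload, and `part_size` and `workers` are the kind of knobs such a backup option might expose.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(data: bytes, part_size: int):
    """Return (part_number, chunk) pairs; part numbers start at 1."""
    return [
        (offset // part_size + 1, data[offset:offset + part_size])
        for offset in range(0, len(data), part_size)
    ]


def upload_part(store: dict, part_number: int, chunk: bytes) -> int:
    """Hypothetical stand-in for a real part upload: stores the chunk in memory."""
    store[part_number] = chunk
    return part_number


def parallel_upload(data: bytes, part_size: int, workers: int) -> bytes:
    """Upload all parts concurrently, then 'complete' by joining them in order."""
    store = {}
    parts = split_into_parts(data, part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        uploaded = list(pool.map(lambda p: upload_part(store, *p), parts))
    return b"".join(store[n] for n in sorted(uploaded))


payload = bytes(range(256)) * 100
assert parallel_upload(payload, part_size=1000, workers=4) == payload
```

The same pattern applies in reverse for restore, where byte-range reads of a large object can be fetched concurrently and written at their offsets.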
@HoustonPutman: Any plans or suggestions for this optimization?