apache / solr-operator

Official Kubernetes operator for Apache Solr
https://solr.apache.org/operator
Apache License 2.0

Ability to mount an existing pvc to solr cloud pods for persistence. #663

Open barlass opened 7 months ago

barlass commented 7 months ago

I have created a PVC that is bound to a PV backed by an Azure File Share.

While setting up SolrCloud with persistence, I want to refer to this PVC, so I updated the value as described in the documentation. However, this seems to create a separate PVC for each pod instead of using the one I provided.

Version: 0.8, AKS 1.27

Has anyone else faced this, or is this expected?

yauhen-vastraknutau-epam commented 6 months ago

Yep, I'm observing the same behaviour. I'm double-checking right now to make sure it isn't a typo on my end.

thai-op commented 3 months ago

This is the current behavior of SolrCloud. The persistent PVC template provided in https://github.com/apache/solr-operator/blob/8e986e60a82fcd577fe55e882e23a5491f4d3014/api/v1beta1/solrcloud_types.go#L281 is turned into a new PVC for each pod. Note that it's a PVC template, not a reference to an existing PVC.
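To make the template behavior concrete, here is a minimal sketch of a SolrCloud resource with persistent storage. The resource name, storage class, and size are hypothetical placeholders, and the field layout should be checked against the CRD for your operator version. The point is that `pvcTemplate` describes what each claim should look like; it has no field that names an existing claim.

```yaml
# Illustrative sketch only: names and sizes are placeholders.
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: example                      # hypothetical name
spec:
  replicas: 3
  dataStorage:
    persistent:
      reclaimPolicy: Retain
      # A template, not a claim reference: it is handed to the
      # StatefulSet, which creates one new PVC per pod from it.
      pvcTemplate:
        spec:
          storageClassName: azurefile   # assumed storage class name
          resources:
            requests:
              storage: 5Gi
```

With three replicas this should yield three separate claims, one per pod, which matches the behavior reported above.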

I'm thinking of adding this support in my own branch because I need it in order to use shared, scalable file storage in AWS.

HoustonPutman commented 3 months ago

Solr needs different directories for each pod though, so I'm not exactly sure what a shared PVC would do for you (other than to be used for backups).

That PVC can be mounted to Solr through customSolrKubeOptions.podOptions.extraVolumes, but it will not be used to store Solr data, because each Solr pod needs to be able to store data independently.

thai-op commented 3 months ago

> Solr needs different directories for each pod though, so I'm not exactly sure what a shared PVC would do for you (other than to be used for backups).

From the perspective of the shared file system, the mounted directory is different for each pod thanks to subPathExpr: $(POD_NAME). From the perspective of each Solr pod, however, it is the same directory. For example:

volumes:
  - name: shared-data
    source:
      persistentVolumeClaim:
        claimName: shared-fs-pvc
    defaultContainerMount:
      name: shared-data
      mountPath: "/solr-data"
      subPathExpr: $(POD_NAME)

Then we can configure the Solr home to be under /solr-data, which maps to /$(POD_NAME) on the shared file system. There are several benefits to this deployment:

This is just a starting idea for separating compute from storage in Solr. Several clusters could be bootstrapped with the PVC mounted read-only, so that they inherit all the existing data without any copying and heavy write workloads are isolated from read traffic. It also allows the storage to scale horizontally without adding new nodes (which would require rebalancing shards), to name just a few.
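For readers unfamiliar with the underlying mechanism, the per-pod subdirectory idea can be sketched in plain Kubernetes terms, independent of the operator. The pod name and image tag below are placeholders; only the claim name `shared-fs-pvc` and the mount path come from the snippet above. One detail worth noting is that `subPathExpr` expands container environment variables, so `POD_NAME` has to be defined on the container, typically through the downward API.

```yaml
# Illustrative sketch only: a bare Pod, not what the operator generates.
apiVersion: v1
kind: Pod
metadata:
  name: solr-demo-0                  # hypothetical pod name
spec:
  containers:
    - name: solr
      image: solr:9                  # placeholder image tag
      env:
        # subPathExpr can only reference env vars defined on the container.
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      volumeMounts:
        - name: shared-data
          mountPath: /solr-data
          # Each pod sees /solr-data, backed by <share root>/<pod name>.
          subPathExpr: $(POD_NAME)
  volumes:
    - name: shared-data
      persistentVolumeClaim:
        claimName: shared-fs-pvc     # pre-existing shared claim
```

For the read-only scenario, the `persistentVolumeClaim` source also accepts `readOnly: true`; whether Solr can run against a read-only home directory is a separate question that this sketch does not address.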

Anyhow, I've prototyped a working version in my own branch, which has been tested internally: https://github.com/thai-op/solr-operator/commit/80addcd4d57ee2bb9b0a961497f443b399608ad4. If possible, would you mind helping to shepherd and review a PR so we can have this feature in a future Solr operator version?