cloudfoundry / bosh-vsphere-cpi-release

BOSH vSphere CPI
Apache License 2.0

Add opt-in support for per bosh deployment DRS rules #378

Open gberche-orange opened 5 months ago

gberche-orange commented 5 months ago

Feature Request

Detailed Description

Given dynamically created BOSH deployments (e.g. mysql-cluster-1, mysql-cluster-2, ... mysql-cluster-N), each with an instance group "mysql" of 3 instances, I need the vSphere CPI to support one DRS rule per deployment, so that DRS avoids scheduling the mysql instances of a given deployment on the same vSphere ESX host.

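Currently, the options are either static DRS rules declared in the cloud config, or the global `vcenter.enable_auto_anti_affinity_drs_rules` CPI property: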

https://github.com/cloudfoundry/docs-bosh/blob/96fdb6fff79d7eed1f78b6fb05ce064de2acfea0/content/vsphere-cpi.md?plain=1#L15-L17

        * **drs_rules** [Array, optional]: Array of DRS rules applied to [constrain VM placement](vm-anti-affinity.md#vsphere). Must have only one.
            * **name** [String, required]: Name of a DRS rule that the Director will create.
            * **type** [String, required]: Type of a DRS rule. Currently only `separate_vms` is supported.

- type: replace
  path: /vm_extensions?/-
  value:
    name: drs-antiaffinity-r4
    cloud_properties:
      datacenters:
      - name: ((/secrets/vsphere_4_vcenter_dc))
        clusters:
        # r4-z1 cluster
        - ((/secrets/vsphere_4_1_vcenter_cluster)):
            drs_rules:
            - name: ((/secrets/site_type))-bosh-coab-drs-antiaffinity
              type: separate_vms

https://github.com/cloudfoundry/bosh-vsphere-cpi-release/blob/da8f3fc281c8e8aecb972d7182f798a4f673c184/jobs/vsphere_cpi/spec#L26-L28

https://github.com/cloudfoundry/bosh-vsphere-cpi-release/blob/bf35f007bd40a42c2ca9474b673f780dab39b8f8/src/vsphere_cpi/lib/cloud/vsphere/vm_creator.rb#L364-L376

https://github.com/cloudfoundry/bosh-vsphere-cpi-release/blob/bf35f007bd40a42c2ca9474b673f780dab39b8f8/src/vsphere_cpi/lib/cloud/vsphere/vm_config.rb#L145-L151

Given that `env.bosh.group` is systematically set by the BOSH Director (https://github.com/cloudfoundry/bosh/blob/dec31de320fcd29a574db8685f6abf697138f788/src/bosh-director/lib/bosh/director/deployment_plan/steps/create_vm_step.rb#L135), this results in a DRS rule being created for each instance group of each deployment. The DRS rules are named from the template: <bosh-director-name>-<bosh-deployment-name>-<instance-group-name>

This results in a large number of auto-created DRS rules on BOSH Directors that already manage a large number of deployments.

While there is theoretically no limit to the number of DRS rules, enabling this property does not seem recommended on a BOSH Director with a large number of deployments (unless every single instance group in all deployments requires an anti-affinity DRS rule): https://communities.vmware.com/t5/VMware-vCenter-Discussions/Maximum-Number-of-DRS-Rules-per-Cluster/td-p/2744546

> It is recommend to use DRS rules sparingly, hence it is better not to use them unless it is absolutely required. As the number of rules gets increased, it will restrict DRS opportunities of balancing the cluster. It is operationally challenging in managing them as well.

Context

Why is this change important to you? How would you use it?

To benefit from vSphere HA across distinct ESX hosts, I need DRS anti-affinity on the relevant instance groups of selected deployments. This is important for the many dynamic BOSH deployments that cannot leverage static DRS rules declared in the cloud-config.

Alternative Implementations

VM Types / VM Extensions support for enable_auto_anti_affinity_drs_rules

In addition to supporting `enable_auto_anti_affinity_drs_rules=true` at the global level, this property would also be supported in a `vm_types` or `vm_extensions` block, where it overrides the global value.
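As a rough sketch of what this alternative could look like from the operator side, the ops-file entry below declares a VM extension that opts in to auto-generated DRS rules. This is not supported by the CPI today. The extension name `auto-drs-antiaffinity` is hypothetical, and placing the property directly under `cloud_properties` is an assumption modeled on how `upgrade_hw_version` is set.

```yaml
# Cloud config ops-file (sketch of the proposed behavior, not current CPI behavior)
- type: replace
  path: /vm_extensions?/-
  value:
    name: auto-drs-antiaffinity  # hypothetical extension name
    cloud_properties:
      # Proposed: overrides the global vcenter.enable_auto_anti_affinity_drs_rules
      # for VMs that use this extension (assumed placement of the property).
      enable_auto_anti_affinity_drs_rules: true

# Deployment manifest ops-file: only the "mysql" instance group opts in
- type: replace
  path: /instance_groups/name=mysql/vm_extensions?/-
  value: auto-drs-antiaffinity
```

With the global property left at its default of `false`, only instance groups that reference the extension would get a DRS rule, which keeps the number of rules proportional to the instance groups that actually need anti-affinity.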

Inspired by the similar property `upgrade_hw_version`:

https://github.com/cloudfoundry/bosh-vsphere-cpi-release/blob/bf35f007bd40a42c2ca9474b673f780dab39b8f8/src/vsphere_cpi/lib/cloud/vsphere/vm_creator.rb#L240-L242

https://github.com/cloudfoundry/bosh-vsphere-cpi-release/blob/bf35f007bd40a42c2ca9474b673f780dab39b8f8/src/vsphere_cpi/lib/cloud/vsphere/vm_config.rb#L13-L15

https://github.com/orange-cloudfoundry/bosh-vsphere-cpi-release/blob/87b8474f18046e6920d4c44478138f084cb3cdf3/src/vsphere_cpi/spec/unit/cloud/vsphere/vm_config_spec.rb#L24-L50

New cpi property

EDIT: likely too complex a proposal

Add a new CPI flag `vcenter.restrict_auto_anti_affinity_drs_rules_to_marked_instance_groups`, which adds new opt-in behavior without introducing breaking changes to existing behavior:

```
vcenter.enable_auto_anti_affinity_drs_rules:
  description: "Creates a DRS rule for each instance group to place VMs on separate hosts. Conditional on the deployment manifest setting a non-nil `env.bosh.group` field in an instance group. The DRS rules are named from template: <bosh-director-name>-<bosh-deployment-name>-<instance-group-name>"
  default: false
vcenter.restrict_auto_anti_affinity_drs_rules_to_marked_instance_groups:
  description: "When `enable_auto_anti_affinity_drs_rules=true`, restrict auto-generated DRS rules to instance groups declaring `env.bosh.enable_auto_anti_affinity_drs_rules=true` in the deployment manifest"
  default: false
```
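For illustration, a deployment manifest fragment that marks an instance group under this proposal might look like the sketch below. It assumes both CPI flags above are set to `true` on the Director. The `env.bosh.enable_auto_anti_affinity_drs_rules` key is the marker proposed in this issue and does not exist in the CPI today.

```yaml
# Deployment manifest fragment (sketch of the proposed opt-in marker)
instance_groups:
- name: mysql
  instances: 3
  env:
    bosh:
      # Proposed marker: only instance groups declaring this would get an
      # auto-generated anti-affinity DRS rule when the restrict flag is enabled.
      enable_auto_anti_affinity_drs_rules: true
```

Instance groups without the marker would get no DRS rule, while Directors that leave the restrict flag at `false` would keep the existing behavior.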

Complexity

gberche-orange commented 5 months ago

@selzoc would you accept a PR implementing this proposal?

selzoc commented 5 months ago

> @selzoc would you accept a PR implementing this proposal?

Well, it's not up to me! But I see this issue is in the Waiting for Changes | Open for Contribution part of the working group project, so we'd probably review it.

gberche-orange commented 3 months ago

@cunnie would you by chance have the historical background to review and comment on this updated proposal, in particular the "VM Types / VM Extensions support for enable_auto_anti_affinity_drs_rules" section above?