cloudfoundry-attic / bosh-notes

Collection of proposals for BOSH
Apache License 2.0

Availability Zone support #1

Closed cppforlife closed 8 years ago

cppforlife commented 9 years ago

Initial proposal: availability-zones.md

onsi commented 9 years ago

Looks pretty good to me after a quick glance. Presumably the answer is yes, but it wasn't clear in the proposal: we'll be able to get a reference to the AZ in our ERB templates, correct?

cppforlife commented 9 years ago

@onsi updated with availability_zone field.

onsi commented 9 years ago

Another thing, @cppforlife: it's important, from an operator's perspective, for something like bosh vms to clearly show me which VMs are in which zone. I imagine that'll fall out of this, but I wanted to make the requirement explicit.

allomov commented 9 years ago

This is a very interesting and helpful initiative. Still, there are a few points that aren't clear to me:

  1. How are job instances distributed between AZs? Is it possible to configure the number of instances for each zone?
  2. There is an example in the note:
     jobs:
     - name: etcd
       instances: 2
       ...
       migrated_jobs:
         z1: etcd_z1
         z2: etcd_z2
       ...
       networks:
       - name: my-net

I don't understand why there is only one network in this example (as I see it, a network is a zone-specific property). Do you mean there will be defaults for this?

cppforlife commented 8 years ago

@allomov instances would be distributed as evenly as possible, e.g. 3 instances in 2 AZs means 2 in the first AZ and 1 in the second AZ.
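
The even split described here can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not BOSH's actual placement code; the function name `distribute_evenly` and the rule that earlier AZs receive the remainder are assumptions based on the "2 in the first, 1 in the second" example.

```python
def distribute_evenly(instances: int, azs: list[str]) -> dict[str, int]:
    """Spread instances across AZs as evenly as possible.

    Hypothetical helper: AZs listed first receive the leftover instances.
    """
    if not azs:
        raise ValueError("at least one AZ is required")
    base, extra = divmod(instances, len(azs))
    # Every AZ gets `base` instances; the first `extra` AZs get one more.
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}


print(distribute_evenly(3, ["z1", "z2"]))  # {'z1': 2, 'z2': 1}
```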

for persistent jobs it's a bit different when you scale up the number of AZs. for example, if there were 3 instances in 1 AZ and you've decided to add another AZ, we will just keep the 3 instances in the first AZ. if you then decide to scale up the number of instances to, let's say, 4, we will place the new instance in the second AZ. eventually we may create a command that forcefully rebalances the instances for persistent jobs, something like bosh rebalance my-job?
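
A rough Python sketch of the persistent-job behavior described above: existing instances stay where they are, and only newly added instances are placed, each into the currently least-populated AZ. The function name `place_new_instances` and the tie-breaking rule (first AZ in the list wins) are assumptions for illustration; the sketch also assumes every AZ in `current` is still listed in `azs`.

```python
def place_new_instances(current: dict[str, int], azs: list[str], target: int) -> dict[str, int]:
    """Keep existing placements; add new instances to the emptiest AZ.

    Hypothetical helper, not BOSH's implementation.
    """
    counts = {az: current.get(az, 0) for az in azs}
    missing = max(0, target - sum(counts.values()))
    for _ in range(missing):
        # min() returns the first AZ with the lowest count, so ties go to
        # whichever AZ appears earlier in `azs`.
        emptiest = min(azs, key=lambda az: counts[az])
        counts[emptiest] += 1
    return counts


# Adding a second AZ alone moves nothing.
print(place_new_instances({"z1": 3}, ["z1", "z2"], target=3))  # {'z1': 3, 'z2': 0}
# Scaling to 4 instances places the new one in the second AZ.
print(place_new_instances({"z1": 3}, ["z1", "z2"], target=4))  # {'z1': 3, 'z2': 1}
```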

regarding networking: networks will be able to span AZs by having multiple subnets associated. each subnet belongs to one or more AZs. when an instance is created, the appropriate subnet will be selected based on the instance's AZ. see the my-net definition in https://github.com/cloudfoundry/bosh-notes/blob/master/availability-zones.md.
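
To make the subnet selection concrete, here is a small Python sketch. The data layout (a `subnets` list whose entries carry an `azs` list and a `range`) and the address ranges are hypothetical and may not match the key names used in the proposal; the point is only that one network can hold several subnets and the instance's AZ picks among them.

```python
# Hypothetical representation of a network that spans two AZs.
my_net = {
    "name": "my-net",
    "subnets": [
        {"range": "10.10.0.0/24", "azs": ["z1"]},
        {"range": "10.10.1.0/24", "azs": ["z2"]},
    ],
}


def select_subnet(network: dict, az: str) -> dict:
    """Return the first subnet of `network` that belongs to `az`."""
    for subnet in network["subnets"]:
        if az in subnet["azs"]:
            return subnet
    raise LookupError(f"network {network['name']!r} has no subnet in AZ {az!r}")


print(select_subnet(my_net, "z2")["range"])  # 10.10.1.0/24
```

This is why the job in the example above can reference a single network name even though its instances land in different zones.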

allomov commented 8 years ago

I see. Thank you for the answer.

I think it would be better to have an option to set the number of instances for each AZ in the manifest file. For instance, you may want to run fewer instances in a secondary AZ purely as protection against an outage of the main AZ. Still, this case is connected with the Resurrector behaviour that is in the TBD section.

cppforlife commented 8 years ago

My thinking behind not allowing an explicit per-AZ instance count breakdown is that it leaves room to build higher-level features later, e.g. balancing/weights based on percentages, or marking one AZ as temporarily down so BOSH can increase the number of instances in another.

allomov commented 8 years ago

@cppforlife could you share your thoughts on when we can expect the feature that increases the number of instances in a healthy AZ when another AZ fails?

cppforlife commented 8 years ago

@allomov i am hoping we can get to it as more releases (including cf-release) convert to using these new features.

dpb587-pivotal commented 8 years ago

Closing - availability zone features have been finished since v241 and are now publicly documented on bosh.io.