cloudfoundry / bosh-google-cpi-release

BOSH Google CPI
Apache License 2.0

Specifying Local SSD for disk #152

Closed · oppegard closed this 3 years ago

oppegard commented 7 years ago

Is it possible to use Local SSD for the ephemeral disk mounted at /var/vcap/data? I could only find references to pd-ssd in the examples, which I assume is shorthand for persistent SSD:

disk_pools:
  - name: disks
    disk_size: 32_768
    cloud_properties:
      type: pd-ssd

Our present Concourse CI workers are provisioned as AWS C3 instance types, which provide local SSD and drastically better performance for our workloads. We'd like to achieve the same on GCP if possible.

evandbrown commented 7 years ago

Hi @oppegard,

Local SSD isn't currently supported, but I think I'm pretty familiar with what it would take to add it. We can't define local SSDs in disk_pools because they can't be created independently of a VM. I believe we'd want to add it to the cloud_properties of a VM. That would look something like:

resource_pools:
  - name: common
    network: private
    stemcell:
      name: bosh-google-kvm-ubuntu-trusty-go_agent
      version: latest
    cloud_properties:
      zone: us-east-1
      machine_type: n1-standard-1
      root_disk_size_gb: 20
      root_disk_type: pd-standard
      local_ssd:
        count: 4
        interface: nvme

The only things configurable about local SSD are the number of disks you want to attach (the maximum varies by machine type) and the interface (NVMe or SCSI).
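For illustration, roughly what that could look like against the GCE Go API client (a sketch only, not CPI code; the package and helper names are made up):

package localssd

import (
	"fmt"

	compute "google.golang.org/api/compute/v1"
)

// localSSDDisks builds the extra AttachedDisk entries for an instance
// insert request. Local SSDs are "SCRATCH" disks: they can only be created
// together with the VM and are always deleted along with it.
func localSSDDisks(project, zone, iface string, count int) []*compute.AttachedDisk {
	diskType := fmt.Sprintf("projects/%s/zones/%s/diskTypes/local-ssd", project, zone)
	disks := make([]*compute.AttachedDisk, 0, count)
	for i := 0; i < count; i++ {
		disks = append(disks, &compute.AttachedDisk{
			Type:       "SCRATCH",
			AutoDelete: true,
			Interface:  iface, // "NVME" or "SCSI"
			InitializeParams: &compute.AttachedDiskInitializeParams{
				DiskType: diskType,
			},
		})
	}
	return disks
}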

Getting this in the CPI is easy stuff, and I only see two potential issues:

  1. Will the bosh_agent discover, format, and mount these disks correctly? If not, that's an upstream patch.
  2. Is the NVME driver in the stemcell? I don't think it is, but SCSI should just work. I'll look into that more.

@cppforlife this look OK to you?

oppegard commented 7 years ago

@evandbrown that makes sense regarding local SSD being part of the VM definition. I think we'll need support for this before we can migrate our CI to GCP. Our C3 workers often spike up to 150K write IOPS -- I'm looking forward to faster CI builds on a GCP instance with NVMe and those 360K write IOPS. 😁

craigfurman commented 6 years ago

I think it would make sense for the bosh agent to mount the (partitioned) local SSD at /var/vcap/data. I reckon formatting with ext4 is fine as a first pass, unless there's a need to specify the fs type in cloud_properties.

@evandbrown would you be interested in a PR that added this feature? There may well be an associated PR for the bosh-agent, which I would submit before posting this one.

johnsonj commented 6 years ago

@craigfurman yes! that would be awesome

johnsonj commented 6 years ago

@craigfurman - any update on the upstream changes? happy to work with you to get this into the CPI

cppforlife commented 6 years ago

wouldn't it prevent live migration? i could see it as an option but not necessarily a default. what was the use case (which app) for exposing it?


johnsonj commented 6 years ago

Data is retained on live migration (src), but I'd bet perf goes down for a bit.

IMO: The use case is for applications that need to write out data that's not very important. So a CI doing builds, a cache server, or something along those lines. I'd be apprehensive about saying 'swap this flag and all of your bosh apps are writing to a local SSD' without further investigation.

craigfurman commented 6 years ago

@johnsonj I started working on it briefly, but dropped it for a while after some other stuff came up. I'm not likely to get back to it any time soon.

I'll post some limited context here though, in case I or someone else picks this up:

Proposed interface: on the cloud_properties of a VM:

local_ssds:
  interface: <nvme or scsi>
  count: <count>

At the time of writing, the SSDs have a fixed size of 375 GB. If more than one is desired, we're in uncharted territory when it comes to bosh UX (as far as I know). Normally, bosh formats disks with ext4 and mounts them at /var/vcap/store or /var/vcap/data (although XFS is now available for persistent disks).

I can think of 3 options:

  1. Limit the count to 1, format it, and mount it as an ephemeral disk at /var/vcap/data.
  2. Same as above, but without count limiting. Extra disks are not formatted or mounted.
  3. Set the disks up in a RAID configuration using mdadm, or possibly lvm. Format and mount the raid array at /var/vcap/data.

There may be other options. What do you think?
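To make option 1 a bit more concrete, a rough sketch of where the single local SSD shows up on the VM (assuming the CPI can hand the agent an ephemeral disk device path in its settings; the helper is hypothetical):

// ephemeralDiskPath returns the device path of the first local SSD, which
// the CPI would pass to the agent as the ephemeral disk to format and
// mount at /var/vcap/data.
func ephemeralDiskPath(iface string) string {
	// GCE exposes local SSDs differently per interface: NVMe disks appear
	// as /dev/nvme0n1, /dev/nvme0n2, ..., while SCSI disks get stable
	// /dev/disk/by-id/google-local-ssd-N symlinks.
	if iface == "nvme" {
		return "/dev/nvme0n1"
	}
	return "/dev/disk/by-id/google-local-ssd-0"
}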

johnsonj commented 6 years ago

I'd start with option 1 and see if there's appetite for option 3. Is it fair to say /var/vcap/data always lives on a local disk and is always tied to the instance vs. /var/vcap/store is always a persistent disk that can be re-attached elsewhere? I want to make sure the semantics of local SSD match what a BOSH release author would expect.

d commented 6 years ago

Hi our team kinda desperately needs this too (either that or we're considering moving back to AWS). Option 1 mentioned by @craigfurman sounds like the path of least resistance: the Agent code is probably mostly in place, and we "just" need to change the CPI code I guess?

Is this CPI mostly maintained by Googlers?

evandbrown commented 6 years ago

Hi Jesse,

I’ll look into this in the next few days and reply with a proposal and time frame. More to come...

Thanks,

Evan

On Mon, May 21, 2018 at 3:30 PM Jesse Zhang notifications@github.com wrote:

Hi our team kinda desperately needs this too (either that or we're considering moving back to AWS). Option 1 mentioned by @craigfurman https://github.com/craigfurman sounds like the path of least resistance: the Agent code is probably mostly in place, and we "just" need to change the CPI code I guess?

Is this CPI mostly maintained by Googlers?

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/cloudfoundry-incubator/bosh-google-cpi-release/issues/152#issuecomment-390802440, or mute the thread https://github.com/notifications/unsubscribe-auth/AAoGLQWRn9-bMRCf9GOhVKxG4hWzxx13ks5t0z_2gaJpZM4MA8pf .

evandbrown commented 6 years ago

@d, it sounds like you've got experience using ephemeral disks with EC2 so I'm curious if you've got an opinion wrt how this would be surfaced in the Google CPI?

There were a few ideas floated earlier in the thread. What if we allow local SSDs to be specified as @craigfurman suggested:

  cloud_properties:
    local_ssd:
      interface: (nvme|scsi)
      count: (1-8)

then let BOSH handle them the same way the raw_instance_storage property of the AWS CPI does, i.e. "With multiple disks attached, the Agent partitions and labels instance storage disks with label raw-ephemeral-* so that release jobs can easily find and use them"
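Roughly how that property could be modeled and validated in the CPI (a sketch; the type and limits are illustrative, not existing CPI code):

package localssd

import "fmt"

// LocalSSD mirrors the proposed cloud property.
type LocalSSD struct {
	Interface string `json:"interface"` // "nvme" or "scsi"
	Count     int    `json:"count"`     // 1-8; each local SSD is a fixed 375 GB
}

func (l LocalSSD) Validate() error {
	if l.Interface != "nvme" && l.Interface != "scsi" {
		return fmt.Errorf("local_ssd.interface must be 'nvme' or 'scsi', got '%s'", l.Interface)
	}
	if l.Count < 1 || l.Count > 8 {
		return fmt.Errorf("local_ssd.count must be between 1 and 8, got %d", l.Count)
	}
	return nil
}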

Appreciate your input on how this can be implemented in a way that's useful for you.

cppforlife commented 6 years ago

@evandbrown if the goal is to use SSD for ephemeral data, then i think we want to make this a bit more transparent to the jobs.

resource_pools:
  - name: common
    network: private
    stemcell:
      name: bosh-google-kvm-ubuntu-trusty-go_agent
      version: latest
    cloud_properties:
      zone: us-east-1
      machine_type: n1-standard-1
      root_disk_size_gb: 20
      root_disk_type: pd-standard
      ephemeral_disk_type: local-ssd # <--------------

that way anyone who's using /var/vcap/data/ will automatically take advantage of this. even though it sounds like gcp offers multiple local disks i don't think it would be commonly used by the jobs at this point.

evandbrown commented 6 years ago

@cppforlife That's my understanding. @d, will an ephemeral volume at /var/vcap/data fit your needs?

From what I can tell, we won't need to modify the agent, just understand some of these options. Does that sound right, @cppforlife? If so, this is straightforward and I'd like one of our new team members to take this as a good CPI intro.

cppforlife commented 6 years ago

From what I can tell, we won't need to modify the agent, just understand some of these options (https://github.com/cloudfoundry/bosh-agent/blob/a4179dc659aa7c068540c2c445102bf1542a7180/settings/settings.go#L165-L179). Does that sound right, @cppforlife? If so, this is straightforward and I'd like one of our new team members to take this as a good CPI intro.

yup.


d commented 6 years ago

@cppforlife That's my understanding. @d, will an ephemeral volume at /var/vcap/data fit your needs?

  1. At a minimum, I'd like /var/vcap/data to be on local SSD "when possible".
  2. I didn't exactly enjoy the prior experience with the AWS CPI, where we needed to be careful about putting the right number into ephemeral_disk_size, but I believe that's tolerable.
  3. Given the current pressing use case is the base layer of containers (sans volumes) -- which is very ephemeral in nature -- I strongly resent the UX of raw_instance_storage: true because it requires releases to take on an EC2 dependency to take advantage of it. If you ever plan to "make all local SSD available", I'd appreciate having all the capacity under /var/vcap/data, like, magically. (This might entail some agent work.)

d commented 6 years ago

@evandbrown if the goal is to use SSD for ephemeral data, then i think we want to make this a bit more transparent to the jobs. [...] that way anyone who's using /var/vcap/data/ will automatically take advantage of this.

@cppforlife How strongly do we want to keep ephemeral_disk_size both required and an integer? Doesn't your proposed ephemeral_disk_type: local-ssd conflict with that a little?

bosh-admin-bot commented 3 years ago

This issue was marked as Stale because it has been open for 21 days without any activity. If no activity takes place in the coming 7 days it will automatically be closed. To prevent this from happening, remove the Stale label or comment below.