Hi @oppegard,
Local SSD isn't currently supported, but I think I'm pretty familiar with what it would take to add it. We can't define local SSDs in disk_pools because they can't be created independently of a VM. I believe we'd want to add it to the cloud_properties of a VM. That would look something like:
resource_pools:
- name: common
  network: private
  stemcell:
    name: bosh-google-kvm-ubuntu-trusty-go_agent
    version: latest
  cloud_properties:
    zone: us-east-1
    machine_type: n1-standard-1
    root_disk_size_gb: 20
    root_disk_type: pd-standard
    local_ssd:
      count: 4
      interface: nvme
The only things configurable about local SSD are the number you want to attach (the maximum varies for each instance type) and the interface (nvme or scsi).
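To make those two knobs concrete, here is a minimal Go sketch (Go being what the CPI itself is written in) of how the CPI might validate such a local_ssd block. The LocalSSD type, its Validate method, and the maxForMachineType parameter are all hypothetical names for illustration; the real per-machine-type maximum would have to come from GCP.

```go
package main

import "fmt"

// Hypothetical representation of the proposed local_ssd cloud property.
// These names are illustrative, not actual bosh-google-cpi-release code.
type LocalSSD struct {
	Count     int    // number of local SSDs to attach
	Interface string // "nvme" or "scsi"
}

// Validate checks the only two configurable knobs: the interface must be
// nvme or scsi, and the count must be within the machine type's limit.
func (l LocalSSD) Validate(maxForMachineType int) error {
	if l.Interface != "nvme" && l.Interface != "scsi" {
		return fmt.Errorf("local_ssd.interface must be nvme or scsi, got %q", l.Interface)
	}
	if l.Count < 1 || l.Count > maxForMachineType {
		return fmt.Errorf("local_ssd.count must be between 1 and %d, got %d", maxForMachineType, l.Count)
	}
	return nil
}

func main() {
	fmt.Println(LocalSSD{Count: 4, Interface: "nvme"}.Validate(8)) // <nil>
	fmt.Println(LocalSSD{Count: 9, Interface: "scsi"}.Validate(8)) // count error
}
```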
Getting this in the CPI is easy stuff, and I only see two potential issues:
Will the bosh_agent discover, format, and mount these disks correctly? If not, that's an upstream patch. @cppforlife, does this look OK to you?
@evandbrown that makes sense regarding local SSD being part of the VM definition. I think we'll need support for this before we can migrate our CI to GCP. Our C3 workers often spike up to 150K write IOPS -- I'm looking forward to faster CI builds on a GCP instance with NVMe and those 360K write IOPS. 😁
I think it would make sense for the bosh agent to mount the (partitioned) local SSD at /var/vcap/data. I reckon formatting with ext4 as a first pass, unless there is a need for specifying the fs type in cloud_properties.
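If the agent does need patching, the first step is finding the devices. A sketch, hypothetical rather than actual bosh-agent code, of mapping interface and index to a device path, following GCP's documented naming conventions (NVMe local SSDs appear as namespaces like /dev/nvme0n1; SCSI ones get /dev/disk/by-id/google-local-ssd-N symlinks):

```go
package main

import "fmt"

// localSSDDevicePath guesses where an attached local SSD shows up on the
// guest, based on GCP's documented device naming. Treat the exact paths
// as assumptions to be verified on a real instance.
func localSSDDevicePath(iface string, index int) (string, error) {
	switch iface {
	case "nvme":
		// NVMe local SSDs are namespaces on the nvme0 controller, 1-based.
		return fmt.Sprintf("/dev/nvme0n%d", index+1), nil
	case "scsi":
		// SCSI local SSDs get stable by-id symlinks, 0-based.
		return fmt.Sprintf("/dev/disk/by-id/google-local-ssd-%d", index), nil
	default:
		return "", fmt.Errorf("unknown local SSD interface %q", iface)
	}
}

func main() {
	p, _ := localSSDDevicePath("nvme", 0)
	fmt.Println(p) // /dev/nvme0n1
}
```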
@evandbrown would you be interested in a PR that added this feature? There may well be an associated PR for the bosh-agent, which I would submit before posting this one.
@craigfurman yes! that would be awesome
@craigfurman - any update on the upstream changes? happy to work with you to get this into the CPI
wouldn't it prevent live migration? i could see it as an option but not necessarily a default. what was the use case (which app) for exposing it?
Data is retained on live migration (src), but I'd bet perf goes down for a bit.
IMO: The use case is for applications that need to write out data that's not very important. So a CI doing builds, a cache server, or something along those lines. I'd be apprehensive about saying 'swap this flag and all of your bosh apps are writing to a local SSD' without further investigation.
@johnsonj I started working on it briefly, but dropped it for a while after some other stuff came up. I'm not likely to get back to it any time soon.
I'll post some limited context here though, in case I or someone else picks this up:
Proposed interface, on the cloud_properties of a VM:
local_ssds:
  interface: <nvme or scsi>
  count: <count>
At the time of writing, the SSDs have a fixed size of 375GB. If more than one is desired, we're in uncharted territory when it comes to bosh UX (as far as I know). Normally, bosh formats disks with ext4 and mounts them at /var/vcap/store or /var/vcap/data (although XFS is now available for persistent disks).
I can think of 3 options:
There may be other options. What do you think?
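Whichever option wins, the formatting step itself is small. A hedged sketch of mapping an (assumed) fs-type cloud property to a mkfs command line, defaulting to ext4 as suggested earlier in the thread; this only builds the argument list and executes nothing:

```go
package main

import "fmt"

// mkfsArgs maps a filesystem-type property (hypothetical name and values)
// to the mkfs invocation the agent would run. An empty string means the
// ext4 default proposed in this thread.
func mkfsArgs(fsType, device string) ([]string, error) {
	switch fsType {
	case "", "ext4":
		return []string{"mkfs.ext4", device}, nil
	case "xfs":
		return []string{"mkfs.xfs", device}, nil
	}
	return nil, fmt.Errorf("unsupported filesystem type %q", fsType)
}

func main() {
	args, _ := mkfsArgs("", "/dev/nvme0n1")
	fmt.Println(args) // [mkfs.ext4 /dev/nvme0n1]
}
```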
I'd start with option 1 and see if there's appetite for option 3. Is it fair to say /var/vcap/data always lives on a local disk and is always tied to the instance, vs. /var/vcap/store is always a persistent disk that can be re-attached elsewhere? I want to make sure the semantics of local SSD match what a BOSH release author would expect.
Hi our team kinda desperately needs this too (either that or we're considering moving back to AWS). Option 1 mentioned by @craigfurman sounds like the path of least resistance: the Agent code is probably mostly in place, and we "just" need to change the CPI code I guess?
Is this CPI mostly maintained by Googlers?
Hi Jesse,
I’ll look into this in the next few days and reply with a proposal and time frame. More to come...
Thanks,
Evan
@d, it sounds like you've got experience using ephemeral disks with EC2, so I'm curious if you've got an opinion wrt how this would be surfaced in the Google CPI?
There were a few ideas floated earlier in the thread. What if we allow them to be specified as @craigfurman suggested:
cloud_properties:
  local_ssd:
    interface: (nvme|scsi)
    count: (1-8)
then let BOSH handle them the same way the raw_instance_storage property of the AWS CPI does, i.e. "With multiple disks attached, the Agent partitions and labels instance storage disks with label raw-ephemeral-* so that release jobs can easily find and use them".
Appreciate your input on how this can be implemented in a way that's useful for you.
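For reference, the AWS-CPI behavior quoted above boils down to deterministic labels. A toy sketch of generating them for n attached disks:

```go
package main

import "fmt"

// rawEphemeralLabels reproduces the raw-ephemeral-* naming scheme the AWS
// CPI docs describe, so release jobs can find the disks by label.
func rawEphemeralLabels(n int) []string {
	labels := make([]string, n)
	for i := range labels {
		labels[i] = fmt.Sprintf("raw-ephemeral-%d", i)
	}
	return labels
}

func main() {
	fmt.Println(rawEphemeralLabels(3)) // [raw-ephemeral-0 raw-ephemeral-1 raw-ephemeral-2]
}
```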
@evandbrown if the goal is to use SSD for ephemeral data, then i think we want to have this a bit more transparent to the jobs.
resource_pools:
- name: common
  network: private
  stemcell:
    name: bosh-google-kvm-ubuntu-trusty-go_agent
    version: latest
  cloud_properties:
    zone: us-east-1
    machine_type: n1-standard-1
    root_disk_size_gb: 20
    root_disk_type: pd-standard
    ephemeral_disk_type: local-ssd # <--------------
that way anyone who's using /var/vcap/data/ will automatically take advantage of this. even though it sounds like gcp offers multiple local disks i don't think it would be commonly used by the jobs at this point.
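The transparent version could be a single branch in the CPI's VM-creation path. Here is a hypothetical sketch; the struct and function names are invented, and the one real detail assumed is that GCE's API models local SSDs as SCRATCH disks:

```go
package main

import "fmt"

// ephemeralDiskSpec is an invented stand-in for whatever struct the CPI
// would pass to the GCE instance-creation call.
type ephemeralDiskSpec struct {
	Type   string // "SCRATCH" for local SSD, "PERSISTENT" otherwise
	SizeGB int
}

// ephemeralDiskFor branches on the proposed ephemeral_disk_type property:
// "local-ssd" yields a fixed-size scratch disk, anything else keeps the
// persistent-disk-backed behavior the CPI has today.
func ephemeralDiskFor(props map[string]string, defaultSizeGB int) ephemeralDiskSpec {
	if props["ephemeral_disk_type"] == "local-ssd" {
		return ephemeralDiskSpec{Type: "SCRATCH", SizeGB: 375} // local SSD size is fixed
	}
	return ephemeralDiskSpec{Type: "PERSISTENT", SizeGB: defaultSizeGB}
}

func main() {
	fmt.Println(ephemeralDiskFor(map[string]string{"ephemeral_disk_type": "local-ssd"}, 20)) // {SCRATCH 375}
	fmt.Println(ephemeralDiskFor(map[string]string{}, 20))                                   // {PERSISTENT 20}
}
```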
@cppforlife That's my understanding. @d, will an ephemeral volume at /var/vcap/data fit your needs?
From what I can tell, we won't need to modify the agent, just understand some of these options: https://github.com/cloudfoundry/bosh-agent/blob/a4179dc659aa7c068540c2c445102bf1542a7180/settings/settings.go#L165-L179. Does that sound right, @cppforlife? If so, this is straightforward and I'd like one of our new team members to take this as a good CPI intro.
yup.
An ephemeral volume at /var/vcap/data would fit our needs:
- We'd want /var/vcap/data to be on local SSD "when possible".
- This might mean not honoring ephemeral_disk_size exactly, but I believe that's tolerable.
- I'm not a fan of raw_instance_storage: true, because it requires releases to take on an EC2 dependency to take advantage of it.
- If you ever plan to "make all local SSD available", I'd appreciate having all the capacity under /var/vcap/data, like, magically. (This might entail some agent work.)
@cppforlife How much do we want to keep ephemeral_disk_size both required and an integer? Doesn't your proposed ephemeral_disk_type: local-ssd conflict with that a little?
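One way to reconcile the two, sketched here as a guess rather than anything proposed in the thread: keep ephemeral_disk_size as a required integer and round it up to the smallest number of 375GB devices that covers it.

```go
package main

import "fmt"

const localSSDSizeGB = 375 // fixed per-device size at the time of writing

// localSSDCountFor rounds a requested ephemeral_disk_size up to the
// smallest number of local SSDs providing at least that capacity.
func localSSDCountFor(ephemeralDiskSizeGB int) int {
	if ephemeralDiskSizeGB <= 0 {
		return 1 // always attach at least one device
	}
	return (ephemeralDiskSizeGB + localSSDSizeGB - 1) / localSSDSizeGB
}

func main() {
	fmt.Println(localSSDCountFor(20))   // 1
	fmt.Println(localSSDCountFor(375))  // 1
	fmt.Println(localSSDCountFor(1500)) // 4
}
```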
This issue was marked as Stale because it has been open for 21 days without any activity. If no activity takes place in the coming 7 days it will automatically be closed. To prevent this from happening, remove the Stale label or comment below.
Is it possible to use Local SSD for the ephemeral disk mounted at /var/vcap/data? I could only find references to pd-ssd in the examples, which I assume is shorthand for persistent SSD. Our present Concourse CI workers are provisioned as AWS C3 instance types, which provide local SSD and drastically better performance for our workloads. We'd like to achieve the same on GCP if possible.