CCI-MOC / hil

Hardware Isolation Layer, formerly Hardware as a Service
Apache License 2.0

HaaS should have a method of wiping disks #105

Closed: henn closed this issue 6 years ago

henn commented 10 years ago

Sharing hardware between tenants means that we should try to reset the state of each node each time it becomes free, so that the next time it is allocated there is no state carried over.

To do this, we should wipe disks. This could be done by PXE booting into a small, in-memory OS and running some logic.

Bonus points for using SATA's secure erase.

The Ironic Python Agent may solve some or all of this.
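
A rough sketch of what such a ramdisk agent could look like (hypothetical: the device glob, the hdparm invocations, and the throwaway password all assume a Linux ramdisk with ATA drives that are not security-frozen):

```python
#!/usr/bin/env python3
"""Hypothetical ramdisk wipe agent: try ATA Secure Erase first,
fall back to overwriting the device with zeros."""
import glob
import subprocess

def ata_secure_erase(dev, password="hil-wipe"):
    # ATA Security Erase: set a throwaway password, then issue the erase
    # (a successful erase also clears the password again).
    subprocess.run(["hdparm", "--user-master", "u",
                    "--security-set-pass", password, dev], check=True)
    subprocess.run(["hdparm", "--user-master", "u",
                    "--security-erase", password, dev], check=True)

def overwrite(dev, block=1 << 20):
    # Fallback: stream zeros over the whole device.
    with open(dev, "r+b") as f:
        size = f.seek(0, 2)  # seek to the end to learn the device size
        f.seek(0)
        for _ in range(size // block):
            f.write(b"\0" * block)
        f.write(b"\0" * (size % block))

if __name__ == "__main__":
    for dev in sorted(glob.glob("/dev/sd?")):
        try:
            ata_secure_erase(dev)
        except subprocess.CalledProcessError:
            overwrite(dev)
```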

okrieg commented 10 years ago

If we are switching from one student experiment to another, do we really want to incur the cost of wiping disks? If we are running a service that doesn't use the disk (e.g. one that uses network-mounted storage), do we really need to wipe the disk after that service completes? If we are running a service that encrypts all the bits it puts on the disk, do we still need to wipe the disk?

I think the cost of wiping the disks should be incurred by the person giving up the machine, and they should be paying for the time to do it. If they don't choose to do it, then tough luck. We could provide a service that they could invoke to wipe the disks, but that's just a convenience.

zenhack commented 10 years ago

Those disks are big.

I'm with @okrieg: if the user cares about wiping the disk, they should do it themselves. An add-on service is a possibility, but beyond the scope of the HaaS, I think. I'll wait until tomorrow night for other comments, and then I'll just close this.

henn commented 10 years ago

I think we need to think about this from the end-user's perspective rather than the developer's.

We are billing the HaaS as an isolation layer, and I think it would come as a surprise to some users that an isolation layer pays no attention to isolating any leftover data on the disk. There should at least be an option for it. Even users who don't think they left anything behind, because they used network-based storage, might inadvertently have done so (swap space, for example). In addition, friendly users like grad students might leave the disk in a state that can mess up future users, e.g. a leftover LVM on-disk configuration that causes an automated install to fail with volume errors.
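
Even short of a full data wipe, clearing stale on-disk signatures would avoid that failure mode. A minimal sketch, assuming util-linux's wipefs is present in the deploy image:

```python
import glob
import subprocess

# Clear filesystem/LVM/RAID signatures so the next automated install
# sees blank-looking disks. This removes metadata only; the data
# itself still needs a real wipe for isolation purposes.
for dev in sorted(glob.glob("/dev/sd?")):
    subprocess.run(["wipefs", "--all", dev], check=True)
```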

While some users will be in marketplace-driven datacenters (where people care about who pays for the time, or where someone knows for sure that they didn't write to disk), others will be in enterprises where security is more important than getting the beans charged to the right account.

Who should pay for the overhead is a different question than whether HaaS should have this functionality.

Please note that there are several potential optimizations, too.

  1. We could take advantage of built-in disk crypto to set a random password, then set a new password for the next user. The caveat is that some disk crypto has serious flaws. (A rough sketch of this idea follows the list.)
  2. We could try to optimize the overwrite itself by using Native Command Queuing to keep enough write requests in flight to match the number of write heads that can operate simultaneously (thus each rotation writes X * (N write heads) bytes instead of X bytes).
  3. Is the secure erase command itself any faster than just writing the bytes from the host?
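
Here is a rough software analogue of optimization 1, using dm-crypt instead of the drive's built-in crypto (hypothetical; assumes cryptsetup is available, and the device paths and names are illustrative):

```python
import subprocess

def open_throwaway(disk, name):
    # dm-crypt "plain" mode keyed from /dev/urandom: everything written
    # through /dev/mapper/<name> lands on the raw disk as ciphertext
    # under a key that exists only in kernel memory.
    subprocess.run(["cryptsetup", "open", "--type", "plain",
                    "--key-file", "/dev/urandom", disk, name], check=True)
    return "/dev/mapper/" + name

def crypto_wipe(name):
    # Closing the mapping discards the key; the ciphertext left on the
    # raw disk is indistinguishable from random data, so no overwrite
    # pass is needed.
    subprocess.run(["cryptsetup", "close", name], check=True)

# e.g. hand a tenant open_throwaway("/dev/sdb", "tenant42"), then call
# crypto_wipe("tenant42") when the node is freed.
```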
zenhack commented 10 years ago

There are certainly optimizations worth looking at if we do decide to build such a tool, but I don't think it's likely any of them are going to have enough of an impact to really affect the arguments for/against.

I think something like this might indeed be useful, but I'm very wary of letting new functionality creep into the HaaS; it is intended to be a very low-level tool, and this feels like something the HaaS shouldn't know or care about.

There's a broader discussion to be had around what happens when a user is done with resources; we don't really have a defined way of giving nodes back (we don't have group_{connect,detach}_node, which seems like a glaring omission). Let's spend some time pinning that architecture down (and the query interface!), and keep this in mind as we do so. I don't think it will be that much of a stretch to design things such that this can be built on top without any trouble.

okrieg commented 10 years ago

I'm free from 3 on for an architectural discussion on the query interface...

I think that Jay convinced me in a previous comment that we should at least have this on our radar, so let's just keep it as a bug for now. The point that convinced me is that if we have HW to partition disks and encrypt partitions, the HaaS may have a very simple service to coordinate who gets which partition.

ianballou commented 6 years ago

The maintenance service will now do this. #891