dmacvicar / terraform-provider-libvirt

Terraform provider to provision infrastructure with Linux's KVM using libvirt
Apache License 2.0
1.59k stars 458 forks source link

Check custom volume size is a multiple of 512 bytes #796

Open jamonation opened 3 years ago

jamonation commented 3 years ago

This is a naive attempt at working around? #741 - Size of created volume differs from the resource definition. The issue is that qcow2 uses sector sizes of 512 bytes, so any custom size is rounded up to align to a sector boundary, which leads to the resource destroy/create cycle in the issue.

The approach I've taken here is rather naive (just bail if custom size modulo 512 is non-zero), but it keeps things simple as opposed to fudging tfstate or reported values from qemu.

Here's an example setup:

terraform {
 required_version = ">= 0.13"
  required_providers {
    libvirt = {
      source  = "dmacvicar/libvirt"
      version = "0.6.2"
    }
  }
}

provider "libvirt" {
  uri = "qemu:///system"
}

resource "libvirt_volume" "test-volume" {
  name = "test-volume"
  size = 10000000
}

And a demonstration of rejecting outright an invalid size:

terraform apply  

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # libvirt_volume.test-volume will be created
  + resource "libvirt_volume" "test-volume" {
      + format = (known after apply)
      + id     = (known after apply)
      + name   = "test-volume"
      + pool   = "default"
      + size   = 10000000
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

libvirt_volume.test-volume: Creating...

Volume size 10000000 is not a multiple of 512 bytes. This would lead to volume replacement and data loss at the next apply invocation.

  on main.tf line 16, in resource "libvirt_volume" "test-volume":
  16: resource "libvirt_volume" "test-volume" {

I had a look through the test cases for volumes, but the testing framework seems too advanced for my novice level. I'll happily add one though if it makes sense!

jamonation commented 3 years ago

The error string had a trailing ., which the linter didn't like. Fixed it up, and made the error message more concise.

dmacvicar commented 3 years ago

Alternatively we could try to take advantage that the size attribute is marked as computed, because for volumes loaded from URLs (source attribute) we read size from the given file. This gives us a chance to modify the value.

qemu-img rounds the value:

S size indicates the consecutive number of bytes that must contain only zeros for qemu-img to create a sparse image during conversion. This value is rounded down to the nearest 512 bytes. You may use the common size suffixes like "k" for kilobytes.

If we want to avoid erroring, we would need:

This would make the code a bit complicated, but would prevent the user from taking a calculator each time.

What is your view on this?

MalloZup commented 3 years ago

@dmacvicar yes to me sounds good path.

@jamonation do you want to drive this feature and me with Duncan can help you with the needed guidance?

What do you think?

The rationale of the feature should be https://github.com/dmacvicar/terraform-provider-libvirt/pull/796#issuecomment-725344046

jamonation commented 3 years ago

Sure, since it sounds like you're up for helping me out, I'm game!

I think the real complexity will be in the diff suppression and figuring out the very specific cases where rounding has occurred, as opposed to others where the user intends an actual size change. So I'll focus on that and see what I can come up with.

Thanks for getting me on the right track with this!

jamonation commented 3 years ago

I've redone things here taking into account the qcow2 format, and rounding up constraints. I was wrong about the virtual size being multiples of 512 bytes - in my local testing at least, qemu-img shows images are multiples of 1024. I implemented a rounding up function taking that size into account.

The real logic then is this one conditional once all the format checking and rounding is done:

if newSizeRounded == rSize && format == "qcow2" {
                return true
}

Where rSize is the reported size from the current state, and newSizeRounded is exactly that, the requested size, rounded up to the nearest 1024.

If those conditions aren't met, then the diff will still get generated.

I'm sure there are a ton of naming and formatting conventions that I'm not aware of so please let me know what I should tidy or refactor and I'm happy to do it.

dmacvicar commented 3 years ago

Looks pretty good.

Would you be able to provide a unit test for the rounding function and a integration test for the functionality itself?

I am not sure how to specifically test suppression functions. I suspect it to relate to ExpectNonEmptyPlan, but I am not sure.

Additionally, the documentation would need to mention this behavior for the attribute.

scabala commented 2 weeks ago

closes #741

dmacvicar commented 2 weeks ago

I have rebased and reworked this PR and added an integration test.

I could not understand why @jamonation mentioned it is rounded to 1024:

 qemu-img create  -f qcow2 1.qcow2 1534
...

$ qemu-img info 1.qcow2
...
virtual size: 1.5 KiB (1536 bytes)
...

$ qemu-img create  -f qcow2 1.qcow2 1537
...

$ qemu-img info 1.qcow2
...
virtual size: 2 KiB (2048 bytes)
...

It looks it is always 512.

Some review and testing is appreciated.

scabala commented 2 weeks ago

@jamonation @bcrisp4 could you test it? I'll try to do it as well

scabala commented 2 weeks ago

I have performed basic tests. It is both good and bad:

So I think additional changes to round up to nearest 512 multiplication is needed to fix perpetual diff.

dmacvicar commented 2 weeks ago

Ok. I will need to reproduce that with an acceptance test. @scabala can you detail the changes you tried?

dmacvicar commented 2 weeks ago

I am having trouble reasoning about this.

My testcase has an initial value of 1073742300, which I expect to be rounded by qemu-img to 1073742336:

$ qemu-img create -f qcow2 1.qcow2 1073742300
...
$ qemu-img info 1.qcow2
image: 1.qcow2
file format: qcow2
virtual size: 1 GiB (1073742336 bytes)
...

Interestingly the rounding is done before qemu-img, by libvirt:

https://github.com/libvirt/libvirt/blob/d2dd209cdde937c69f691b8ca9f34ef1382172fa/src/storage/storage_util.c#L2347

/* Round capacity as qemu-img resize errors out on sizes which are not
     * a multiple of 512 */
    capacity = VIR_ROUND_UP(capacity, 512);

Which is:

/* divide value by size, rounding up */
#define VIR_DIV_UP(value, size) (((value) + (size) - 1) / (size))

/* round up value to the closest multiple of size */
#define VIR_ROUND_UP(value, size) (VIR_DIV_UP(value, size) * (size))
$ python
>>> (((1073742300 + 512) - 1 ) // 512) * 512
1073742336

However:

    resource_libvirt_volume_test.go:134: Step 1/2 error: Check failed: Check 3/3 error: libvirt_volume.exbbbooxdq: Attribute 'size' expected "1073742336", got "1073742848"

Which is the rounding to 1024! :thinking:

scabala commented 2 weeks ago

Hi @dmacvicar sure, I first created volume and if I then changed it by less than 512 - no diff, as expected. Then I tried to change it by something more than 512 but not a multiply of it, 700 in my example. As expected, diff was shown and applied successfully. But every next apply showed diff when actual size was rounded to nearest 512 multiplication.

jamonation commented 2 weeks ago

Wow this is a blast from the past! I don't have a local setup here to test with but will see what I can do to reproduce. I definitely recall the 512 vs. 1024 thing throwing me off.

If I can get the provider going locally in debug mode, I guess we'll want to validate and add tests for the following qcow2 sizes/changes:

  1. volume create with size n*512 a. verify the created volume size matches
  2. volume create with size n*512-1 a. verify the created volume size is a multiple of 512 b. should see no diff on replanning
  3. volume create with size n*512-1, then change size by -1 a. verify the changed volume is a multiple of 512 b. should see no diff on replanning

I think these all cover behaviour where we see things working as expected. Next to narrow things down, what happen when we try:

  1. volume create with size n*512+1 a. verify the created volume size is a multiple of 512 (should be n*512+512)? b. should see no diff on replanning
  2. volume create with size n*512+1, then change size by -1 a. replanning should bring size back to n*512+512 b. should see no diff on replanning again
  3. volume create with sizen*512+1, then change size by +1 a. replanning should increase size to n*512+512+512 b. should see no diff on replanning again. This is the case where you're seeing infinite replanning right @scabala?

Do we know if case 5. behaves the same way when shrinking a volume? Part of me thinks this is a completely different code path since the volume won't get sized down by libvirt, whereas sizing up means a volume resize and different set of functions in the provider, terraform, and libvirt.