delphix / appliance-build

This repository contains the code used to build the Ubuntu-based Delphix Appliance, leveraging open-source tools such as Debian's live-build, Docker, Ansible, OpenZFS, and others.
Apache License 2.0

DLPX-87038 VMDK file size differs from the size in the OVF file #732

Closed: palash-gandhi closed this pull request 1 year ago

palash-gandhi commented 1 year ago

Problem

`qemu-img` seems to report a different size for the VMDK than `stat` does, and the `ovf:size` recorded in the OVF matches neither:

```
delphix@pg-release:~$ aws s3 cp s3://snapshot-de-images/builds/jenkins-ops/appliance-build/release/post-push/48/external-standard-esx/external-standard-esx.ova .
download: s3://snapshot-de-images/builds/jenkins-ops/appliance-build/release/post-push/48/external-standard-esx/external-standard-esx.ova to ./external-standard-esx.ova
delphix@pg-release:~$ tar -xvf external-standard-esx.ova
external-standard-esx.ovf
external-standard-esx.mf
external-standard-esx.vmdk
delphix@pg-release:~$ stat -c %s external-standard-esx.vmdk
7058465280
delphix@pg-release:~$ grep ovf:size external-standard-esx.ovf
    <File ovf:href="external-standard-esx.vmdk" ovf:id="file1" ovf:size="7058241024"/>
delphix@pg-release:~$ qemu-img info --output=json external-standard-esx.vmdk | grep actual-size
    "actual-size": 7062785536,
```

This mismatch caused [ESCL-4527](https://delphix.atlassian.net/browse/ESCL-4527).

Solution

Use `stat` instead to ensure sizes match.
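A minimal sketch of what this means for the build, assuming the OVF is rendered from a template containing the `@@VMDK_FILENAME@@` and `@@VMDK_FILESIZE@@` placeholders (quoted later in this thread) by plain text substitution. The function name `render_ovf` and the use of `sed` are illustrative assumptions, not the actual appliance-build code. The point is that `stat -c %s` yields the file's length in bytes, which is what `ovf:size` describes.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper, not the real appliance-build script): fill the
# OVF template placeholders, taking ovf:size from the VMDK's byte length.
# Assumes GNU coreutils stat and sed, and a VMDK file name without '|' or '&'.
set -euo pipefail

render_ovf() {
  local template=$1 vmdk=$2 out=$3 name size
  name=$(basename "$vmdk")
  # %s = total size in bytes (st_size), i.e. the length of the file itself.
  size=$(stat -c %s "$vmdk")
  sed -e "s|@@VMDK_FILENAME@@|${name}|g" \
      -e "s|@@VMDK_FILESIZE@@|${size}|g" \
      "$template" > "$out"
}
```

For example (file names hypothetical): `render_ovf external-standard-esx.ovf.template external-standard-esx.vmdk external-standard-esx.ovf`.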

Testing Done

Ran `ab-pre-push`, downloaded the resulting OVAs, and verified that the sizes match:

```
delphix@pg-release:~$ aws s3 cp s3://dev-de-images/builds/jenkins-selfservice/appliance-build/develop/pre-push/880/external-standard-esx/external-standard-esx.ova .
download: s3://dev-de-images/builds/jenkins-selfservice/appliance-build/develop/pre-push/880/external-standard-esx/external-standard-esx.ova to ./external-standard-esx.ova
delphix@pg-release:~$ tar -xvf external-standard-esx.ova
external-standard-esx.ovf
external-standard-esx.mf
external-standard-esx.vmdk
delphix@pg-release:~$ grep ovf:size external-standard-esx.ovf
delphix@pg-release:~$ stat -c %s external-standard-esx.vmdk
7096252928
```

Uploaded the OVA to vCenter and deployed an engine from it: https://qa-esxi01.delphix.com/ui/#/host/vms/1429

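The manual check above could be scripted. The sketch below compares each `ovf:size` in an extracted OVF against `stat -c %s` of the file it references. It assumes, as in the template this PR edits, that each `<File .../>` element sits on a single line; it uses `grep`/`sed` rather than a real XML parser, and the function name `check_ovf_sizes` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check that every ovf:size in an OVF matches the referenced file's
# byte length, as the OVF spec requires when ovf:size is present.
# Assumption: one <File .../> element per line; not a real XML parser.
set -euo pipefail

check_ovf_sizes() {
  local ovf=$1 dir status=0 line href declared actual
  dir=$(dirname "$ovf")
  while IFS= read -r line; do
    href=$(sed -n 's/.*ovf:href="\([^"]*\)".*/\1/p' <<<"$line")
    declared=$(sed -n 's/.*ovf:size="\([0-9]*\)".*/\1/p' <<<"$line")
    [ -n "$declared" ] || continue          # ovf:size is optional; skip if absent
    actual=$(stat -c %s "$dir/$href")       # byte length of the referenced file
    if [ "$declared" != "$actual" ]; then
      echo "MISMATCH $href: ovf:size=$declared, stat=$actual" >&2
      status=1
    else
      echo "OK $href: $actual bytes"
    fi
  done < <(grep '<File ' "$ovf" || true)
  return "$status"
}
```

For example, after `tar -xf external-standard-esx.ova`, running `check_ovf_sizes external-standard-esx.ovf` should print one `OK` line per referenced file and return non-zero on any mismatch.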
prakashsurya commented 1 year ago

> Use stat instead to ensure sizes match.

Can you clarify what we want to match?

In the example:

```
delphix@pg-release:~$ stat -c %s external-standard-esx.vmdk
7058465280

delphix@pg-release:~$ grep ovf:size external-standard-esx.ovf
    <File ovf:href="external-standard-esx.vmdk" ovf:id="file1" ovf:size="7058241024"/>

delphix@pg-release:~$ qemu-img info --output=json external-standard-esx.vmdk | grep actual-size
    "actual-size": 7062785536,
```

None of these values are the same (7058465280 vs. 7058241024 vs. 7062785536), which has me confused, and I'm not sure what we're trying to accomplish.
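One likely source of the confusion is that the values measure different things. `stat -c %s` prints the file's length in bytes, which is what the OVF specification means by the size of the referenced file. `qemu-img info` documents its disk size (`actual-size` in JSON output) as how much space the image file occupies on the host file system, i.e. allocated blocks, which depends on the filesystem (sparse regions, compression) and need not equal the byte length. That may also be why a size taken from `qemu-img` at build time does not match the file later. The sketch below shows the two measurements diverging on a sparse file; the helper names are illustrative, and GNU `stat` and `truncate` are assumed.

```bash
#!/usr/bin/env bash
# Illustration: byte length (what ovf:size describes) versus allocated space
# (what qemu-img's "actual-size" reports). Assumes GNU coreutils.
set -euo pipefail

# Length of the file in bytes (st_size).
apparent_size() { stat -c %s "$1"; }

# Space allocated on disk: number of blocks (%b) times the block unit (%B).
allocated_size() { echo $(( $(stat -c %b "$1") * $(stat -c %B "$1") )); }

demo=$(mktemp)
truncate -s 10M "$demo"   # sparse file: 10 MiB long, little or nothing allocated
echo "apparent:  $(apparent_size "$demo") bytes"   # apparent:  10485760 bytes
echo "allocated: $(allocated_size "$demo") bytes"  # filesystem-dependent, often 0
rm -f "$demo"
```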

palash-gandhi commented 1 year ago

Yeah, that has confused me too. I am currently running some experiments. The problem here is that `qemu-img` seems to return two different sizes for the same VMDK. I'll update the PR once my tests finish.

palash-gandhi commented 1 year ago

@prakashsurya I am not sure why `qemu-img` reports two different sizes, but depending on `stat` makes more sense to me. I looked for a known `qemu-img` issue that would explain this but did not find anything. Do you want me to keep looking into it? Do we care why `qemu-img` reports a different size? Irrespective of the answer, my guess is that we will still want to use `stat` instead, right?

prakashsurya commented 1 year ago

Are we sure that using `stat` will work? It seems like `qemu-img` returned different values at different times and/or on different machines, so I'm curious whether we're confident that `stat` will not do the same. If so, where does that confidence come from?

Also, do we know how VMware checks the VMDK size, such that it can detect when it differs from what's in the OVF?

palash-gandhi commented 1 year ago

@prakashsurya

> Are we sure that using `stat` will work? It seems like `qemu-img` returned different values at different times and/or on different machines, so I'm curious whether we're confident that `stat` will not do the same. If so, where does that confidence come from?

My tests showed that `qemu-img` returned different values at different times, but `stat` did not. I have not found any upstream `qemu-img` issues that would explain the differences, but since the values reported by `stat` did not change across times and/or machines, I figured it's best to switch to `stat`.
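A sketch of the kind of repeated measurement described here: sample `stat -c %s` and `qemu-img`'s `actual-size` for the same image several times and compare. The function name `sample_sizes`, the sample count, and the interval are arbitrary assumptions; `actual-size` is pulled from the JSON with `grep` rather than a JSON parser, and the `qemu-img` column shows `n/a` if it is not installed.

```bash
#!/usr/bin/env bash
# Sketch: sample the sizes reported for one image several times.
# Hypothetical helper; sample count and interval defaults are arbitrary.
set -euo pipefail

sample_sizes() {
  local img=$1 samples=${2:-5} interval=${3:-2} i q
  for (( i = 1; i <= samples; i++ )); do
    q=''
    if command -v qemu-img >/dev/null 2>&1; then
      # Extract N from the '"actual-size": N,' line of the JSON output.
      q=$(qemu-img info --output=json "$img" \
            | grep -oE '"actual-size": *[0-9]+' | grep -oE '[0-9]+$' || true)
    fi
    printf '%s\tstat=%s\tqemu-img=%s\n' "$(date +%T)" "$(stat -c %s "$img")" "${q:-n/a}"
    if (( i < samples )); then sleep "$interval"; fi
  done
}
```

For example, `sample_sizes external-standard-esx.vmdk 10 5`; if the observation above holds, the `stat` column should stay constant while the `qemu-img` column may not.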

> Also, do we know how VMware checks the VMDK size, such that it can detect when it differs from what's in the OVF?

We do not. Since we do not have a vCloud Director account/subscription, I cannot test this internally.

prakashsurya commented 1 year ago

I'll approve this, but I'm also curious whether we could just remove the `ovf:size` attribute from the template altogether and not have to worry about this size mismatch at all. The OVF specification makes it optional:

> The size of the referenced file may be specified using the ovf:size attribute. The unit of this attribute is always bytes. If present, the value of the ovf:size attribute shall match the actual size of the referenced file.

(from the DMTF OVF specification, DSP0243 1.1.0: https://www.dmtf.org/sites/default/files/standards/documents/DSP0243_1.1.0.pdf)

Could we just remove the `ovf:size` attribute from here?

```
  <References>
    <File ovf:href="@@VMDK_FILENAME@@" ovf:id="file1" ovf:size="@@VMDK_FILESIZE@@"/>
  </References>
```
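For reference, the alternative suggested here amounts to deleting one attribute. A minimal sketch, assuming the template is edited or post-processed as plain text; the function name `strip_ovf_size` is hypothetical, and whether VMware tooling accepts an OVF without `ovf:size` was not verified in this thread.

```bash
#!/usr/bin/env bash
# Sketch of the reviewer's alternative: drop the optional ovf:size attribute
# instead of computing it. Hypothetical helper; reads the template file given
# as $1 and writes the result to stdout.
set -euo pipefail

strip_ovf_size() {
  # Remove every ' ovf:size="..."' occurrence; everything else is unchanged.
  sed -E 's/ ovf:size="[^"]*"//g' "$1"
}
```

Applied to the template above, the `File` line becomes `<File ovf:href="@@VMDK_FILENAME@@" ovf:id="file1"/>`.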

I'll approve this as-is, but it still concerns me that we may not really be fixing the problem.