glandium / vmfs-tools

http://glandium.org/projects/vmfs-tools/
GNU General Public License v2.0
76 stars 29 forks

df result mismatch #5

Closed Thomas-Tsai closed 12 years ago

Thomas-Tsai commented 12 years ago

Hi,

We are the Clonezilla/Partclone developers, and we are trying to use the vmfs library to clone full VMFS partitions. Basically, Partclone checks the bitmap and finds all used blocks, which are then backed up. Lately we found some (VMFS5) clone/restore failures and tried to debug further. I found a strange mismatch between the df results from debugvmfs and from ESXi:

ESXi:

```
~ # df
Filesystem   Bytes        Used        Available    Use%  Mounted on
VMFS-5       37580963840  3346006016  34234957824  9%    /vmfs/volumes/datastore1
~ # df -m
Filesystem   1M-blocks  Used  Available  Use%  Mounted on
VMFS-5       35840      3191  32649     9%     /vmfs/volumes/datastore1
```

vmfs-tools/debugvmfs:

```
./debugvmfs/debugvmfs /dev/sdb3 df
Block size       : 1048576 bytes
Total blocks     : 35840
Total size       : 35840 MiB
Allocated blocks : 2990
Allocated space  : 2990 MiB
Free blocks      : 32850
Free size        : 32850 MiB
```

df shows the right total block count (35840), but the allocated blocks/space differ (3191 > 2990)!

Partclone is based on vmfs-tools and only backs up 2990 blocks, so the clone/restore fails. We tried following vmfs-fsck.c and debugvmfs.c to dump every blk_id, position and size from the bitmaps (fbb, fdc, pbc and sbc), and still get the same result.

We are confused by this situation; could you check why they differ?

I also did some research in the VMware KB and found the new bitmap file named .pb2.sf, it is...

glandium commented 12 years ago

vmfs_fuse_statfs and cmd_df use the same functions and structs to get the data they report, so I really don't see why that would be different.

As for .pb2.sf, my hypothesis was that it was used for doubly indirect block lists, but I've never managed to get a vmfs5 filesystem to actually use it: doubly indirect block lists were always in .pb.sf.

Thomas-Tsai commented 12 years ago

Exactly, vmfs_fuse_statfs and cmd_df are the same! My issue is the different df results between the VMware ESXi5 Server shell and Linux vmfs-tools.

Here are my steps to get the different df results: I install VMware ESXi5 Server with the shell and ssh enabled, and copy some files to the ESXi5 Server datastore1 (vmfs5) by scp. After the files are copied, I log in to the ESXi5 Server and run df -m.

Then I reboot the ESXi5 Server and boot from a Linux live CD (I use clonezilla-live), get the vmfs-tools source from the git repository, build it, and run debugvmfs /dev/sda3 df.

The used data is smaller than on the ESXi server; I guess some used blocks are missing?!

glandium commented 12 years ago

Ah, I see. Does that happen on vmfs3 as well, or is it limited to vmfs5? Can you try on a zeroed-out partition, take an image with the imager tool from vmfs-tools (imager /dev/device > image.file), and put the file somewhere I can pick it up?

glandium commented 12 years ago

Note that you can ping me on irc (oftc or mozilla, nick: glandium)

glandium commented 12 years ago

Sorry it took so long.

I finally took a look at your image, and all I can say is that ESX itself seems to be wrong. There is nothing in the data structures of the file system that would indicate more than 2990 blocks allocated.

Here's a little VMFS 101 (for vmfs 3, but vmfs 5 shouldn't be very different):

So from the above, I don't see why ESX would believe there is 210MB more allocated. I was ready to say that maybe it's also adding the subblocks, but there aren't any in use. I'm tempted to say ESX is wrong, here.

Now, that doesn't really help explain why your backup restore fails, but I doubt it fails with the image you gave me for this issue. I'll thus close.

Thomas-Tsai commented 12 years ago

Got it. I'm trying to rewrite a new backup tool, mostly adapted from vmfs-fsck.c, following it to dump each bitmap. I think it could work fine; it's still under test.

Thank You!~