MichaIng / DietPi

Lightweight justice for your single-board computer!
https://dietpi.com/
GNU General Public License v2.0

software out of control -> full disk on a VM machine #3755

Closed Phil1988 closed 4 years ago

Phil1988 commented 4 years ago

Required Information

DietPi version | G_DIETPI_VERSION_CORE=6 G_DIETPI_VERSION_SUB=31 G_DIETPI_VERSION_RC=2 G_GITBRANCH='master' G_GITOWNER='MichaIng'

Distro version | Debian 9.12 (Stretch)

Kernel version | Linux DietPi 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux

SBC model | Virtual Machine (x86_64)

Additional Information (if applicable)

I am not sure if the information above is 100% accurate because I can't start the VM any more.

My problem is that I currently can't start the VM on which all my services (Nextcloud, Blynk etc.) are installed. VMware gives the error "The file system where disk ... resides is full".

I have a 1.6 TB HDD that stores nothing but the VM. You can find some information here: [screenshot]. As far as I remember, the virtual machine is "mapped" to the local volume.

I had this problem about a year ago. Back then it was Nextcloud's trash bin that filled the disk completely. I solved it by copying the whole virtual machine to an external drive, starting the VM from there and clearing the trash bin via sudo -u www-data php /var/www/nextcloud/occ trashbin:cleanup --all-users.

Last week it happened again. As I don't have a big enough external drive left, I had to delete some snapshots, could then start the VM, put Nextcloud into maintenance mode and cleaned the trash bin. I ran ncdu to check the current size and it showed about 360 GB. I know it would have been better to run ncdu before emptying the trash bin to learn what caused this issue, but the system filled up so fast that my first attempt to run ncdu aborted because the disk became full before it finished. So I deleted 2 more snapshots, started the VM and immediately ran the trash bin cleanup command. It worked for 2 more days until the problem appeared again.

Sadly I am unable to delete more snapshots. VMware tells me "A general system error occurred: vim.fault.GenericVmConfigFault." Even if I could, and that solved the problem for now, it seems likely to come back in the near future.

To summarize: I am guessing that Nextcloud is the problem here. I am not sure, as I am unable to start the system and check it.

If so, there seems to be a problem with Nextcloud, because it normally should not use this much space for the trash bin. Maybe it is also caused by the picture preview generation. I have a cron job installed that runs: */15 * * * * php -f /var/www/nextcloud/occ preview:pre-generate

As my knowledge is limited and the problem might come from a completely different direction, I hope to get some help here to prevent this in the future.

Normally the system should not be able to destroy itself :D

A couple of years ago I used a Raspberry Pi without DietPi but with Nextcloud and pointed the user data storage to an external NTFS drive. This was a great benefit as I was always able to check the files and had access to them - even if the file system/server had an issue. It would be great if something similar were possible with a VM and DietPi.

Any help and ideas are highly appreciated.

Joulinar commented 4 years ago

Hi,

not sure if this is possible as I don't use VMware. But can you create an additional new VM (simple Linux box) and attach the current large disk as an additional drive? Once booted it should be possible to check what kind of data is eating the space.

As far as I can see, your physical disk seems to be full as well. Just 28 KB of free space left? On Windows you could use tools like TreeSize to check what is using the disk space. It would look like this:

[screenshot]

MichaIng commented 4 years ago

The disk space of the guest is not the issue, I think; just 61 GB are used as of your screenshot. You can manually check the DietPi_VMware-x86_64-Stretch.vmdk (or similar) file size to be sure, although if the physical disk is mapped, I am not sure how this looks. Also, the VM would boot up fine even if its virtual disk space were full; it would print some error messages, but you would be able to operate from inside. Since (as far as I understood) the VMware software itself throws the error, I also guess the host disk (external drive) is the issue: probably something else got stored there, you said snapshots (although those need to be made manually as VMware Workstation Player has no internal snapshot functionality, right?), or the partition is not expanded over the whole drive, or such?

TreeSize is definitely a nice tool to get some overview 😃.

Although it does not seem to be the issue, if you want to control Nextcloud's trash bin (and e.g. file versioning) size, there are /var/www/nextcloud/config/config.php options to do so:

Especially if many/large files are created, changed and removed regularly, it could make sense. By default, files in trash and versions are kept for 30 days AND are automatically deleted when disk space is needed, but I am not sure how reliably Nextcloud estimates that disk space. Best is probably to apply a quota to limit overall Nextcloud space usage, if there is a risk that it fills the disk completely.
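As a rough sketch (values are only examples, adjust to taste), those retention settings can be applied via occ instead of editing config.php by hand:

sudo -u www-data php /var/www/nextcloud/occ config:system:set trashbin_retention_obligation --value "auto, 30"   # trash: auto-delete when space is needed, hard limit 30 days
sudo -u www-data php /var/www/nextcloud/occ config:system:set versions_retention_obligation --value "auto, 30"   # file versions: same policy

A per-user quota can additionally be set in the Nextcloud admin web UI (or via occ user:setting), which is the more reliable hard cap.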

Phil1988 commented 4 years ago

@Joulinar: I don't know if or how I can attach a disk like this in VMware Workstation.

I also made a screenshot of the files on this drive. You can see it here: [screenshot]

I don't know what the 61 GB in the screenshot above refer to. Maybe it's just the DietPi_VMware-x86_64-Stretch-000004.vmdk.

But the DietPi_VMware-x86_64-Stretch.vmdk file is really about 1.6 TB.

I am using VMware Workstation, which has the snapshot feature included. It is also used as a "shared VM", so that the VM can be started automatically at (Windows) system start.

My (Windows) server that runs the DietPi VM is headless. If Windows reboots because of an update or a power failure, the VM is started automatically. From my research, that is otherwise not possible.

Joulinar commented 4 years ago

Can you check the disk configuration of this specific VM? The 1.6 TB file was last used on the 15th of July. Do you know what happened on that date? As well, there are still a couple of 2 GB snapshots from that date. Are you able to move some of them to a different disk?

MichaIng commented 4 years ago

AFAIK the snapshots are the files currently being written, while the 1.6 TB file is the base image. The problem with this is that VMware actually likes to write the current VM state to the .nvram file (aside from logs and of course the snapshots) but does not have space for that anymore. I'm not sure if this is easily possible now, but generally I would use dynamic disk allocation instead of reserving the whole 1.6 TB directly, especially since the actual data take only a tiny fraction of it, 61 GB. See if it is possible to switch to dynamic allocation. VMware has an image compacting tool that can be used to shrink the image size then.

Joulinar commented 4 years ago

I found this one. Not sure if it works that way.

https://virtualman.wordpress.com/2016/02/24/shrink-a-vmware-virtual-machine-disk-vmdk/
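If I read it right, it basically comes down to running VMware's bundled vmware-vdiskmanager on the host with the VM powered off, something like this (install path and file name are just placeholders, and it only reclaims space that the guest has zeroed beforehand):

"C:\Program Files (x86)\VMware\VMware Workstation\vmware-vdiskmanager.exe" -k "D:\VMs\DietPi_VMware-x86_64-Stretch.vmdk"   # -k compacts the vmdk in place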

Phil1988 commented 4 years ago

Can you check the disk configuration of this specific VM? The 1.6 TB file was last used on the 15th of July. Do you know what happened on that date? As well, there are still a couple of 2 GB snapshots from that date. Are you able to move some of them to a different disk?

Yes, I know what happened on that date. I deleted all the snapshots to have only one system running and took a snapshot on that date to test several things, as I had problems here: https://github.com/MichaIng/DietPi/issues/3675

@MichaIng: Converting to a dynamic disk could indeed be a bit tricky now. I'm not sure, but I do remember that using the mapped disk instead of dynamic allocation had something to do with the shared VM.

But I am definitely sure that the data is more than those 61 GB. My Nextcloud data alone is 251 GB, plus other users with about another 120 GB. So there is more in use than what those 61 GB want us to believe.

@Joulinar I will try that shrinking thing now, but I do remember that a shared VM doesn't allow it. I will check it and report back.

EDIT: None of these methods can be used as I can't even start the VM.

I will go and buy an external HDD to get the VM running from there, locate the issue and report back what used the space... If you have additional ideas, please let me know.

But I am pretty sure that really all the space is used by the system. It was like this a year ago and I am sadly quite certain that it's the same case this time.

Joulinar commented 4 years ago

I don't know, but would it be an option to move snapshot 24/25.vmem to another (external) disk? Could it be some leftover? If I'm not mistaken, these are the paging files for the respective snapshots. If I understood right, you deleted all snapshots, so it might be possible to remove the vmem files as well?

https://communities.vmware.com/thread/564465

Phil1988 commented 4 years ago

I am away over the weekend and will check this next Tuesday.

MichaIng commented 4 years ago

Indeed it should be possible to remove all snapshots aside from 35, as this is the newest one, based on timestamps. Instead of removing them, you could move them to some backup location on your internal drive until everything has been fixed. I'm not sure which vmdk file is associated with which snapshot, but since the *000004.vmdk one has the latest timestamp, I guess it is associated with the latest snapshot as well. Move all others (including the same-named .lck directories) to your internal drive backup location as well. Then you have a few GiB free to start the VM and check whether the latest state is really in use. If VMware fails to start because the wrong snapshots/vmdk's have been moved out, the error message hopefully tells you which one it was expecting and you can move back or swap in the required one from your backup location.

Phil1988 commented 4 years ago

So.... I just bought a new external HDD and copied the whole VM folder to it.

After starting this VM I put Nextcloud into maintenance mode and ran ncdu from the root directory ("/"). As we can see, the files are more than those 61 GB but nothing close to the 1.6 TB: [screenshot]

The question now is, what is the issue here and how would you start to get to the source of this problem? Meanwhile I will also google and check if I can find something by myself...

EDIT: I found a promising line for the VM config (*.vmx file) to prevent the space check: mainMem.freeSpaceCheck = "FALSE" (source here) Sadly the error happened again: [screenshot]

EDIT 2: I don't understand this article and I think it is not relevant for me, but maybe it gives you an idea :) http://wb-hk.blogspot.com/2016/01/while-sniffing-around-freebsd-ports.html

Joulinar commented 4 years ago

Can you have a look at your VM configuration? There you should be able to see which disks are attached to the VM, just to check whether the 1.6 TB file is really used.

Phil1988 commented 4 years ago

What do you mean by "VM configuration"?

If I right-click the VM, hit "Settings" and go to the "Hard Disk" section, you can see what I already posted above: [screenshot]

Is there another section which shows more of that configuration?

You can also see that the "current size" is not accurate, as ncdu already shows 406.6 GB in /mnt.

Joulinar commented 4 years ago

Looks like only disk 000004.vmdk is used. 🤔

Just an idea: can you move the 1.6 TB DietPi_VMware-x86_64-Stretch.vmdk file away and try to start your VM with only DietPi_VMware-x86_64-Stretch-000004.vmdk in place?

Is there a difference between the config on "My Computer" and "Shared VMs"? Are these 2 different VMs or the same?

MichaIng commented 4 years ago

I'm pretty sure that 000004.vmdk is a differential snapshot of the 1.6 TB file. On VirtualBox it's like that: there is one base image, and as soon as one creates a snapshot, a new image file is created that only contains the differences and hence is quite small.

@Phil1988

I found a promising line for the VM config (*.vmx file) to prevent the space check:

Uff, do not do that; it will potentially just break the VM completely, e.g. leave the file in a corrupted state due to an unexpectedly aborted file write or such. Proceed as I posted above and remove the files that are not related to the latest snapshot: https://github.com/MichaIng/DietPi/issues/3755#issuecomment-687128399 Also, we now know for sure that 000004.vmdk is the snapshot that is currently used.

Joulinar commented 4 years ago

But then, within the config options, the disk file name should be shown as Stretch.vmdk and not as Stretch-000004.vmdk, shouldn't it?

MichaIng commented 4 years ago

Nope, 000004.vmdk is the file that is written to, hence this is what is practically attached to the VM. As soon as one creates a snapshot, the current image stops being written to and the differential image is attached and used instead.

I mean, how it is shown is a question of the software; e.g. VirtualBox still shows the base image, but when you check which files are actually written to, it's the differential file. Also, when I want to attach one VM's image to another VM, I need to use the differential file to see the latest state.


Btw, generally it is possible to switch back and forth between pre-allocated and dynamically allocated vmdk size: https://www.howtogeek.com/313125/how-to-convert-between-preallocated-and-growable-disks-in-vmware/ When the VM finally boots after removing the obsolete snapshots (each 2 GiB, so one should be sufficient already), I'd check if everything is fine and then start removing all snapshots (which means merging them into the base file), which further reduces the used size. Then zerofree can be used to fill all empty space with zeros (that will take a looong time for 1.6 TB I guess) and then the vmdk can first be converted to dynamic allocation and then compacted via the VMware GUI.
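If I read that guide right, the conversion itself is done with vmware-vdiskmanager on the host while the VM is powered off, roughly like this (file names are placeholders; -t 0 would create a single growable disk, -t 1 a growable disk split into 2 GB files):

vmware-vdiskmanager.exe -r DietPi_VMware-x86_64-Stretch.vmdk -t 1 DietPi_VMware-x86_64-Stretch-growable.vmdk   # -r converts the source into a new vmdk of the given type

Afterwards the new vmdk has to be attached to the VM in place of the old one.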

Phil1988 commented 4 years ago

Well, the problem is that I do not only want to make it work, but also to make sure it does not happen again :D If I delete (move) the listed files, then it would probably work, but my snapshots will be gone and I won't be able to copy all the files back again, right?

Another question for me is: Can I just delete the snapshots as mentioned? Is every snapshot only linked to the "root" vmdk or are they differential from each other?

like root -> snapshot 1 -> snapshot 2 -> ...

or root -> snapshot 1, root -> snapshot 2

?

I will now make another copy of my files, delete all unused snapshots and start the VM. But I currently don't know how to "merge" the "deleted" snapshots after this...

MichaIng commented 4 years ago

At least you can copy the files to a different drive as backup, as stated. To even be able to check and fix anything, you need space.

Indeed it is possible (actually likely) that these are snapshots of each other. But doesn't the VMware Workstation GUI show that? I saw a screenshot while searching for the static <> dynamic conversion where one has a full overview of all snapshots and can delete or create them. I only have the free VMware Workstation Player, which has no snapshot feature, so I cannot check that 😉.

I will now make another copy of my files, delete all unused snapshots and start the VM.

Jep, if there is no good GUI then trial and error will do it 😉. Copying 2 GiB files back and forth should be quick enough for that.


EDIT: If moving away the snapshot vmdk files indeed breaks boot (i.e. those are snapshots of each other), then removing the .vmem files should definitely be possible. The key is to be able to boot the VM once to see if everything is fine and up to date (correct snapshot booted). I also guess that even merging the snapshots requires space, so deleting the vmems (or moving them to backup) can be the solution to allow that.

If I delete (move) the listed files, then it would probably work, but my snapshots will be gone and I won't be able to copy all the files back again, right?

Yeah that is true. The problem is that you cannot convert or shrink the vmdk as long as snapshots exist, so the issue would reappear as long as you don't move to a larger physical disk.

Phil1988 commented 4 years ago

Yes, it shows that the snapshots are dependent on each other. The problem is that "deleting" a snapshot from the GUI (which is really a merge) doesn't work if the disk is full :D :D

I have already deleted the snapshots manually, because otherwise the "original" system was not booting in VMware. Now I am also running ncdu on the real VM and it seems to run (but of course the snapshots were deleted and data might have been deleted as well). Meanwhile I am making a copy of the backup on the external HDD (which needs another 7 hours).

Once finished, I will merge the snapshots on the backup and convert it to a growable disk. As far as I understand, you think this might solve the issue, right?

MichaIng commented 4 years ago

The problem is that "deleting" a snapshot from the GUI (which is really a merge) doesn't work if the disk is full :D :D

Okay, so my last thoughts are confirmed. Try to move the .vmem files off the disk first; they are basically swap files for the VM, so they do not contain any disk data. That should allow you to merge the snapshots, probably only the smaller ones at first, and hopefully the larger latest one as well.

Phil1988 commented 4 years ago

So ncdu just verified that nothing unexpected is on the "real" VM. [screenshot]

The problem I have now is that the VM files are really odd... The DietPi_VMware-x86_64-Stretch.vmdk alone already consumes nearly all of the 1.6 TB.

But there are the additional (snapshot?) files like the DietPi_VMware-x86_64-Stretch-000004.vmdk. It is not possible to delete the files and write them back from my backup, as there is not enough space for 1.6 TB + 61 GB on a 1.6 TB drive. I don't know how VMware manages to place these files on that drive :D

So for now my VM is somehow broken, but I still have my backup and will play around with the copy in a few hours...

MichaIng commented 4 years ago

Oh okay, I thought those were all still on the main drive. So all files are now on a second larger drive as well? Isn't it possible to add the VM to VMware from there (it doubles the VM of course, but should be possible) and remove/merge the snapshots + compact the image there, then move the compacted VM back to the original disk? EDIT: Ah, probably the snapshot image paths are absolute paths instead of relative ones, which would break it. Probably those can be changed in the .vmx file.


About your idea to access the data without booting the VM: As long as the base image and snapshots are together in one place (and the information about which snapshot is attached to which image is stored as relative paths), it should be possible to attach the latest snapshot 000004.vmdk to a different VM and access the data from there.

Phil1988 commented 4 years ago

Yes, all files were copied to the 2nd/external drive before I made the changes. And for the further tests I am copying this exact backup again, to be sure that my years of work with this server won't be lost when I do the next steps :D

I am pretty positive that it is possible to add the backed-up VM to VMware and merge the snapshots, un-share the VM, convert the disk to a dynamic type and compact it before moving it back to the internal drive. But before I try that I need to make a copy... because, you know :D

Meanwhile, as I will have to use a completely differently organized VM in the future anyway: Is it somehow possible to write the Nextcloud user data to a drive that stays accessible when the VM is not running?

I have something in mind like a shared folder on the same drive, or another mounted drive only for the user data. The reason is that I would be able to access my files even when problems like the current one happen with the VM.

It was always reassuring to know that if something happened to the file system (back when I used the Pi), the data would still be accessible.

EDIT: I just saw you answered the last question that I put here :D I know it may be possible to read the VM files by creating a new VM - but always with some software and maybe some fiddling around involved.

On the Pi it was possible to write these files plainly to my external (NTFS) drive. So as soon as the file system got corrupted or any problem happened, I was still able to plug the drive into my PC and access all the files I had there.

I think I did it by mounting the NTFS drive and changing the user data location to that mounted drive.

Is anything similar possible with the VM and DietPi?

MichaIng commented 4 years ago

Is it somehow possible to write the nextcloud userdata to a accessable drive when the VM is not running?

Okay, first of all, vmdk images can be attached to and accessed from other VMs as well, and the vmdk can even be converted to img or iso to attach to real machines, but I think that is not what you mean.

Nextcloud has external storage modules to access data on Samba or NFS mounts, but that is all a bit slow and has certain limitations. The best that I can think of currently is to attach a physical USB drive (probably it is possible with a SATA drive as well?) to the VM and move the DietPi user data over there. As long as the VM runs, the drive is inaccessible from the host (not mounted with a drive letter), but as soon as the VM shuts down, the drive is accessible from the host. With a Windows host you then still have the issue that it cannot access the ext4 or btrfs file systems that one would suggest on Linux; NTFS is the only option that still allows UNIX file permissions via the ntfs-3g driver, but at the cost of additional CPU usage.


So basically yes, very similar to what you did with the Pi. Just check the possibilities to attach physical drives to the VM; at least for USB drives this should be possible.
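A minimal sketch of the guest side, assuming the passed-through drive shows up as /dev/sdb1 inside the VM and is NTFS-formatted (device name and mount point are placeholders):

mkdir -p /mnt/usbdata
mount -t ntfs-3g -o permissions /dev/sdb1 /mnt/usbdata   # "permissions" enables full UNIX permission handling on NTFS via ntfs-3g

dietpi-drive_manager should be able to create a persistent mount for it as well and, if I remember right, offers to move the DietPi user data (including the Nextcloud data directory) onto the selected drive.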

Joulinar commented 4 years ago

But all this will not shrink the file size of the VMware files, will it? Even if the DietPi user data were copied to an external HDD, the VMware files would stay as they are because of the fixed size, if I'm not mistaken. So what about the radical idea to export the DietPi user data + Nextcloud database to an external device, start from scratch with a fresh Buster VM and reload all data afterwards?

MichaIng commented 4 years ago

@Joulinar See my link above: https://www.howtogeek.com/313125/how-to-convert-between-preallocated-and-growable-disks-in-vmware/ So it should be possible to switch to dynamic allocation and compact the image size. Reducing the actual file system size and partition, and then the maximum virtual disk size, is probably possible as well, although on VirtualBox it is not.

Phil1988 commented 4 years ago

@Joulinar: Yes, that's true, the VM would still "block" the entire physical disk. That's why MichaIng mentioned switching to the dynamically growable disk space model. I am not 100% sure, but I remember I made the preallocated (mapped) disk for a reason, though I don't know why... maybe I can soon tell you why I made it that way :D Maybe "shared VMs" can't use that dynamic disk size... but we will see.

Currently all the snapshots are being deleted. This seems to take a while (the last one has already taken 30+ min and is still in the deleting (merging) process)... I will report everything after it's done so you guys get as much information as possible. So you and others can get more information to prevent my situation or do it better from scratch :D

EDIT: Quick question: Should I convert to a single growable virtual disk or a growable virtual disk split into e.g. 2 GB files?

What do you think are the benefits of these two options? Personally I would tend to prefer one single growable virtual disk.

EDIT 2: I will use the split files, also called "split sparse", as I saw the pros and cons here: http://sanbarrow.com/vmdk/monlithicversussplit.html

MichaIng commented 4 years ago

Great, especially since shrinking and merging then seem to consume at most a single chunk size, which avoids the issue you had when the disk is nearly full already. Jep, for large disks splitting the image indeed seems very reasonable then; learned something new 👍.

Phil1988 commented 4 years ago

Just a quick interim report:

I merged all snapshots.

Now I want to shrink it before converting to "split sparse".

As I don't know the best/fastest way to do this, I found this: https://superuser.com/a/1116213

Which is basically: sudo e4defrag / (done already, but it needed about 11 hours to finish)

I started dd if=/dev/zero of=wipefile bs=1M; sync; rm wipefile but aborted it after 45 min. It does about 10 GB/min... so it would take about 2.66 h.

Then I tested sudo vmware-toolbox-cmd disk shrinkonly, which is hard to estimate, but I would say it will take another 4-5 hours.

And then I will be able to convert it to the lowest-size split sparse disk.

Is there another but faster way instead of dd if=/dev/zero of=wipefile bs=1M; sync; rm wipefile ?

You @MichaIng said something about zerofree. Is this faster and how do I use it properly? :)

MichaIng commented 4 years ago

Yes, first of all you need to fill the empty space with zeros; the dd command basically does the same by creating a large file filled with zeros and removing it again. However, zerofree is made for exactly that job:

apt install zerofree
dietpi-services stop
mount -o remount,ro / # File system needs to be mounted read-only
zerofree -v /dev/sda1
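Once zerofree has finished, remount read-write and start the services again:

mount -o remount,rw /   # undo the read-only remount
dietpi-services start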

If you want to reduce the disk size (or partition size) you could do that as well:

resize2fs -M /dev/sda1 # This reduces to minimum possible size without risking any data loss
resize2fs /dev/sda1 <size in blocks> # Then raise to desired size

The first command tells you what the block size is, so you can choose the right integer for the second command, usually block size is 4 KiB.

Then shrink the partition accordingly:

sfdisk --no-reread --no-tell-kernel -fN1 /dev/sda <<< ',<partition size on 512 byte sectors>'

When you have the partition size, e.g. in 4 KiB blocks, translate it into 512 byte sectors (in this case times 8). A bit nasty that those commands print and expect input in specific units instead of allowing to give it in bytes or with k/m/g suffixes for KiB, MiB or GiB 😉.
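For example, the conversion from a 4 KiB block count (the 268435456 here is just an example value) to 512 byte sectors can be done with shell arithmetic:

echo $(( 268435456 * 4096 / 512 ))   # 268435456 blocks of 4 KiB = 2147483648 sectors of 512 bytes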

Phil1988 commented 4 years ago

Hmm, I get mount: / is busy

EDIT: Looks like there are still some background processes blocking the remount?

EDIT 2: Is mount -n -o remount,ro -t ext4 /dev/sda1 / not the better command, as it specifies the drive? [screenshot]

MichaIng commented 4 years ago

Ah yes, I often run into that as well; just now it worked OOTB on my VM. htop might help to check for running services; kill all that are not required, e.g. haveged, time sync (if running as a daemon), the SSH server if you connect via local console, or agetty if you connect via SSH. fuser can also be used to check for write-opened files, but at times I run into this issue while not a single file is write-opened, and I am not sure what blocks it then. If it does not work, you can edit /etc/fstab and append ,ro to the root mount options to have it mounted read-only on the next reboot.

Is mount -n -o remount,ro -t ext4 /dev/sda1 / not the better command, as it specifies the drive?

Not required, since mount otherwise pulls the device name/path from /etc/fstab, but it does not hurt.
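To see what still keeps the root file system busy, something like this should help (fuser from the psmisc package):

fuser -vm /   # verbosely list all processes that still access the root mount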

Phil1988 commented 4 years ago

These were running.... [screenshot]

I killed all I could, so I had: [screenshot]

Now it works... even with a progress message :D
[screenshot]

As you "might" have noticed ( :D :D ), I am not that experienced and good with this Linux stuff...

So would you go for the shrinking from within the VM with sudo vmware-toolbox-cmd disk shrinkonly

Or is it better to

resize2fs -M /dev/sda1 # This reduces to minimum possible size without risking any data loss
resize2fs /dev/sda1 <size in blocks> # Then raise to desired size
sfdisk --no-reread --no-tell-kernel -fN1 /dev/sda <<< ',<partition size on 512 byte sectors>'

as you mentioned? I don't know what the difference really is :(

MichaIng commented 4 years ago

I think vmware-toolbox-cmd disk shrinkonly will only compact the disks but not reduce the maximum size? If you want to minimise the risk that such an issue occurs again, I'd reduce the virtual disk size, i.e. the maximum data size that the VM can store, so that the vmdk file, even if dynamically allocated, can never fill the whole physical drive by itself. I think for this you need to reduce the file system and partition sizes manually. As your vmware-toolbox-cmd command does not contain any size, I guess it only frees up unused space without actually reducing the virtual disk size. The latter needs to be done from the host system; hopefully the VMware GUI has an option for that. But even if not, as long as the VM's partition size is e.g. 1 TiB, the dynamically allocated disk will not grow larger anyway 😉.

Phil1988 commented 4 years ago

I have no idea :D

As resize2fs doesn't work: [screenshot]

I will do the sudo vmware-toolbox-cmd disk shrinkonly meanwhile and report back what it does :D

You said sfdisk will shrink it accordingly. According to what? Can I use this command also without using resize2fs?

EDIT: That is interesting.

Yesterday it was possible to start the "shrinkonly" command.

Today I get this on the exact same VM: [screenshot]

EDIT 2: I am now converting this machine to dynamic size. Normally I had in mind to shrink it first to have a smaller-sized dynamic VM, but all the tools like "Clean Up Disk", "compact" and now also the "shrinkonly" command seem to work only in dynamic mode.

It's still a mystery why "shrinkonly" worked yesterday. The only difference was that I briefly started the dd if=/dev/zero of=wipefile bs=1M; sync; rm wipefile command.

After that I stopped/aborted it, and "shrinkonly" worked.

After that I only installed zerofree, killed all the services and started zerofree. Nothing else was done.

VMware always gives you a surprise :D

MichaIng commented 4 years ago

Normally I had in mind to shrink it first to have a smaller-sized dynamic VM, but all the tools like "Clean Up Disk", "compact" and now also the "shrinkonly" command seem to work only in dynamic mode.

Yes, that makes total sense. As I expected, it is only there to compact dynamically allocated disks where free space is filled with zeros; it is not meant to shrink the actual disk (max size).

You said sfdisk will shrink it accordingly. According to what? Can I use this command also without using resize2fs?

Never do that; you could lose data, or at best create a situation where something tries to write to parts of the file system that are outside of the partition. resize2fs is not only there to reduce the file system size, but also to find the smallest size that is possible without losing data, i.e. it shrinks the file system from the end until the first block of data appears, more or less. Only with this do you know how much you can shrink it, and the partition accordingly. The file system size must never be larger than the partition size, just like the partition size must never be larger than the disk size 😉.

As shrinkonly worked yesterday, what did it actually do, if the disk was not yet dynamically allocated?

Phil1988 commented 4 years ago

As shrinkonly worked yesterday, what did it actually do, if the disk was not yet dynamically allocated?

Nothing special. I just took the copied backup (a 1:1 copy of my 1.6 TB disk) and merged the snapshots via the VMware GUI, then did this: https://github.com/MichaIng/DietPi/issues/3755#issuecomment-688999084

EDIT: The conversion to dynamic size has just finished. Looks like it has already been converted to the smallest size: [screenshot]

MichaIng commented 4 years ago

Strange; it would still be good to know whether my assumption is correct that it compacts dynamically allocated images (but then the question remains why it was able to progress once without the image having been converted to dynamic allocation first), or whether it also somehow attempts to reduce the file system and/or partition and/or max disk size.


EDIT: Nice, probably you can now run vmware-toolbox-cmd disk shrinkonly again from within the VM, just to see if it works now that the image is dynamically allocated.

Phil1988 commented 4 years ago

I will do so, but first I'll do a quick check if the VM is runnable and check the size with ncdu... this might take a few minutes.

Anything else you are interested to see?

EDIT: Well, this is strange: [screenshot]

The whole VM is 416 GB in Windows: [screenshot]

What is a wipefile? [screenshot] :D

MichaIng commented 4 years ago

Basically, once vmware-toolbox-cmd disk shrinkonly has run through, it would be interesting to see:

df -h
fdisk -l

to see file system and partition sizes.

Phil1988 commented 4 years ago

[screenshot]

[screenshot]

So it looks like the partition stayed the same at 1.6 TB.

But it's still weird that the VM is 416 GB in size while df and ncdu report back a size of about 818 GB.

And what is that wipefile for? :'D

I also don't understand where and how to "clean up disk". This shows where to find it: https://www.howtogeek.com/313102/how-to-shrink-a-vmware-virtual-machine-and-free-up-disk-space/

My machine is shut down but I don't have that in my list: [screenshot]

So..... haha.. I have no idea what to do next :D

In the settings I see more digits of the size: [screenshot]

I already hit "compact"... which didn't do anything. Maybe I have to "defrag" it from this menu again (as I already did it before converting it to dynamic, I thought this is already done...)?

EDIT: According to this: https://docs.vmware.com/en/VMware-Workstation-Pro/15.0/com.vmware.ws.using.doc/GUID-421A1073-BF16-4BC7-AA76-46B954CA438D.html

"Clean Up Disks" only exists for NTFS disks. Also, "Defragment" finishes in 2 sec... so it seems to be defragmented already.

I will now google that wipefile and where it came from :)

EDIT2: The wipefile was created when I ran dd if=/dev/zero of=wipefile bs=1M; sync; rm wipefile for about 45 min. I did this after sudo e4defrag /

Can I remove/delete this file?

EDIT3: For me it looks like dd if=/dev/zero of=wipefile bs=1M; sync; rm wipefile and sudo vmware-toolbox-cmd disk shrinkonly belong together.

Will I get any benefit from doing these now? I think it is already pretty compact, judging by the visible size on the host. The guest says something else, but that can mostly be the wipefile :)

The next question I have in mind: the disk size (partition) is 1.6 TB now. Will VMware expand that on its own if I ever go beyond that size?

If so, I could already shrink it down to something like 800 GB, right?

MichaIng commented 4 years ago

No idea what "clean up" means in this regards. Of course VMware cannot know which files are junk and which not and should never attempt to guess this, so this can only be some meta data.

The disk data size vs image size is most likely due to sparse files, so files with empty parts, which add to the file size usage but can be compacted/compressed effectively. For the same reason zerofree allows better compacting, as it fills empty space with zeros, as WMware is not able to compact even empty space, as long as its still filled with deleted data (the data is still there, only the link to the file system is removed on deletion, so it can be overwritten whenever a new write requires space).

The dd wipefile command has exactly the same purpose as zerofree. When not given any count argument and with an unlimited input (/dev/zero), dd writes zeros to this file until the disk is full. When you now remove the file, practically all free space is filled with zeros.

Now I think the reason why you have so high disk usage is that you aborted a dd command before completion? In this case the wipefile is not removed, hence it is still there, consuming drive space. Probably located in user home dir?

rm ~/wipefile

else find it via:

find / -name wipefile

Afterwards df and image size should be close again.


e4defrag does not really have any reasonable effect on ext4 file systems; just skip it.


As for your edits: yes, remove the wipefile. See the commands you used, which contain ; rm wipefile to remove it after it has been fully written; it is not required and not intended to stay 😉.


VMware will raise the image size up to 1.6 TiB again, yes, as long as the partition has that size. Hence the resize2fs + sfdisk. As you noticed when you tried it, I forgot that resize2fs cannot be used to shrink the file system of any mounted drive (even read-only), so this can only be done by attaching the virtual disk to a second VM and doing those steps from there. It is required to first reduce the file system size via resize2fs and only afterwards the partition via sfdisk, to a size equal to or larger than the file system.
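Roughly, the order of operations from that second VM would look like this (assuming the attached virtual disk shows up there as /dev/sdb; the size placeholders as before):

e2fsck -f /dev/sdb1                                   # check the file system first
resize2fs -M /dev/sdb1                                # shrink to the minimum and note the reported block count
resize2fs /dev/sdb1 <size in blocks>                  # then grow to the desired final size
sfdisk --no-reread --no-tell-kernel -fN1 /dev/sdb <<< ',<partition size in 512 byte sectors>'   # finally shrink partition 1 to match (or larger)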

Phil1988 commented 4 years ago

I already knew where it is, but didn't know its purpose... I deleted it: [screenshot]

And it does look better now: [screenshot]

Will VMware only raise the image to a max size of 1.6 TiB and then it is full? It won't expand it by itself?

I guess this is good, as reducing the size is somewhat uncomfortable :D

Do you see any benefit from downsizing it? If not, I would leave it as it is, because 1.6 TB is OK for a server running Nextcloud.

EDIT: The update from DietPi v6.31.2 to v6.32.2 was successful 👍

EDIT2: Well, I guess I have the answer to my "Do you see any benefit from downsizing it?": It would be smart to size it down to something like 1.55 TB. Then there will always be 50 GB left on that drive and I can delete snapshots or something like that if this problem ever happens again...

MichaIng commented 4 years ago

VMware will never raise the disk size by itself, only the resulting image file size, up to a max of 1.6 TiB, yes. The idea of reducing the size was to ensure that the image file cannot fill the 1.6 TiB physical host drive by itself again, as then there is no space left for vmem files, images and logs and you run into the same issue. However, as long as you don't plan to fill those 1.6 TiB with actual data, VMware has no need to raise the image size.

Btw, this vmem swapping can be disabled: https://superuser.com/questions/1480147 This saves (number of snapshots × RAM size) of disk space and writes. Memory swapping, if required, should be done within the VM itself. This breaks suspend-to-disk for the VM, if I understood right, but I see no point in that on a VM anyway 😉. While the above applies to all VMs system-wide, it can be added to a single VM's vmx file as well: https://windowsloop.com/fix-vmware-vmem-high-disk-usage/
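If I read that second article right, the per-VM variant boils down to adding a line like the following to the .vmx file while the VM is powered off (treat it as a pointer to verify, not a setting I have tested myself):

mainMem.useNamedFile = "FALSE"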

Phil1988 commented 4 years ago

Gosh... I really had a hard time the last ~5 hours.

With my limited knowledge, I did these steps to hopefully reduce the size of the disk:

1. I ran vmware-vdiskmanager.exe -k because it was said to actually reduce the disk size. The problem was that the authors of that information were not precise enough, because it only reduced the file size.

2. I downloaded VMware vCenter Converter because it was described as being able to reduce the disk size. I watched YT videos and figured out that it doesn't work on my machine.

3. I tried to "connect" 2 different VMs as described here: https://www.vmware.com/support/ws55/doc/ws_devices_serial_2vms.html But that does not work.

4. I tried it without Google, with my own knowledge. As I don't have any clean VMs here, I downloaded the latest DietPi "Buster" VM. Sadly my Stretch "VM disk" can't be connected to that VM: [screenshot]

5. I took my garbage Stretch VM (the backup copy of this problematic VM here) and attached the now cleaned, zerofreed, converted and shrunk VM disk to it: [screenshot]

I can also see this disk in the VM: [screenshot]

I will now try your steps described above with sdb1... either it is blocked because it's the wrong partition, or it should work, as your tips are mostly bulletproof and do work :)

EDIT1: OK, sda1 is the right one: [screenshot]

As this takes a while now, I will go to sleep and continue with exactly these commands: resize2fs /dev/sda1 218547278 - I hope I got that integer right.

I currently don't know how to use this: sfdisk --no-reread --no-tell-kernel -fN1 /dev/sda <<< ',<partition size on 512 byte sectors>' The documentation says nothing about -fN1. Also, is /dev/sda correct or should it be /dev/sda1?

And the last part is completely "from outer space" :D <<< ',<partition size on 512 byte sectors>' ?? I think I should replace "<partition size on 512 byte sectors>" with... some integer... As it is in 512 byte sectors and my file system now uses 4K blocks, should I use 218547278*8 = "1748378224"?

So I would think it would be something like this: sfdisk --no-reread --no-tell-kernel -fN1 /dev/sda <<< ',1748378224' Is that anywhere near correct? ;)

Phil1988 commented 4 years ago

Well, resize2fs /dev/sda1 218547278 does nothing: [screenshot] Is that OK?

And should I now do this? sfdisk --no-reread --no-tell-kernel -fN1 /dev/sda <<< ',1748378224'

MichaIng commented 4 years ago

resize2fs -M /dev/sda1 successfully reduced your file system size to 834 GiB (there is no need to repeat the command with this size), which, based on the location of the data on the disk image, seems to be the smallest safe size currently possible without moving all data to the start of the disk (possible via the GUI tool GParted).

E.g. you could increase it now to 1 TiB to have a nice round size and the partition accordingly, which should leave enough free space on the physical disk:

resize2fs /dev/sda1 268435456 # 1 TiB in 4k blocks
sfdisk --no-reread --no-tell-kernel -fN1 /dev/sda <<< ',2147483648' # 1 TiB in 512 byte sectors
fsck -f /dev/sda1 # Failsafe

So resize2fs -M does reduce the file system size to a minimum, but most importantly it tells you what this minimum is. It would have been bad if data were stored at, let's say, 1.4 TiB from the disk start and we reduced the file system size to 1 TiB; then all data from that point on would have been lost.
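As an optional sanity check afterwards (still with the disk attached as /dev/sda in the helper VM), the resulting sizes can be compared like this:

dumpe2fs -h /dev/sda1 | grep -i 'block count'   # file system size in 4 KiB blocks
sfdisk -l /dev/sda                              # partition end; must be at or beyond the end of the file system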


vmware-vdiskmanager.exe -k AFAIK does exactly the same as compacting from the GUI.


Ohh, the incompatibility error is an issue. Actually I just reduced the hardware revision of our latest VMware image to make it compatible with older VMware/ESXi versions: https://github.com/MichaIng/DietPi/issues/3637#issuecomment-667351069 If I understand it right, the "old" Stretch image still has a newer hardware revision than our current image, so it is not your image that is too old but ours. Luckily all that is only a single setting in the vmx file: you could manually edit it and set virtualhw.version = "16".

Phil1988 commented 4 years ago

Now: [screenshot] Should I let it repair that?

MichaIng commented 4 years ago

There was a missing single quote ' before, but you found that already, right? Yes, "repair" the inode size; it is more of an optimisation, if I understand right.