MichaIng / DietPi

Lightweight justice for your single-board computer!
https://dietpi.com/
GNU General Public License v2.0

Software out of control -> full disk on a VM #3755

Closed Phil1988 closed 4 years ago

Phil1988 commented 4 years ago

Required Information

DietPi version | cat /boot/dietpi/.version
G_DIETPI_VERSION_CORE=6
G_DIETPI_VERSION_SUB=31
G_DIETPI_VERSION_RC=2
G_GITBRANCH='master'
G_GITOWNER='MichaIng'

Distro version | cat /etc/debian_version
9.12

Kernel version | uname -a
Linux DietPi 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux

SBC model | echo $G_HW_MODEL_NAME
Virtual Machine (x86_64)

Additional Information (if applicable)

I am not sure whether the information above is 100% accurate, because I can't start the VM any more.

My problem is that I currently can't start the VM on which all my servers (Nextcloud, Blynk etc.) are installed. VMware gives an error that "The file system where disk ... resides is full".

I have a 1.6 TB HDD on which nothing but the VM is stored (screenshot). As far as I remember, the virtual machine is mapped to the local volume.

I had the problem about a year ago. Back then it was the trashbin of Nextcloud that filled the disk completely. I could solve this by copying the whole virtual machine to an external drive, starting the VM from there and cleaning the trashbin via sudo -u www-data php /var/www/nextcloud/occ trashbin:cleanup --all-users.
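(Side note: the trashbin retention can also be capped up front so it cannot grow unbounded; a minimal sketch, assuming the standard occ path, where 'auto, 30' keeps trashed files for at most 30 days:)

# cap trashbin retention so old trashed files are purged automatically
sudo -u www-data php /var/www/nextcloud/occ config:system:set trashbin_retention_obligation --value="auto, 30"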

Last week it happened again. As I don't have a big enough external drive left, I had to delete some snapshots, after which I could start the VM, put Nextcloud into maintenance mode and clean the trashbin. I ran ncdu to check the current size and it showed about 360 GB. I know it would have been better to run ncdu before emptying the trashbin, to find out what caused this issue, but the system filled up so fast that my first attempt to run ncdu aborted because the disk was full before it finished. So I deleted 2 more snapshots, started the VM and immediately ran the trashbin cleanup command. It worked for 2 more days until the problem appeared again.

Sadly I am unable to delete more snapshots. VMware tells me "A general system error occurred: vim.fault.GenericVmConfigFault." And even if I could delete them and solve the problem that way, it seems it would come back in the near future.

To summarize: I am guessing that Nextcloud is the problem here. I am not sure, as I am unable to start the system to check.

If so, there seems to be a problem with Nextcloud, because it normally should not use this much space for the trashbin. Maybe it is also caused by the picture preview generation. I have a cron job installed that runs: */15 * * * * php -f /var/www/nextcloud/occ preview:pre-generate
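(If the previews turn out to be the culprit, their dimensions can be limited; a sketch, assuming the Preview Generator app's documented settings and the standard occ path:)

# cap the sizes the Preview Generator app pre-generates
sudo -u www-data php /var/www/nextcloud/occ config:app:set previewgenerator squareSizes --value="32 256"
# cap the maximum preview dimensions Nextcloud itself renders
sudo -u www-data php /var/www/nextcloud/occ config:system:set preview_max_x --value=2048
sudo -u www-data php /var/www/nextcloud/occ config:system:set preview_max_y --value=2048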

As my knowledge is limited and the problem might come from a completely different direction, I hope to get some help here to prevent this in the future.

Normally a system should not be able to destroy itself :D

A couple of years ago I used a Raspberry Pi without DietPi but with Nextcloud, and linked the user data storage to an external NTFS drive. This was a great benefit, as I was always able to check the files and had access to them, even when the filesystem/server had an issue. It would be great if something similar were possible with a VM and DietPi.
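(From what I have read, something like this might work; a rough sketch with example paths and service name, not an official migration guide:)

# stop the web server so Nextcloud is not writing to the data directory (example service name)
systemctl stop apache2
# move the data directory to the external drive (example mount point)
mv /var/www/nextcloud/data /mnt/usbdrive/nextcloud_data
# point Nextcloud at the new location
sudo -u www-data php /var/www/nextcloud/occ config:system:set datadirectory --value=/mnt/usbdrive/nextcloud_data
systemctl start apache2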

Any help and ideas are highly appreciated.

Phil1988 commented 4 years ago

There was a missing single quote ' before, but you found that already, right?

Yes I found it :)

Yes "repair" the inode size, it is more an optimisation, if I understand right.

Ok, I did so.

But something went wrong, I guess. It completed on the other VM... (screenshot)

Then I shut that down, started my new dynamic VM and checked the size with df -h (screenshot). So it resized the wrong VM disk.

I checked the "wrong" (backup) VM now and it shows this (screenshot). So I resized the wrong VM, as the right disk was "blocked" and not resizeable.

Why was it not possible for me to run resize2fs -M /dev/sdb1, as I wrote 3 messages ago in my EDIT1?

More importantly: it seems that this "wrong" VM (the backup of the preallocated one) boots from the "right" (dynamic) VM's disk, or at least blocks it. Why is that, and what can I do to prevent it? I already tried leaving the "right" VM running and then starting the backup, so that the disk of my dynamic VM is already blocked and the backup VM is "forced" to boot from its own "wrong" disk; that way I could afterwards shut down the "right" VM to unblock /dev/sdb1 and finally resize the damn right disk :D

Looks like I then have to do the same stuff again once I get it "unblocked"...

MichaIng commented 4 years ago

Why don't you just use the backup now and keep the original VM as the backup (or remove it if the other one works fine)? You can easily add and remove disks from a VM, and you can just as well create a new VM (as with the hardware revision issue before) and attach the correct 1 TiB disk there. The disk image is all that counts; the virtual "machine" itself consists of no more than the few settings in the .vmx file.
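(For illustration, the disk attachment really is just a few .vmx lines like these; the key names are the usual VMware SCSI ones, and the file name is the one from your setup:)

scsi0.present = "TRUE"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "DietPi_VMware-x86_64-Stretch.vmdk"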

Phil1988 commented 4 years ago

Ah yeah... my fault, imprecise explanation: the backup is from the "original" preallocated VM.

It is not a backup of the already converted (to dynamic) VM.

So it was definitely the wrong disk that got resized... As a copy (backup) of this dynamic disk takes about 3 hours and stresses my HDD even more, I wanted to avoid making additional backups.

I'm still wondering what causes this blocking here.

My only idea now is to back up the RIGHT VM and do the resize again... one of these 2 should then end up resized and will become my new VM. Then I will delete the ~4 TB of wrong backups and finally be done.

But is there a shorter way than doing a 3-hour copy again?

MichaIng commented 4 years ago

I guess the VM booted from the disk /dev/sdb that you actually wanted to shrink, so /dev/sda was not the running system and hence could be resized. Not sure how to tell VMware which disk to boot from when two are attached. The order (bus number) was obviously correct, i.e. the disk that you wanted to resize was the second one => /dev/sdb. I ran into this issue once with VirtualBox as well and likewise found no way to force booting from a specific disk, but it only happened once and never again.
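(To verify which disk the running system actually booted from, something like this shows it; a minimal sketch:)

findmnt /                        # prints the device backing the root filesystem, e.g. /dev/sdb1
lsblk -o NAME,SIZE,MOUNTPOINT    # overview of all attached disks and their mount points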

Yeah, redo the resizing with the correct disk; this should cause nearly no disk writes (compared to zerofree + conversion).
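(The redo itself would then be the same two steps, on the now unmounted /dev/sdb1; a sketch:)

e2fsck -f /dev/sdb1      # resize2fs requires a fresh filesystem check first
resize2fs -M /dev/sdb1   # shrink the ext4 filesystem to its minimum size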

Phil1988 commented 4 years ago

I don't know how I am supposed to do the resizing with the correct disk that you mentioned. I just have that "old" preallocated, resized VM that blocks sdb1, and the new dynamic, unresized one.

But I found something that seems to work and might be helpful for you in the future.

Hitting F2 during boot opens the BIOS. There I was able to change the boot order from this (screenshot) to that (screenshot).

et voilà: (screenshot)

MichaIng commented 4 years ago

Wow, VMware has a BIOS? Or did you enable/install this somehow? I didn't know such a thing exists 😄.

Phil1988 commented 4 years ago

Yes it already has it :)
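(Apparently the firmware setup can even be forced to appear on the next boot via .vmx settings; an assumption based on commonly documented VMware options, with an example delay value, not something I verified:)

bios.forceSetupOnce = "TRUE"
bios.bootDelay = "5000"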

And I was able to resize it to 1.4 TiB, which gives me enough space for snapshots, VRAM etc. (screenshot)

There are still 2 small things:

  1. The size of the VM has now increased from ~408 GB to ~590 GB. I guess the change of the partition and filesystem wrote some data over the zeros. The dynamic disk files also have different sizes (screenshot): the first 8 files look similar, but then they get a bit fragmented. It's also confusing why they are about 50 GB each... shouldn't they be 2 GB? :D The only way to change this (reduce the size on my HDD) is with zerofree again, right? :D

  2. The maximum disk size in the VM is still 1.6 TB. Is there a way to change it, and will there be problems in the future because of the mismatch between the 1.6 TB in the VM GUI and the "real" partition of 1.4 TiB? (screenshot)

I already tried vmware-vdiskmanager.exe -s 1539GB "E:\Shared_Virtual_Machines\Dynamic\DietPi_VMware-x86_64-Stretch.vmdk" from the host CMD, but it didn't work because a major option was missing (and I don't know whether I should pick one, and which) (screenshot).

MichaIng commented 4 years ago

Yes, reducing the file sizes requires zerofree + compacting through the VMware GUI or vmware-vdiskmanager.exe -k, although I am not sure how to run the latter on multiple files 😄
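(A sketch of that cycle; the target filesystem must be unmounted for zerofree, and I assume running -k on the descriptor vmdk covers the split extents:)

# inside the VM, with the target filesystem unmounted:
zerofree -v /dev/sdb1

# afterwards, on the Windows host:
vmware-vdiskmanager.exe -k "E:\Shared_Virtual_Machines\Dynamic\DietPi_VMware-x86_64-Stretch.vmdk"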

The (max) disk size is still the same; at least on VirtualBox it is not possible to reduce it, and I am not sure whether it is on VMware. But it doesn't matter: as long as you don't increase the partition and filesystem size again, nothing gets written to those appended 200 GiB, and hence the file sizes cannot grow beyond 1.4 TiB now.

Phil1988 commented 4 years ago

I just wanted to confirm that everything works great again ;)

It was quite a looong way for me, but with your great support I was able to get this solved. The VM is now 591 GB in split files, and it runs as a shared VM (with autostart). The DietPi update worked without issues, and the Nextcloud update from 18.0.5 -> 18.0.9 -> 19.0.3 worked as well. I just had to apt install -y php7.3-bcmath php7.3-gmp and execute some database commands (that NC told me about in the GUI).
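(For the record, the database commands NC asks for after such upgrades are typically these two; an assumption based on the usual upgrade warnings, as I did not note down the exact ones:)

sudo -u www-data php /var/www/nextcloud/occ db:add-missing-indices
sudo -u www-data php /var/www/nextcloud/occ db:convert-filecache-bigint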

I really love DietPi. Even with no Pi involved it runs really well ;) It's one of the best pieces of software I have ever used, and definitely the best on Linux for me.

I will close it now.

MichaIng commented 4 years ago

Many thanks for the feedback, great that everything works now and we learned some VMware internals along the way 🙂.

> apt install -y php7.3-bcmath php7.3-gmp and execute some database commands (that NC told me about in the GUI).

The two database commands are reasonable, yes; the two PHP modules are actually only required for public-key authentication via WebAuthn, i.e. login without a password. I have already asked for the PHP module warning to be printed only when WebAuthn is enabled; probably I should open a new issue on the NC repo about this: #22849