MichaIng / DietPi

Lightweight justice for your single-board computer!
https://dietpi.com/
GNU General Public License v2.0

Dietpi-Imager xz threads on Virtual Box #7234

Open JCalvi opened 1 week ago

JCalvi commented 1 week ago

Dietpi 9.7.1 Running on latest VirtualBox on Windows 11 with latest extensions.

Dietpi-imager works great, except that the xz compression step only uses 1 CPU. If I force the thread count in the script to -T2, -T4, etc., the appropriate number of CPUs is used.

The cpu command in the guest returns...

[WARNING] Most CPU info is not available on virtual machines.
 Architecture | x86_64
 Temperature  | N/A
 Governor     | N/A

htop shows 4 CPUs, and as mentioned, all of them are used if local threads=0 is changed to local threads=4 in the dietpi-imager script.

Is this expected behaviour? Do the Guest Additions need to be installed for xz to detect the correct thread count with the -T0 setting?

Otherwise great script, works really well to create small and flexible images.
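For reference, a quick way to see what the guest actually exposes to xz (these are kernel-provided values, so the check does not depend on VirtualBox Guest Additions):

```shell
# What does the guest expose? xz bases its -T0 detection on these
# kernel-provided values, not on VirtualBox Guest Additions:
nproc                         # logical CPUs visible inside the guest
grep MemTotal /proc/meminfo   # physical RAM as seen by the guest kernel
xz --info-memory              # xz's own view, incl. its default -T0 memory limit
```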

MichaIng commented 4 days ago

I just ran some tests with xz:

root@VM-Bookworm:~# fallocate -l 1G test
root@VM-Bookworm:~# xz -T0 test
xz: Reduced the number of threads from 4 to 2 to not exceed the memory usage limit of 492 MiB
root@VM-Bookworm:~# xz -d test.xz
root@VM-Bookworm:~# xz -T4 test

This was with 2 GiB of memory allocated to the VM. After raising it to 4 GiB, -T0 used 4 threads and printed no such message.

Checking the docs, there is indeed a default memory usage limit for multi-threaded decompression, and for compression with -T0, which seems to be 25% of the physical RAM size:

root@VM-Bookworm:~# xz --info-memory
Hardware information:
  Amount of physical memory (RAM):  3913 MiB (4102864896 B)
  Number of processor threads:      4

Memory usage limits:
  Compression:                      Disabled
  Decompression:                    Disabled
  Multi-threaded decompression:     979 MiB (1025716224 B)
  Default for -T0:                  979 MiB (1025716224 B)

And with 2 GiB indeed it is 492 MiB:

root@VM-Bookworm:~# xz --info-memory
Hardware information:
  Amount of physical memory (RAM):  1967 MiB (2062090240 B)
  Number of processor threads:      4

Memory usage limits:
  Compression:                      Disabled
  Decompression:                    Disabled
  Multi-threaded decompression:     492 MiB (515522560 B)
  Default for -T0:                  492 MiB (515522560 B)
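The 25% relationship can be cross-checked against the kernel's view of RAM (a sketch; the --info-memory output format shown above is that of newer xz versions, and the two values should match to within rounding):

```shell
# Cross-check: the "Default for -T0" value reported by xz should be
# roughly 25% of MemTotal (assumption based on the xz docs).
ram_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "25% of physical RAM: $(( ram_kib / 4 / 1024 )) MiB"
xz --info-memory   # compare with the "Default for -T0" line
```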

This can be raised, e.g. with -T0 -M75%, to allow up to 75% of the physical RAM size before lowering the number of threads. Not sure which value is reasonable; 50% at minimum should be perfectly fine.
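A minimal sketch of the raised limit (the file name and size here are illustrative only):

```shell
# -T0 uses all detected cores; -M75% lets xz use up to 75% of physical
# RAM before it starts reducing the thread count. -k keeps the input.
dd if=/dev/zero of=test.img bs=1M count=16 status=none
xz -9e -T0 -M75% -k test.img
ls -l test.img.xz
```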

MichaIng commented 4 days ago

We actually limited it ourselves for cases of high CPU core counts combined with low RAM, but at a much higher level than what xz applies by itself. This was needed back when we used p7zip and 7z archives, which ran into OOM on systems with little RAM but multiple cores, like the RPi 2 with 4 cores but only 1 GiB RAM. But since xz does this automatically, and even much more strictly, I removed our obsolete handling and raised xz's limit to 50%: https://github.com/MichaIng/DietPi/commit/3516a6f

Let me know if you think an even higher limit would be fine.

JCalvi commented 3 days ago

I would say an even higher limit would be fine. With 4096 MiB and -T4 set manually, there were no issues running the imager. Before posting, I had also tried the VM with 8192 MiB RAM and still got only 1 CPU at the xz stage, though. I will test again with even more RAM to confirm that -T0 does start using more threads.

MichaIng commented 2 days ago

Remember that you need to take the dev branch version of the imager script, until DietPi v9.8 has been released.

Indeed, I have not yet tested how much memory the compression of a typical image even requires. When you test, please also check whether -T4 starts to use swap space with 4 GiB RAM, since in that case it might even be faster to use only 2 threads without swapping.
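One way to check for swapping is to watch memory from a second terminal while the compression runs, e.g.:

```shell
# Growing "used" in the Swap row (or nonzero si/so columns in vmstat)
# means the compression is being pushed into swap, in which case fewer
# threads may be faster overall.
free -m        # Swap row: total/used/free in MiB
vmstat 5 3     # three 5-second samples; si/so = pages swapped in/out per second
```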

JCalvi commented 2 days ago

Just tested the dev branch with 8192 MiB on VirtualBox: 3 threads utilised, no swap usage and plenty of memory left. I can easily give the VM more RAM, but I think you could afford to be less conservative.

G_EXEC_DESC='Creating final xz archive' G_EXEC xz -9e -T0 -M50% -k "$OUTPUT_IMG_NAME.$OUTPUT_IMG_EXT"

[Screenshot 2024-10-10 090006]

G_EXEC_DESC='Creating final xz archive' G_EXEC xz -9e -T0 -M75% -k "$OUTPUT_IMG_NAME.$OUTPUT_IMG_EXT"

[Screenshot 2024-10-10 091341]

MichaIng commented 2 days ago

Was the RAM usage still rising in both cases? Because at the time of the screenshots, it was not even close to the 50%/75% limit yet. I'll raise it to 75%, however.

JCalvi commented 2 days ago

It had peaked in both cases. xz also seems conservative in how it interprets the percentage.

I added 2 more CPUs, and it still only uses 4 at 75%, even though only 50% of the RAM is actually committed. So 75% is still a quite safe and conservative setting.

Here is one at -M95% with 6 CPUs allocated and 8192 MiB RAM.

[Screenshot 2024-10-10 094036]

75% should be super safe for all users. The only issue I could foresee is much larger images needing more RAM (I have only tested up to 10 GB ones). I suspect xz drops threads anyway if the RAM limit is approached.

MichaIng commented 2 days ago

Yeah, seems fine then. And yes, the larger the image size, the higher the RAM usage, and it would:

  1. drop threads
  2. switch to single-threaded mode (which differs from 1 thread in multi-threaded mode)
  3. reduce dictionary size

in that order/priority to meet the 75% limit.
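A rough way to observe which settings xz actually settled on is its double-verbose mode, which reports the thread count, filter chain and memory requirement (exact wording varies by xz version; the file name and size below are illustrative only):

```shell
# Compress with auto-detected threads and print xz's chosen settings
# (thread count, filter chain, memory requirement) to stderr.
dd if=/dev/zero of=demo.img bs=1M count=8 status=none
xz -vv -T0 -k demo.img
```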

JCalvi commented 2 days ago

Thanks MichaIng,

great work as always.