eReuse / workbench

The eReuse.org Workbench is a toolset to assist with the diagnostic, benchmarking, inventory and installation of computers.
https://www.ereuse.org
GNU Affero General Public License v3.0

Cache served files in server VM #62

Open ivilata opened 7 years ago

ivilata commented 7 years ago

As explained in #55, files like installation images and ISOs will be served by the virtual PXE server from a VirtualBox shared folder. Unfortunately, shared folders perform very poorly (see this for example), which has proved to make parallel installation on several PCs very slow.

One way to avoid this would be to cache served files inside the virtual server itself, either on disk or in RAM. Obviously, this requires dedicating more resources to the VM, but it may degrade gracefully (e.g. by not caching) if these resources are insufficient.

This depends on #55.

ivilata commented 7 years ago

The ereuse-data-refresh script is a good place to set this up.

Since ISO files in ereuse-data/images are mounted as a loop device and then exported via NFS, they may already get cached, but this should be checked. Sadly, this may not be the case for FSA files, which are served directly by Samba, so the first step is to check whether they get cached. If they do, giving the VM enough RAM to hold both the ISO and the FSA should be enough. Otherwise, special trickery such as tmpfs will be needed.

ivilata commented 7 years ago

OK, I performed some tests on a host with 8 GiB of RAM and a 972 MiB file in the shared folder. The procedure was to boot the server and the client, load the file into the host's cache, then time copying the remote file to /dev/null on the client via Samba. Some results:

In this second setup, I create a 2 GiB tmpfs, copy the file there (i.e. into RAM) and bind mount it over the file in the shared folder. The copy on the client then drops to 22.2 s.
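The tmpfs setup described above could be sketched roughly as follows. This is only an illustration: the function name, the `/run/ereuse-cache` mount point and the example file path are hypothetical, and the commands are only built as lists (not executed) so they can be reviewed before running them as root.

```python
import os
import shlex


def tmpfs_cache_commands(src_file, tmpfs_dir="/run/ereuse-cache", size="2g"):
    """Build the commands that cache src_file in a tmpfs and bind mount it back.

    tmpfs_dir and size are assumptions for illustration; nothing is executed.
    """
    cached = os.path.join(tmpfs_dir, os.path.basename(src_file))
    return [
        # Create the mount point and mount a RAM-backed file system on it.
        ["mkdir", "-p", tmpfs_dir],
        ["mount", "-t", "tmpfs", "-o", f"size={size}", "tmpfs", tmpfs_dir],
        # Copy the file out of the shared folder into RAM.
        ["cp", src_file, cached],
        # Bind mount the cached copy over the original path, so that
        # Samba/NFS keep serving the same path but read from RAM.
        ["mount", "--bind", cached, src_file],
    ]


if __name__ == "__main__":
    # Hypothetical path inside the shared folder; dry run, just print.
    for cmd in tmpfs_cache_commands("/media/sf_ereuse-data/images/example.fsa"):
        print(shlex.join(cmd))
```

Undoing it would need a `umount` of the bind mount first and of the tmpfs afterwards.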

It's quite clear to me that the VBox shared folder file system doesn't cache blocks at all. In fact, the shared folder activity icon in the server shows reads even after copying the file to the server's /dev/null. I guess the CPU then becomes the bottleneck.
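One way to double-check the lack of caching from inside the server is to time two successive reads of the same file: if the second read is not noticeably faster, the blocks are probably not being cached. A minimal sketch, where the function names and the 1 MiB block size are my own choices and not part of any existing script:

```python
import time


def time_read(path, block_size=1024 * 1024):
    """Read path sequentially, discarding the data (like copying to /dev/null).

    Returns (bytes_read, elapsed_seconds).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids an extra Python-level buffer; the kernel page
    # cache (the thing being tested) is still in play.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def compare_reads(path):
    """Time two successive reads of path and return both measurements."""
    first = time_read(path)
    second = time_read(path)
    return first, second
```

Running `compare_reads()` on a file in the shared folder and on a copy of it on the VM's own disk should make the difference visible.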

In summary: we need the trickery, either copying the file into a RAM-backed tmpfs or copying it to some temporary directory on disk (for the latter we will need to enlarge the disk, e.g. to 8 GiB, but that won't make the OVA larger). The first option is faster, but allocating that much RAM to the VM may not be possible on many computers; the second may be slower.
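The choice between the two options, with a fallback to no caching at all, could be made at refresh time along these lines. This is just a sketch: the function names, the 512 MiB RAM reserve and the idea of reading `MemAvailable` from `/proc/meminfo` are assumptions for illustration, not something already in the scripts.

```python
import shutil


def mem_available_bytes(meminfo_text):
    """Parse the MemAvailable value (reported in kB) from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    return None


def choose_cache(file_size, free_ram, free_disk, ram_reserve=512 * 2**20):
    """Pick where to cache a file of file_size bytes.

    Prefer RAM (tmpfs) if the file fits while leaving ram_reserve bytes
    for the rest of the VM, else disk, else degrade to no caching.
    """
    if free_ram - ram_reserve >= file_size:
        return "tmpfs"
    if free_disk >= file_size:
        return "disk"
    return "none"


def choose_cache_for(path_size, cache_dir="/var/tmp"):
    """Convenience wrapper reading live values; cache_dir is hypothetical."""
    with open("/proc/meminfo") as f:
        free_ram = mem_available_bytes(f.read()) or 0
    free_disk = shutil.disk_usage(cache_dir).free
    return choose_cache(path_size, free_ram, free_disk)
```

With the 972 MiB test file, a VM with 2 GiB of available RAM would get "tmpfs", while one with only 1 GiB would fall back to "disk" as long as the disk has room.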

ivilata commented 7 years ago

Note that 22.2 * 2 = 44.4, so there is probably one copy with the file in the tmpfs and two in the other cases.

Also, it may be that each process reading the file copies it into the VM separately, so the number of copies from the host grows with the number of concurrent copies/installations! :(

ivilata commented 7 years ago

I tried some tricks to avoid copying, namely putting some filesystem in the middle just in case it gets cached:

It looks like VBoxSF only supports very basic operations, so copying files out of it into a cache will probably be needed in the end.