QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

Builder: make -jN and resume #5564

Open mfp20 opened 4 years ago

mfp20 commented 4 years ago

The problem you're addressing (if any) The building process isn't using all available computational resources (cores). The result is a very slow build process. In addition, in case of problems (e.g. no space left on device), there's no resume, so the process has to be restarted from the beginning; i.e., when problems occur, the build process is endless, even more so when building multiple TemplateVMs (minimals, arch, centos, kali...).

Describe the solution you'd like

Where is the value to a user, and who might that user be? Anyone building Qubes or any TemplateVMs.

Describe alternatives you've considered Adopting existing buildchains (e.g. Buildroot).

marmarek commented 4 years ago

Individual components already do take advantage of multiple cores (all available) - this is mostly visible on larger components like kernel, xen or libvirt. As for resuming the full build, it is possible but inconvenient: see at which component it failed, then instead of make qubes, execute make <explicit list of components starting from the failed one>. You can get the full list of components from make help - already in a form for easy copy&paste. For the case of a full build, this indeed could be improved.
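The workaround above can be scripted: take the ordered component list as make help prints it, and resume by passing make the tail of that list starting at the failed component. A minimal sketch; the component names below are illustrative placeholders, not the real qubes-builder list:

```shell
# Ordered component list as `make help` would print it
# (names are made-up placeholders for the demo).
COMPONENTS="vmm-xen core-libvirt core-admin gui-daemon"
FAILED="core-admin"   # the component where the previous run died

# Build the "resume" sub-list: everything from the failed component onward.
resume_list=$(printf '%s\n' $COMPONENTS \
    | sed -n "/^${FAILED}\$/,\$p" \
    | tr '\n' ' ' | sed 's/ *$//')

# The command to re-run instead of `make qubes`:
echo "make $resume_list"
```

This prints "make core-admin gui-daemon" for the placeholder list above; with the real output of make help pasted into COMPONENTS, the echoed command is the resume invocation marmarek describes.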

Note, the primary use case for qubes builder is building individual packages, not everything at once (like almost every other distribution), so things like buildroot (where the primary output is the whole image, not individual packages) do not fit here.

fepitre commented 4 years ago

I started working on the resume problem. My first try was to put a flag file saying "build succeeded", named after the component. I can arrange things and show this implementation.

mfp20 commented 4 years ago

@marmarek

Individual components already do take advantage of multiple cores (all available) - this is mostly visible on larger components like kernel, xen or libvirt.

I have to admit that I don't feel comfortable with Qubes yet, as there are too many things I can't figure out. I only switched to Qubes OS yesterday. One of the things I don't master (yet) is monitoring CPU utilization. I'm currently monitoring the CPU using top/htop/nmon on dom0, and I can't see much CPU use there. I suspect I'm doing it wrong; the last time I used Xen was 15 years ago... In any case, looking there, the CPU is mostly unused; the cores have never all been busy during the build process. And GCC 6.4 build times are huge; before posting this feature request I tried to build GCC on Gentoo (same hardware) and it was about 10 times faster. Are you sure that all 8 cores (16 threads) are in use during the build? How can I be sure of that? What am I missing?

As for resuming the full build, it is possible but inconvenient: see at which component it failed, then instead of make qubes, execute make <explicit list of components starting from the failed one>. You can get the full list of components from make help - already in a form for easy copy&paste. For the case of a full build, this indeed could be improved.

Are you sure that single components are packaged (in rpm, iso, ...) one by one (build and package the first component, build and package the second, and so on), instead of building everything and then packaging everything (build 1, build 2, build 3, ..., package 1, package 2, ...)? In the first case, if the second component fails, I should find the first one already in place; and I've found nothing.

Note, the primary use case for qubes builder is building individual packages, not everything at once (like almost every other distribution), so things like buildroot (where the primary output is the whole image, not individual packages) do not fit here.

That isn't completely true. Have a look at the OpenWrt buildroot. You can choose packages and build them separately. It's a modified version of Buildroot-ng, but it works for partial builds as well. It also outputs two additional tools: an SDK (for building packages only) and an ImageBuilder (for building an image out of a prebuilt kernel, rootfs, and packages). Both tools are slimmed-down, modified versions of the full-fledged openwrt-buildroot that generated them. Mine is just a suggestion to give Buildroot a second look. I can see the quality of your current build system (as with everything else you've published, I'm amazed: rock-solid, high-quality stuff you've made... guys... gratz), but Buildroot would probably give you more automation than what you've already produced in-house. I mean, there's initial work to adopt it, but after that it would probably give you faster results. As long as you keep going with makefiles...

Thanks for the quick response.

@fepitre

I started working on the resume problem. My first try was to put a flag file saying "build succeeded", named after the component. I can arrange things and show this implementation.

Awesome! That is more than what I was asking for. I mean, it's The Right Thing, but it takes more time. Thank you!

marmarek commented 4 years ago

One of those things I don't own (yet) is monitoring cpu utilization. I'm currently monitoring the cpu using top/htop/nmon on dom0,

Dom0 sees only its own processes, not those of other VMs. You can use xentop for a better picture (or that Q widget - the "domains widget").

marmarek commented 4 years ago

As for the build system, we build multiple packages for different distributions (Fedora, Debian, Arch, CentOS, ...) and prefer to use their native build systems. This means we use rpmbuild/mock for rpms, dpkg-buildpackage/pbuilder for debs, etc. This way, the produced packages can use all the features of a given distribution. In fact, qubes-builder doesn't build anything itself nowadays (it did in its early days); it only calls out to other build tools. The majority of its work is about managing all those 100+ repositories. And, what's very important (and what surprisingly many tools lack), verifying the integrity of the source code.
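One way to picture that integrity check (a rough sketch of the principle, not qubes-builder's actual code): refuse any source tree whose git tag doesn't carry a signature that verifies. The demo below uses a throwaway repository with an unsigned tag, which verification must reject; the repository and tag names are made up:

```shell
# Sketch of the integrity idea: an unsigned tag must not verify,
# so a builder enforcing signed tags would abort here.
tmp=$(mktemp -d)
git init -q "$tmp/demo"
cd "$tmp/demo"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git tag v1.0          # unsigned (lightweight) tag

if git verify-tag v1.0 2>/dev/null; then
    echo "accepted"   # would only happen for a valid signed tag
else
    echo "rejected"   # expected: no signature to verify
fi
```

A signed tag created with git tag -s, and a trusted public key in the keyring, is what makes the verification pass in the real workflow.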

Of course there are alternatives. For example, we could use the repo tool for maintaining multiple repositories. If the integrity assurance were maintained (I have a rather poor opinion of this particular tool), it could be a better choice.

Anyway, qubes-builder currently works for us, so without some other bigger change it's unlikely we'd migrate. One such change in the coming years could be shipping a (much smaller) dom0 as a whole image instead of individual packages. But that's far ahead, and it's not yet decided whether we'd go that way.

mfp20 commented 4 years ago

@marmarek

Thanks for the hints. And yes, I agree with you. I've been fiddling with Builder all day long and got a better view of the whole thing. It looks good as is. Of course, there's room for improvement, but the overall architecture fits the purpose.

At first glance using xentop... the CPU doesn't look very busy. I'm pretty sure I have to study Qubes a bit more, so let me play a bit more before writing again about the two on-topic requested features (-jN and resume). I'm waiting for the last builder run to end, so I can understand a few things.

fepitre commented 4 years ago

@marmarek, @mfp20 : here is a simple first implementation https://github.com/fepitre/qubes-builder/commit/84a186d949638ec6833f6ca49a3bb50b437f4be4

mfp20 commented 4 years ago

@fepitre

Thanks! I'll patch my local builder to give it a go. But I'm not experienced enough to assess the quality of your work. I hope it's good enough to go into master.

In the meantime, yesterday I realized that successfully built templates are added to the existing ones, so there's no automatic cleaning of the bins (rpm) dir. Other components are protected from deletion by your patch.

I also still don't know whether the downloaded sources are cleaned or not, but I'm short on disk space currently, and I need to adapt my hardware before being able to experiment more with Qubes. I'm currently busy moving from Gentoo to Qubes (cheers!). I dropped Red Hat & sons when they went commercial 20+ years ago, so I'm not comfortable with rpm/yum/dnf details, but... if I combine the Fedora approach ("clean downloads after the next successful install") with the Qubes approach ("don't build from sources, rely on the distros' build tools"), the downloaded content might be cleaned too often. Once this detail is investigated properly, I'd say you've solved the "resume" request.

@marmarek

About the -jN request instead, I can't say much yet. The system feels a bit sloppy/clumsy, and I can't really tell what the problem is, or whether there is one. Xentop reports the use of vCPUs, not of the real cores, and its polling is too slow to get a feel for the situation from its output; wearing the pants of a hunting dog (i.e. using the nose), I'd say the VMs are spending too much time in the "b" (blocked) state instead of the "r" (running) state, but I need better tools before stressing you more about this. The reason I refuse to accept your assurances about all the cores being used is that the GCC build times are simply out of scale when built in a builder qube, compared to Gentoo. And I see a maximum of 20% CPU use in the Qubes tray applet during the GCC build: I see the log of GCC building GCC scrolling by while the CPU sits at 20%... that isn't correct behavior. A similar concern applies to RAM; I currently have 32GB on this system and xentop reports it as mostly used, but the actually running tasks don't require 32GB. If I remember well, Xen grabs all the memory in order to hand it out on the VMs' demand, so that might be OK. Any advice on tools to use for benchmarking?

I also tried to test the block devices, but FIO isn't available on dom0 because it was only included in Fedora from f30, and dom0 is currently at f25. Without a specialized tool I've only been able to do some basic tests, using hdparm and dd. Using those on dom0, and FIO in the VMs, I'd say the block devices are kind of fine; maybe 10-20% slower than bare metal, but that's fine as they are mostly NVMe drives, so the 80% (1500MB/s+ avg) is more than enough for my needs. I've moved Qubes 4.0 into "production", and I'm currently building 4.1 to be placed on my test disk. Maybe having dom0 on f30 (i.e. having FIO on dom0) could shed some light.
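For reference, a basic dd check of the kind mentioned above works without fio; it is only a crude sequential-write probe, the file path and size here are arbitrary, and the resulting figure is indicative at best:

```shell
# Crude sequential-write probe: write 16 MiB and flush it at the end
# (conv=fdatasync), so the figure isn't just page-cache speed.
# Path and size are arbitrary choices for the demo.
dd if=/dev/zero of=/tmp/ddtest.bin bs=1M count=16 conv=fdatasync 2>&1 | tail -n 1
rm -f /tmp/ddtest.bin
```

The last line of dd's output reports the throughput. For read speed, hdparm -t against the device (run as root on dom0) gives a similarly rough number.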

fepitre commented 4 years ago

@fepitre

Thanks! I'll patch my local builder to give it a go. But I'm not experienced enough to assess the quality of your work. I hope it's good enough to go into master.

This first simple implementation is: in case of build success, I add a file in the chroot named $(COMPONENT).built, and check whether it exists on the next build of the related component. I haven't tested more than a simple build and retry.
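In spirit, the check looks something like this; a standalone sketch with made-up paths and names, not the actual Makefile code from the linked commit:

```shell
# Stand-ins for the real chroot path and component name (made up for the demo).
CHROOT_DIR="/tmp/demo-chroot"
COMPONENT="core-admin"
mkdir -p "$CHROOT_DIR"
flag="$CHROOT_DIR/$COMPONENT.built"

if [ -e "$flag" ]; then
    echo "skipping $COMPONENT: already built"
else
    echo "building $COMPONENT"   # the real build step would run here
    touch "$flag"                # mark success so the next run skips it
fi
# (a clean target should also remove "$flag" so the component rebuilds)
```

Run once, it prints "building core-admin" and drops the flag; run again, it prints the "skipping" message, which is the resume behavior.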

mfp20 commented 4 years ago

@fepitre Thanks! I'll patch my local builder to give it a go. But I'm not experienced enough to assess the quality of your work. I hope it's good enough to go into master.

This first simple implementation is: in case of build success, I add a file in the chroot named $(COMPONENT).built, and check whether it exists on the next build of the related component. I haven't tested more than a simple build and retry.

Remember to remove that file on make clean(s)...

fepitre commented 4 years ago

@mfp20: here https://github.com/QubesOS/qubes-builder/pull/98 you can track the work which will be done soon in implementing the resume feature in a more practical way.