I've prepared the simplest conditional build to fix the Kyua build for non-FreeBSD: https://github.com/ihoro/kyua/commit/8b50c22f196a93ad1a69b1709cb77b5fdc31534e. Let me know if we would like to have a separate freebsd/ src dir for FreeBSD specifics; that could be a subsequent PR after this one is merged.
I've tested it with FreeBSD 14-RELEASE and Fedora Server 39.
There is a second candidate for the next PR: https://github.com/ihoro/kyua/commit/4dffd2650f9bc59d77af32275d8ab73573443322. It's a bigger patch, but it collects the FreeBSD specifics into a separate freebsd/ src dir.
And there is another commit in the queue to become a PR: https://github.com/ihoro/kyua/commit/0d319defa0bf19460a1baa1be06edd0275833f12. It brings the Kyua tests back to the pre-patch state:
Test cases: 1472 total, 22 skipped, 0 expected failures, 0 broken, 2 failed
The following topics are closed by the current state of the patch: the kyua.conf.5 and kyuafile.5 man pages got the respective updates.

@ngie-eign, thank you very much for your time. I'm going to work on the outcome of the review.
As much as I'm not a fan of the term "container" (because it focuses on the Linux/Solaris technological implementations), I believe it would be wise to use "container" in lieu of "jail". Jails are a concept that is very FreeBSD-specific and doesn't exist outside of FreeBSD and its derivatives. I realize this project is hosted in the freebsd GitHub org, but it's a good idea to keep things as generic as possible to avoid having to rearchitect things later if someone contributes an actual Linux or Solaris container implementation (for instance).
Sounds interesting to consider. I have concerns that two levels of abstraction could add extra confusion from the end user's perspective and extra interface limitations for possible future execution environments. The execenv is already meant to be the first abstraction added. It is abstracted from the concepts and details of a specific execution environment, i.e. it does not imply containerization only. And currently it's designed as a switch, so it could be anything (just thinking out loud): execenv=host, execenv=jail, execenv=bhyve, execenv=docker, execenv=kubernetes, execenv=python-virtualenv, execenv=aws, etc. The key abstraction decision is that parameters are not generalized, because that can be hard to do and may only introduce limitations. For example, jail has a single metadata property, execenv.jail, which is a string of parameters passed to the jail(8) program. Docker could have multiple properties like execenv.docker.imageurl and execenv.docker.runparams. AWS could have more execenv.aws.* properties to cover authorization and the details of the exact service to be used.
There was no intention to provide an automatic containerization feature which uses local capabilities like jail for FreeBSD or cgroups et al. for Linux. It simply provides a new concept, the execution environment, and implements the first non-default (a.k.a. host) environment: a FreeBSD jail based one. It is merely a nice outcome that it can be used to add extra isolation to some FreeBSD tests and thereby improve the parallelism of the test suite. A test author is expected to understand the background if the jail execution environment is picked for a test.
This is a short elaboration of the current vision. What do you think? Perhaps I have not accurately caught the idea behind execenv=container and execenv.container.params.
Anyway, now I think that execenv.jail could be reworked into execenv.jail.params to keep the "namespace" open for possible future improvements of the jail execenv with extra execenv.jail.* metadata.
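To make the parameter-passing idea concrete, here is a minimal sketch of how the split between the execenv switch and backend-specific parameters could look in a Kyuafile, assuming the execenv.jail.params naming above maps to an execenv_jail_params metadata key (the key spelling, test names, and jail parameters are illustrative assumptions, not the final interface):

# Hypothetical Kyuafile sketch, written via a shell here-doc for brevity.
cat > Kyuafile <<'EOF'
syntax(2)
test_suite('FreeBSD')

-- Default execution environment: plain host, no extra isolation.
atf_test_program{name='ls_test'}

-- Jail execution environment; the parameter string is handed to jail(8)
-- as-is instead of being generalized by Kyua.
atf_test_program{name='pf_test',
                 execenv='jail',
                 execenv_jail_params='vnet allow.raw_sockets'}
EOF

A hypothetical docker backend would then add its own execenv_docker_* keys rather than sharing a generic parameter set.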
@ngie-eign, I'm leaving the "Resolve conversation" button for you. I guess it must be convenient for you.
@ngie-eign, I think it's ready for the next round of your review.
The latest push introduces the following changes:

- libtool usage
- execenv.jail renamed to execenv.jail.params -- to keep execenv.jail.* open for future improvements
- the execenv engine conf variable renamed to execenvs -- to keep it consistent with the existing Kyua interface

I've also moved freebsd/* to os/freebsd/. There is no os namespace added; that can be done in the future if it's really needed. The main goal is to ease work with the history by putting the files at the desired location from the very beginning.
I agree with Igor's view on the container/jail naming. There are lots of interface differences between jails, cgroups, zones, etc., and it's premature to generalize from a single backend implementation. If someone shows up in the future with an equivalent backend for a different OS, it may have constraints which mean that a generic "container" engine will need backwards-incompatible changes to support both jails and the new container type. Meanwhile, this feature provides major benefits to FreeBSD src developers today: with very little effort we can use it to speed up network tests substantially.
As a test, I integrated this PR into FreeBSD and tried running the pf tests with and without execenv=jail. In bhyve with parallelism=4, it takes ~960s without this patch (most tests are serialized because of jail name clashes) and ~360s with the patch. All I had to do was add two lines to a makefile. I believe kp@ has seen similar improvements.
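For reference, a rough sketch of the kind of timing comparison described above, run inside the guest (parallelism=4 is the value from this thread; the pf test location is assumed to be the usual /usr/tests/sys/netpfil/pf):

# Run once with and once without the execenv=jail metadata applied to the
# pf tests, and compare the wall-clock times.
cd /usr/tests/sys/netpfil/pf
time kyua -v parallelism=4 test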
@ngie-eign is there anything I can do to help speed up review here? I spent a fair bit of time looking at early iterations of the patch and would quite like to see this feature progress into FreeBSD. Should we perhaps import it directly into FreeBSD and refine it there, then reconcile the two copies of kyua later?
@ihoro I believe there is some kind of memory leak in the patch. So far I narrowed it down to something in the FreeBSD sbin tests. If I run them with -v parallelism=4, something causes kyua to consume a huge amount of memory, triggering the OOM killer. In particular, nothing there is using execenv=jail. Please let me know if I can help narrow it down further.
> @ihoro I believe there is some kind of memory leak in the patch. So far I narrowed it down to something in the FreeBSD sbin tests. If I run them with -v parallelism=4, something causes kyua to consume a huge amount of memory, triggering the OOM killer. In particular, nothing there is using execenv=jail. Please let me know if I can help narrow it down further.
Thank you for the extra attention and additional testing. I guess there is no corresponding kyua report --verbose output to share.
I've tried main on aarch64@qemu with kldload pf and have not reproduced it yet, with the following result:
Test cases: 637 total, 46 skipped, 0 expected failures, 1 broken, 1 failed
Working on the amd64@qemu trial.
Any details of the case's environment would be helpful. Let me guess that bricoler has probably been used here; if so, I will have a better chance of reproducing the same thing if you provide me with the invocation details. It could be an interesting test of many things at the same time.
> Any details of the case's environment would be helpful. Let me guess that bricoler has probably been used here; if so, I will have a better chance of reproducing the same thing if you provide me with the invocation details. It could be an interesting test of many things at the same time.
Please try cloning bricoler, then run make. Then:
$ ./bricoler run freebsd-src-regression-suite-run --param freebsd-src:url=https://github.com/markjdb/freebsd --param freebsd-src:branch=main-kyua-parallel --param freebsd-src-regression-suite-run:hypervisor=bhyve --param freebsd-src-regression-suite-run:tests=sbin
Note that you'll need sudo to use bhyve for now (I just configure bhyve and bhyvectl in sudoers on my build system). If you omit the hypervisor parameter, it'll use QEMU instead, which also works but is slower.
While tests are running, you can ssh into the VM with bricoler run freebsd-src-regression-suite-run ssh. If I do that, I see this in top:
2045 root 122 0 14G 5226M CPU2 2 0:15 100.00% kyua -v parallelism=4
There are some sharp corners here, so please feel free to report issues or make suggestions. If you can't make progress, email me and we can debug further. Maybe I made a mistake somewhere.
> @ihoro I believe there is some kind of memory leak in the patch.
Hmm, there is something strange going on. It seems to be related to incremental buildworlds that I use by default in my test runner. If I do a clean build, I can't reproduce any problems, but I still don't understand the root cause.
I will keep digging, but please ignore this for now. Sorry for the noise.
> @ihoro I believe there is some kind of memory leak in the patch. I will keep digging, but please ignore this for now. Sorry for the noise.
I'm pretty sure there is a latent bug in kyua somewhere, related to -v parallelism. I've been running the FreeBSD src test suite in a loop with parallel tests, collecting bug fixes (mostly test bugs/hidden assumptions), and I've seen a couple of OOM kills. Again, sorry for the noise; it seems unrelated to this patch.
I would still really like to see this land soon. On my test systems, with -v parallelism=4, the full test suite takes 3-4 hours to run, which isn't terrible but is rather long. I expect that to drop significantly with this patch and a small amount of work.
> I'm pretty sure there is a latent bug in kyua somewhere, related to -v parallelism. I've been running the FreeBSD src test suite in a loop with parallel tests, collecting bug fixes (mostly test bugs/hidden assumptions), and I've seen a couple of OOM kills. Again, sorry for the noise; it seems unrelated to this patch.
Thanks for the additional analysis and the test runs.
Just thinking out loud: perhaps we could add extra logic to bricoler to capture and report such cases, collecting the kyua core dump and related files if necessary (like the kyua and kyua.debug binaries). That could give us a clue for building a hypothesis about the lurking defect. Kyua is an important tool for the FreeBSD Test Suite, hence it should not be a bad thing to have some logic hard-coded specifically for the kyua test runs.
In addition to that, we could limit the memory of the kyua process to 200MB or so, just to make it crash sooner without wasting resources and time. It's only a quick idea; I currently have no intuition about the normal memory usage spikes kyua may have -- I saw something steady around 11MB for the sbin tests.
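A minimal sketch of that idea as a plain sh wrapper around the kyua invocation (the 200MB figure is just the number floated above, and the limits are illustrative):

# Cap kyua's address space so a leak aborts quickly instead of driving the
# machine into the OOM killer, while keeping core dumps for later debugging.
ulimit -v 204800        # kilobytes, roughly the 200MB mentioned above
ulimit -c unlimited
kyua -v parallelism=4 test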
> I would still really like to see this land soon. On my test systems, with -v parallelism=4, the full test suite takes 3-4 hours to run, which isn't terrible but is rather long. I expect that to drop significantly with this patch and a small amount of work.
From the organizational point of view, my current vision is that only the following topic needs closure before the merge:

- The execenv.container idea discussion is not resolved yet. My intuition is that @ngie-eign could accept our vision, i.e. it would not require additional work.

And I propose to move the following topics to subsequent PRs:

- The syntax(N) version bump question. I have a proposal to mark such tests as failed (or similar); having the kyua process fail during the Kyuafile parsing stage looks less practical, since the whole test suite is rendered unusable. There are no troubles expected from the FreeBSD project perspective: @kprovost and I discussed the "MFC dilemma", and it looks like an MFC of the latest Kyua version should cover it, i.e. we won't have to balance between the old and new Kyua.
- The homebrew formula's maintainer could provide the respective upgrade once there is a next official GA version. But I have a feeling some debt has accumulated so far, like the need to repair Linux support, etc. This is definitely a separate topic.

> > I'm pretty sure there is a latent bug in kyua somewhere, related to -v parallelism. I've been running the FreeBSD src test suite in a loop with parallel tests, collecting bug fixes (mostly test bugs/hidden assumptions), and I've seen a couple of OOM kills. Again, sorry for the noise; it seems unrelated to this patch.
>
> Thanks for the additional analysis and the test runs.
>
> Just thinking out loud: perhaps we could add extra logic to bricoler to capture and report such cases, collecting the kyua core dump and related files if necessary (like the kyua and kyua.debug binaries). That could give us a clue for building a hypothesis about the lurking defect. Kyua is an important tool for the FreeBSD Test Suite, hence it should not be a bad thing to have some logic hard-coded specifically for the kyua test runs.
Absolutely. In fact, I occasionally see kyua crash with one of several assertions on arm64, also not understood yet. :(
sbin/nvmecontrol/basic:io_passthru -> passed [2.780s]
sbin/mdconfig/mdconfig_test:attach_vnode_non_explicit_type -> passed [3.087s]
*** /usr/home/markj/bricoler/freebsd-src/src/contrib/kyua/utils/optional.ipp:175: Precondition check failed: _data != NULL
*** Fatal signal 6 received
I plan to extend the freebsd-src-regression-suite job to automatically fetch kyua.core and open it in a debugger. Similarly, the VM console watcher will soon be able to automatically detect kernel panics and generate a report of that.
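A hypothetical post-mortem step along those lines (the host name and paths are placeholders; the lldb/gdb invocations are the standard ones):

# Copy the core file out of the test VM and open it with the matching binary.
scp testvm:/root/kyua.core .
lldb -c kyua.core /usr/local/bin/kyua
# or: gdb /usr/local/bin/kyua kyua.core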
> In addition to that, we could limit the memory of the kyua process to 200MB or so, just to make it crash sooner without wasting resources and time. It's only a quick idea; I currently have no intuition about the normal memory usage spikes kyua may have -- I saw something steady around 11MB for the sbin tests.
From watching it in top(1), I think there is some specific race condition which triggers a quick blowup of kyua's memory consumption. That is, memory usage is usually below ~12MB, and at some point it quickly starts growing.
@ngie-eign do you have a vision of how we could proceed with the pull request? Currently it's aligned with the code committed to freebsd-src/main. It looks less productive to continue with the related improvements, because this is not the only diff now between freebsd-src:contrib/kyua and kyua.
Apologies for being so incommunicado. I'll merge this change now to sync freebsd:main and this repo. It would be good to have follow-up issues for any items pointed out that haven't been resolved in this PR. Thanks!
Thanks for your time and volunteer effort. The subsequent PRs should be much smaller and easier to discuss.
The main work and review was conducted here: https://reviews.freebsd.org/D42350. The respective mailing list discussion: https://lists.freebsd.org/archives/freebsd-hackers/2024-February/003032.html.