bedrocklinux / bedrocklinux-userland

This tracks development for the things such as scripts and (defaults for) config files for Bedrock Linux
https://bedrocklinux.org
GNU General Public License v2.0

If /usr/src is global DKMS "just works" #251

Open GermanBread opened 2 years ago

GermanBread commented 2 years ago

Howdy!

The setup

What I did

I was giving cross-stratum DKMS a try in my fresh Bedrock VM, and it didn't work. I noticed that /lib/modules is a global path, but /usr/src is not (!)

I found out that DKMS needs the kernel sources of the running kernel in /usr/src to function at all

First I symlinked the kernel source in the Debian stratum to the Arch stratum; that didn't work

Then I made /usr/src global, and DKMS started working as expected. (I created /bedrock/strata/bedrock/usr/src, moved the contents of every stratum's /usr/src into /bedrock/strata/bedrock/usr/src, and then made /usr/src a global path in bedrock.conf.)
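The steps in the post above can be sketched as a dry-run shell function. This is only an illustration of what the poster describes, not a recommended procedure. The function name `plan_global_usr_src` is hypothetical, the stratum names are supplied by the caller, and the bedrock.conf edit is deliberately left as a manual step because the post does not show the exact line that was changed.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# GLOBAL_SRC is the directory named in the post above.
GLOBAL_SRC=/bedrock/strata/bedrock/usr/src

# plan_global_usr_src STRATUM...   (hypothetical helper name)
# Prints one mkdir, then one mv per stratum given on the command line.
plan_global_usr_src() {
    printf 'mkdir -p %s\n' "$GLOBAL_SRC"
    for stratum in "$@"; do
        # Each stratum's private /usr/src is reachable under /bedrock/strata/<name>.
        printf 'mv /bedrock/strata/%s/usr/src/* %s/\n' "$stratum" "$GLOBAL_SRC"
    done
    # Making /usr/src a global path in bedrock.conf stays a manual edit.
    printf '# then edit /bedrock/etc/bedrock.conf by hand and run: brl apply\n'
}
```

For example, `plan_global_usr_src arch debian` prints the mkdir, two mv lines, and the reminder, which can be reviewed before running anything as root.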

Now you don't need to install DKMS in the kernel-providing stratum, and pacman (other package managers not tested) can automatically install DKMS modules

The DKMS modules persist between reboots and between kernels

TL;DR

Images

(ignore the warnings about missing kernel sources, that's on me)

pacman automatically installing DKMS

Screenshot_20220212_150407 Screenshot_20220212_152147

Cross-stratum DKMS working

Screenshot_20220212_152306

Arch and Debian kernel side-by-side

Screenshot_20220212_152812

GermanBread commented 2 years ago

I dug around a bit more and found that /bedrock/cross/src is a thing.

So first thing I tried is installing DKMS modules using dkms autoinstall --kernelsourcedir /bedrock/cross/src

Unsurprisingly, DKMS errors out Screenshot_20220212_163109

Now without specifying --kernelsourcedir (and /usr/src as a global path) Screenshot_20220212_163309

Seems like making /usr/src global is the only solution to "fix" DKMS?

GermanBread commented 2 years ago

So I updated my test VM (and the Arch stratum moved to a new glibc version, namely 2.35), and of course DKMS freaked out.

I installed the linux-image-amd64 package and DKMS did not work.

Turns out dkms autoinstall NEEDS the binutils to come from the Debian stratum if it deals with the Debian kernel sources.

So if I run sudo strat -r debian dkms autoinstall, DKMS no longer exits with a nonzero code, but the DKMS modules still don't get automatically installed.

Manually installing the modules by restricting dkms to the (running) kernel's stratum works!
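A minimal sketch of that pairing, assuming you already know which stratum provides the running kernel (the post does not describe a way to detect it automatically). The helper name `paired_dkms` and the `DRY_RUN` switch are made up for illustration; `strat -r` is the restriction flag used in the command above.

```shell
#!/bin/sh
# paired_dkms STRATUM DKMS_ARGS...   (hypothetical helper name)
# Runs dkms restricted to STRATUM so that dkms and its toolchain
# (binutils etc.) come from the same stratum as the kernel.
# With DRY_RUN=1 (the default here) it only prints the command.
paired_dkms() {
    stratum=$1
    shift
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "strat -r $stratum dkms $*"
    else
        strat -r "$stratum" dkms "$@"
    fi
}
```

For instance, `paired_dkms debian autoinstall` prints `strat -r debian dkms autoinstall`; running it with `DRY_RUN=0` as root would execute that command.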

Fast-forward 5 minutes

I made /var/lib/dkms a global path. I still have to restrict DKMS to the stratum of the running kernel, but at least dkms status returns the status of ALL kernels, regardless of stratum. Screenshot_20220217_161528

But dkms autoinstall is still broken!

Of course, while reading through bedrock.conf, I found this: Screenshot_20220217_162156

So I uncommented it and undid all my changes up until now. dkms (auto)install still needs to be restricted to the stratum of the running kernel...

And dkms status only lists the modules of the stratum it has been restricted to... (maybe /var/lib/dkms should stay global?)

Seems like making /usr/src global is the only solution to "fix" DKMS?

I don't think so

I am going to call it a day for now, will look into this later...

My ultimate goal is to figure out how to mix kernels and DKMS from multiple strata, rather than just using only one stratum for kernels.

paradigm commented 2 years ago

Hi! Sorry for the delay. Props on your digging - you figured out quite a lot here!

The current Bedrock Linux 0.7 Poki documentation for cross-stratum feature compatibility, including any need for extra steps to make something work, is here. (The installation instructions try to get people to learn about this page before installing Bedrock.) The cross-stratum feature page includes a dkms section which does document the binutils constraint you figured out: the dkms binary (and dependencies like binutils) need to come from the same stratum that provides the kernel for things to work reliably. There's another constraint documented there that I think you missed, which is that some dkms module versions have limited kernel version support. Generally this is a problem with older dkms modules and newer kernels, as the kernel doesn't guarantee any kind of backwards compatibility here.

I don't know how to make Bedrock automate enforcing these constraints, which leaves it up to the user. I found a number of people were unpleasantly surprised by broken modules they couldn't figure out, which led me to disable cross-stratum dkms by default in bedrock.conf. I just realized that when I originally disabled the functionality out-of-the-box, I failed to update the website to indicate the need to manually re-enable it once you understand the constraints, so I have done so now. Props on finding this on your own without documentation pointing in the right direction.

The concern with making /usr/src global is that package managers could fight over it. While not super common, it's quite possible someone installs the same version of a dkms-providing package in two strata. For example, maybe he/she is cloning a stratum to back it up before making changes and didn't have the dkms package consciously in mind when doing so. /bedrock/cross is how we get around this: we make the desired cross-stratum resources available at a Bedrock-specific path so resource producers (i.e. package managers) don't fight over it. We also configure resource consumers (in this case, dkms) to look at this new Bedrock-specific path. While in this src scenario we just have to forward bits along, in some others Bedrock has to make changes to the files on-the-fly to make them work cross-stratum. Ideally we'd do this with /lib/modules as well, but at the moment I don't know how to reliably redirect all resource consumers to look at /bedrock/cross. This may be something we should investigate more deeply before Bedrock Linux 0.8 Naga is out.

Can you elaborate on why you think /var/lib/dkms should be global? It sounds like keeping it local means a given dkms binary will only report the status of modules from its corresponding stratum's kernel, which I think is desirable as it is consistent with the pairing requirement you found with binutils. In theory having dkms status work for all kernels but dkms install have an unenforced constraint that it has to be paired with kernels from its own stratum is possible, but I think that'd be very confusing.

Other than my previously mentioned, now rectified oversight about documenting the need to manually enable cross-stratum functionality once you're aware of the constraints at hand, I think things are about as good as we can trivially get them here. Making one stratum's dkms binary consistently work with another's kernel doesn't seem feasible, as there are subtle assumptions about the modules and kernel being built with the same tooling/flags/etc. Having Bedrock enforce pairing dkms and kernels might be possible, but we'd first have to figure out how to make /lib/modules local (so each dkms binary sees its own stratum's instance), which in turn will require making all other module consumers - kmod modprobe, busybox modprobe, maybe udev, possibly others I don't know of - all look at a hypothetical /bedrock/cross/modules/ directory. If you or someone else heads this effort I'd be all for it, but I won't have the time to look into it myself for a long while.

My ultimate goal is to figure out how to mix kernels and DKMS from multiple strata, rather than just using only one stratum for kernels.

To make sure we're on the same page, AFAIK mixing kernels and dkms modules from different strata usually works, provided you enable the cross-stratum functionality and manually pair the dkms binary you're using with the kernel you're building the module for. The only times it doesn't that I know of are because of source-level incompatibilities between the kernel and module, usually because of a backwards-incompatible kernel change.

Very minor, but I figure I should point it out just in case it helps: restricting when compiling is a good habit, and we are certainly compiling with dkms. If we always want to restrict a certain binary - like dkms - Bedrock can automate that for you. It should do so with dkms by default. The need to manually restrict usually comes up with stuff like ./configure scripts that Bedrock can't differentiate from stuff we don't want to restrict. That having been said, explicitly restricting does no harm here. If anything it may be better to do it redundantly than miss it when it is needed.

GermanBread commented 2 years ago

Hi! Thanks for your detailed response!

Props on finding this on your own without documentation pointing in the right direction.

I'm a developer, we don't read documentation :P


Can you elaborate on why you think /var/lib/dkms should be global? It sounds like keeping it local means a given dkms binary will only report the status of modules from its corresponding stratum's kernel, which I think is desirable as it is consistent with the pairing requirement you found with binutils. In theory having dkms status work for all kernels but dkms install have an unenforced constraint that it has to be paired with kernels from its own stratum is possible, but I think that'd be very confusing.

My thought was that the user could easily keep track of which kernels have module X installed (I was confused at first to find out that Arch's dkms status didn't report back which modules were installed in the Debian stratum). Granted most people would just write a shell script that handles the DKMS shenanigans (I would do that).
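The kind of shell script mentioned here might look roughly like the sketch below. It is an assumption about how such a wrapper could work, not something from Bedrock itself: the function name `dkms_status_all` and the `DKMS_RUNNER` hook are hypothetical, and the list of strata is passed in by the caller (for example from `brl list`).

```shell
#!/bin/sh
# dkms_status_all STRATUM...   (hypothetical helper name)
# Runs `dkms status` once per stratum, restricted to that stratum,
# and prefixes every output line with the stratum name.
# DKMS_RUNNER is a hypothetical hook so the loop can be tried out
# on a machine without Bedrock; it defaults to `strat -r`.
dkms_status_all() {
    for stratum in "$@"; do
        # Errors (e.g. a stratum without dkms installed) are silenced.
        ${DKMS_RUNNER:-strat -r} "$stratum" dkms status 2>/dev/null |
            sed "s|^|$stratum: |"
    done
}
```

Calling `dkms_status_all $(brl list)` would then give one combined listing while each dkms binary still only inspects its own stratum's /var/lib/dkms.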

Alternative: Maybe adding DKMS functionality to brl could be an option?


[...] but we'd first have to figure out how to make /lib/modules local (so each dkms binary sees its own stratum's instance), which in turn will require making all other module consumers - kmod modprobe, busybox modprobe, maybe udev, possibly others I don't know of - all look at a hypothetical /bedrock/cross/modules/ directory. If you or someone else heads this effort I'd be all for it, but I won't have the time to look into it myself for a long while.

First thing that immediately came to mind was creating a private mount namespace for those consumers where /lib/modules is a bind-mount from /bedrock/cross/modules (I will try this out later [using bash magic and some bubblewrap])
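As a rough sketch of that idea, the snippet below uses util-linux `unshare` rather than bubblewrap, so it is a swapped-in technique and not what the poster planned to test. `/bedrock/cross/modules` is the hypothetical directory from the discussion and does not exist in Bedrock today; the helper name and the `DRY_RUN` switch are also made up for illustration.

```shell
#!/bin/sh
# Hypothetical path from the discussion; Bedrock does not provide it today.
CROSS_MODULES=${CROSS_MODULES:-/bedrock/cross/modules}

# in_private_modules_ns COMMAND [ARGS...]   (hypothetical helper name)
# With DRY_RUN=1 (the default here) it only describes what would happen.
# Otherwise (needs root) it enters a new mount namespace, bind-mounts
# CROSS_MODULES over /lib/modules inside it, and execs COMMAND there.
in_private_modules_ns() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "unshare --mount: bind $CROSS_MODULES on /lib/modules, exec: $*"
    else
        # Inside sh -c, $0 is CROSS_MODULES and "$@" is the command to run.
        unshare --mount sh -c \
            'mount --bind "$0" /lib/modules && exec "$@"' \
            "$CROSS_MODULES" "$@"
    fi
}
```

Only the wrapped command and its children see the bind mount; the rest of the system keeps its normal /lib/modules, which is why long-running daemons started elsewhere would not be affected by this approach.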

Forcing udev into its own mount namespace doesn't sound like a good idea though.

paradigm commented 2 years ago

Hi! Thanks for your detailed response!

Of course!

I'm a developer, we don't read documentation :P

;)

My thought was that the user could easily keep track of which kernels have module X installed (I was confused at first to find out that Arch's dkms status didn't report back which modules were installed in the Debian stratum). Granted most people would just write a shell script that handles the DKMS shenanigans (I would do that).

Yeah, for better or worse I think this is our best option in the immediate future.

Alternative: Maybe adding DKMS functionality to brl could be an option?

There are other use cases that Bedrock can make work cross-stratum, but don't just-work. For these users have to jump through some hoops. In addition to the dkms issue which prompted this discussion, there's also nVidia driver shenanigans, as well as some caching issues with application menus, and probably others that aren't immediately coming to mind. I hadn't considered it before, but the idea of a subsystem which provides user-friendly automation for this hoop-jumping could certainly provide a lot of value. I'll have to give it more thought before committing to the idea, but at the moment I find it promising.

The hardest part, of course, is coming up with the name for the subsystem. brl hoop makes sense to me but I don't know if its purpose will be sufficiently obvious to new users before they read its --help or other documentation. brl hack would have worked back when the word had a different meaning in computer circles, but today people's minds will go to netsec before they think "awkward fix."

First thing that immediately came to mind was creating a private mount namespace for those consumers where /lib/modules is a bind-mount from /bedrock/cross/modules (I will try this out later [using bash magic and some bubblewrap])

Forcing udev into its own mount namespace doesn't sound like a good idea though.

This is certainly an interesting idea. Bedrock 0.7 Poki's minimum kernel version predates a lot of mount namespace tooling, but the upcoming 0.8 Naga is likely to bump the minimum kernel version up enough to open such possibilities here. I'm planning on using mount namespaces in Naga as a solution to another set of issues, and thus we'll have good infrastructure for managing namespaces there anyways.

I share your concern about applying this to udev. Another concern is that it isn't obvious to me we'll have a method to force everything at play into its own namespace. A particularly difficult case, for example, would be a busybox-based distro's init script which calls modprobe with the busybox ENABLE_FEATURE_PREFER_APPLETS setting that forces busybox to check for its own applets before checking the $PATH.

One of the challenging parts of Bedrock is coming up with improvements for one subsystem without impairing others. As cool as it would be for dkms to just-work, I don't want it to come at the expense of some existing just-work things breaking.

I haven't dug deeply into the corresponding modprobe/udev/etc documentation or code, but it seems plausible there's a way to configure them to look in another location. If that's available, Bedrock may be able to enforce the configuration values accordingly. kmod modprobe does have a promising looking --dirname option.
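For reference, a small sketch of how the kmod `--dirname` option could be used. Note that kmod treats the value as a root prefix, so modules are looked up under `<dirname>/lib/modules/<kernel release>` rather than directly in the given directory. The `MODULES_ROOT` path and the helper name below are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Hypothetical alternate root; kmod would search
# $MODULES_ROOT/lib/modules/<kernel release> for modules.
MODULES_ROOT=${MODULES_ROOT:-/bedrock/cross/modules-root}

# modprobe_from_root MODULE   (hypothetical helper name)
# With DRY_RUN=1 (the default here) it only prints the command.
modprobe_from_root() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "modprobe --dirname $MODULES_ROOT $1"
    else
        modprobe --dirname "$MODULES_ROOT" "$1"
    fi
}
```

This only covers kmod's modprobe; whether busybox modprobe or udev offer an equivalent setting is not established in this thread.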

A sort of last-resort option would be to (ab)use FUSE and mount our own filesystem on /lib/modules which dynamically detects whether the incoming request is from a producer or consumer and hides/shows things accordingly. I'd really like to avoid this if we can, though, as it introduces filesystem access performance overhead, uses more RAM, and generally confuses users due to things like ls showing files that dkms complains it can't find. I haven't fully thought it through, but I may prefer living with some sort of brl hoop dkms requirement than this.