NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

CFI violation in rm_kernel_rmapi_op #439

Open sempervictus opened 1 year ago

sempervictus commented 1 year ago

NVIDIA Open GPU Kernel Modules Version

525.60.13

Does this happen with the proprietary driver (of the same version) as well?

I cannot test this

Operating System and Version

Arch Linux Current

Kernel Release

6.0.16

Hardware: GPU

A5000

Describe the bug

When building the kernel (and therefore all modules) using RAP CFI, the open GPU modules code actually compiles, but the nvidia module cannot be loaded into the kernel because it trips a CFI violation in rm_kernel_rmapi_op: RAP hash rm_kernel_rmapi_op/3262ced1 for rm_kernel_rmapi_op+0x0/0x14d [nvidia] does not match existing hash 1f0fc3eb00017f. The function is defined as void NV_API_CALL rm_kernel_rmapi_op(nvidia_stack_t *sp, void *ops_cmd) with consistent calling conventions (to my naked eye) throughout the code, so this doesn't look to be a case of overridden/colliding names. The NV_API_CALL definition also looks like it shouldn't change during compilation. Do any of the developers know whether something it's doing, how it's compiled, or the way it's called could break call or return hash checks?

On a related note - the stack protector appears to be used whenever it's detected as available, not when it's detected as having been used to build the rest of the kernel (RAP obviates the need for SSP when built with return checks, so while SSP is available to the compiler, it's not enabled during full-RAP builds).

To Reproduce

Build module against a RAP-instrumented kernel, then try to insmod

Bug Incidence

Always

nvidia-bug-report.log.gz

N/A

More Info

No response

timocapa commented 1 year ago

Hit the same

CFI failure at nvkms_call_rm+0x5b/0xa0 [nvidia_modeset] (target: rm_kernel_rmapi_op+0x0/0x190 [nvidia]; expected type: 0xba54dd86)

Another one is

CFI failure at nvkms_kthread_q_callback+0x101/0x140 [nvidia_modeset] (target: _nv000067kms+0x0/0x10 [nvidia_modeset]; expected type: 0xe1419545)

(Clang CFI + nvidia-dkms)

Clang 16 (built from master a few days ago), so this is likely kCFI

sempervictus commented 1 year ago

Thanks for chiming in @timocapa. Given that we're using two completely different CFI mechanisms (clang vs a GCC plugin on my end), i think this is a strong indication of the driver doing something "unsavory" at runtime despite meeting both implementations' build-time requirements. I'm guessing that i just never get to the stack frame with the nvkms_kthread_q_callback call/access since the early failure prevents anything else from executing after the first failure.

timocapa commented 1 year ago

I'm guessing that i just never get to the stack frame with the nvkms_kthread_q_callback call/access since the early failure prevents anything else from executing after the first failure

Perhaps - I put kCFI into permissive mode, so failures get logged but execution continues

sempervictus commented 1 year ago

Ping @aritger - your name is on all the release tags, so pinging direct to get upstream attention on this.

My situation w/ RAP might not be common enough to merit upstream remediation, but with kernel CFI becoming more mainstream as a technique (in general Linux and Android), i think this issue occurring across various CFI implementations does merit attention from Nvidia folks... The day some distro starts shipping kCFI will be the day these drivers are prevented from loading at all on a common OS. End-user markets are one thing, but a fair number of our mutual customers are hospitals, universities, pharma, etc - institutional buyers who have to operate under their GRC mandates (to use state-of-the-art exploit mitigation techniques and not disable security functions, which is what kCFI permissive mode does).

aritger commented 1 year ago

Thanks for the report.

I apologize for my ignorance, but what is an easy way for me to configure a RAP-instrumented kernel? And/or how to configure the kernel and open-gpu-kernel-modules to build with Clang CFI?

For rm_kernel_rmapi_op:

For nvkms_kthread_q_callback:

Is any of this a violation of CFI?

sempervictus commented 1 year ago

Thanks for picking this up @aritger. Far as RAP goes, the last public release is from some years ago, so for "easy setup" you'd need an old kernel. However, since the problem is reproducible using currently publicly available tooling as indicated by @timocapa, i would suggest going the kCFI route via LLVM/clang to create relevant instrumentation in the runtime. Moreover, RAP isn't permissive, so kCFI seems a better way to find occurrences of the concern. I can build any kCFI-derived remedy with my RAP plugin and currently installed kernel to verify the remedy working there as well (should ensure that the fix is not reliant on some version-specific implementation of clang's CFI).

My gut sense is that the "call through" approach is what's tripping up CFI, but need to consult with people better informed on the matter... Hopefully they're able to weigh-in directly.

timocapa commented 1 year ago

And/or how to configure the kernel and open-gpu-kernel-modules to build with Clang CFI?

The Linux kernel takes LLVM=1 as a make argument to fully switch to Clang's utilities - CFI is disabled by default, so you'll have to enable its config option. Clang versions newer than 14.0.x will likely hit errors due to changes that turn new warnings into errors. (https://github.com/gentoo/gentoo/blob/master/x11-drivers/nvidia-drivers/files/nvidia-drivers-525.23-clang15.patch)
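For reference, a minimal sketch of the config side of that (the Kconfig symbol names are real kernel options; the version notes reflect a typical x86-64 kCFI setup and are my assumption):

```text
# Build the kernel with Clang and enable kCFI (x86-64 support landed in 6.1):
#   make LLVM=1 menuconfig && make LLVM=1
CONFIG_CC_IS_CLANG=y
CONFIG_CFI_CLANG=y
# Optional: log CFI failures and keep running instead of panicking:
CONFIG_CFI_PERMISSIVE=y
```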

I've used Clang 16 which I can upload later.

If setting LLVM=1 won't work for the NVIDIA driver, you'll likely just override CC and CXX, I guess? Not sure - I can only use the proprietary driver due to having a Pascal GPU. DKMS was updated a while ago to try to match the compiler that was used for the kernel.

Other than that I don't know enough about CFI to answer the other questions, sorry. :(

sempervictus commented 1 year ago

Regarding the CFI piece - apparently the types of pointers being passed as arguments to functions must be understood by the compiler when creating hashes for calls and returns. Structure definitions present in kernel-open but missing from src/nvidia, or the other way around, will produce different representations in CFI's comprehension of "what it's seeing in the code." That's the current running theory anyway. There seems to be a linker hack-around for this concern already in place @ src/nvidia/exports_link_command.txt, which i imagine also prevents LTO from "being all that it can be" since, much like CFI, it needs to be type-aware to perform its optimizations.

Is the "partitioning" of the code set up to produce some sort of functional benefit in compilation, or a byproduct of the open-sourcing process slated for remedy at some point?

sempervictus commented 1 year ago

@aritger: wiser folks than me have noticed that the module is not being built using Kbuild (which explains why i had the SSP problem mentioned in the original post). Meaning that the hash showing up as a mismatch in my error output is just some random segment of the binary...

pageexec commented 1 year ago

just my 2 cents: the core issue is that the whole build system is structured the 'wrong way' and results in objects with mixed provenance linked together. namely, some objects are produced by the kernel's Kbuild system and some are not. how this can possibly work is a mystery, just think about RANDSTRUCT being applied to one set of objects but not the others, etc.

pageexec commented 1 year ago

in more detail: the top-level Makefile defines the modules target as:

modules: $(nv_kernel_o_binary) $(nv_modeset_kernel_o_binary)
	$(MAKE) -C kernel-open modules

here kernel-open/Makefile will end up using the kernel's Kbuild system to build the objects, which is how it should be done for everything else too. however nv_kernel_o_binary is built this way:

$(nv_kernel_o):
	$(MAKE) -C src/nvidia

which in turn ends up in src/nvidia/Makefile, which has nothing to do with Kbuild and just builds the objects with some made-up CFLAGS that have no relationship to the rest of the kernel's KCFLAGS. this is just broken, it'll not produce anything that has CFI/etc instrumentation. so before spending too much time on these supposed CFI violations (they are not, as far as I checked), fix the build system first, then start looking for problems.

pageexec commented 1 year ago

small bonus, at compile time, RAP caught a prototype mismatch on nv_encode_caching (between kernel-open/common/inc/nv-proto.h and kernel-open/nvidia/nv-mmap.c).

sempervictus commented 1 year ago

@pageexec - thanks for jumping in at whatever ungodly time it is over there. That mismatch is just a matter of

diff --git c/kernel-open/common/inc/nv-proto.h w/kernel-open/common/inc/nv-proto.h
index 815107d25..b016e42e6 100644
--- c/kernel-open/common/inc/nv-proto.h
+++ w/kernel-open/common/inc/nv-proto.h
@@ -43,7 +43,7 @@ void        nv_procfs_remove_gpu        (nv_linux_state_t *);

 int         nvidia_mmap                 (struct file *, struct vm_area_struct *);
 int         nvidia_mmap_helper          (nv_state_t *, nv_linux_file_private_t *, nvidia_stack_t *, struct vm_area_struct *, void *);
-int         nv_encode_caching           (pgprot_t *, NvU32, NvU32);
+int         nv_encode_caching           (pgprot_t *, NvU32, nv_memory_type_t);

i've lost track of what's already in your patch and what i've added since, but happy to send over what i have currently.

@aritger: ^^ is one of said wiser folks to whom i was referring. :grin: I'm pretty sure that the build conditions not being consistent are also what resulted in my having to

commit c0f1b8eabb1a1cfb0b401848dd65b4e30114a7e3 (HEAD -> 525.60.13-grsec)
Author: RageLtMan <rageltman [at] sempervictus>
Date:   Wed Dec 28 16:07:00 2022 -0500

    Force no-SSP to avoid RAP conflict

diff --git a/utils.mk b/utils.mk
index 17421ba9f..0ff35f725 100644
--- a/utils.mk
+++ b/utils.mk
@@ -39,7 +39,7 @@ AR                    ?= ar
 # only set these warnings if CFLAGS is unset
 CFLAGS                ?= -Wall
 # always set these -f CFLAGS
-CFLAGS                += -fno-strict-aliasing -fno-omit-frame-pointer -Wformat=2
+CFLAGS                += -fno-strict-aliasing -fno-omit-frame-pointer -Wformat=2 -fno-stack-protector
 CC_ONLY_CFLAGS        ?=
 CXX_ONLY_CFLAGS       ?=
 LDFLAGS               ?=

in order to build when i'm using RAP, because RAP (with return protection) disables the Kconfig for stack protection while the build system still tries to use it (when the rest of the kernel doesn't), resulting in a compile failure without the hack above.

aritger commented 1 year ago

Thanks for the feedback. That is a fair critique.

For whatever it is worth, there are a few motivations for the split:

(1) Historically, the non-kbuild part (the part that produces nv-kernel.o) was built internally to NVIDIA and is what was distributed as binary-only. Code not built for a specific target kernel cannot use kbuild.

(2) With the advent of open-gpu-kernel-modules, we chose to retain that split so that users installing the driver wouldn't be required to build all of the kernel module at install time. I.e., the NVIDIA .run file ships a pre-built open-gpu-kernel-modules nv-kernel.o. We can only do that because nv-kernel.o is not kernel-specific. Currently, open-gpu-kernel-modules takes about 10 minutes to build single-threaded. Much of that can be recovered with a parallel build, but we didn't want to add that install time for every user installing from the .run file if we didn't need to.

The big disadvantage of the split is of course that you need to match these sorts of compiler flags across the split if doing instrumentation like RAP.

Maybe the benefits of (2) are outweighed by the downsides and we should revisit that decision.

That is at least the context. So, I don't know if we can immediately move to an all kbuild-native build.

The nv_encode_caching() bug is a good catch. Thanks for that. Does nv-mmap.c not include nv-proto.h? If not, that is a bug, too. Even with the current split, I would expect the compiler to complain if the prototype and implementation mismatch.

For the near-term, would it be acceptable to pass these additional CFLAGS on the make commandline? Maybe the makefiles need more variable plumbing to facilitate that. But, I think it will be easiest to get traction with something like that, than require kbuild-ifying the entirety of the open-gpu-kernel-modules build. The code changes for that wouldn't be difficult, but the hard part would be the packaging/installation implications of that choice.

sempervictus commented 1 year ago

Thanks for the clarification @aritger.

If we stick to the RANDSTRUCT example @pageexec provided, i think that the common object file still poses a problem because its structure formats will not match the same structures with randomized field orders in a RANDSTRUCTed kernel tree. Far as shipping the pre-built object file, that might be fraught even w/out the plugins/kCFI/LTO concerns if the toolchain which builds the on-target kernels changes sufficiently from the one that made the object file to produce ABI mismatches between its products and the pre-made .o file. Even as it stands now, some people use various recent versions of clang to build, most use name-a-version of GCC (AFAIK). I'm no build-system expert (bloody things confound me), but at a high-level it seems a bit counterintuitive to further complicate the separate Makefile (especially in cases of out-of-tree stuff) than move toward a unified Kbuild approach though i cannot fully fathom the complexities of either.

In terms of "drivers for effort" - we're actually in the process of pushing adoption for @pageexec's work product in several hospitals, universities, and in fed space. All of those deployment targets process a hefty amount of patient data, lots/most of it through NV hardware on Linux hosts (and in some cases via what is now Nvidia fabric which could use some hardening too in its Linux control plane). These enterprise-grade orgs are taking a bigger interest in runtime security and threat mitigation as they're realizing that any response is already an action taken too late. If we were able to build drivers for the enterprise class HW w/ build-and-runtime hardening, it would deliver greater standoff to our mutual customers which translates to increased safety for most of our PHI (pretty sure @pageexec doesn't use our healthcare system too often). Lastly, there's a potential performance incentive here: LTO (and RAP) build optimized binaries, and in markets where nitpickers compare products in impractical increments, a practical gain in operating efficiency is likely to get some good press.

aritger commented 1 year ago

(If you're not already familiar with the distinction between the "OS-agnostic" and "kernel interface" portions of nvidia.ko, https://github.com/NVIDIA/open-gpu-kernel-modules#kernel-interface-and-os-agnostic-components-of-kernel-modules may be useful context)

Thanks for the feedback.

Far as shipping the pre-built object file, that might be fraught even w/out the plugins/kCFI/LTO concerns if the toolchain which builds the on-target kernels changes sufficiently from the one that made the object file to produce ABI mismatches between its products and the pre-made .o file.

In general, and with care, we've mostly been able to produce one nv-kernel.o that works, independent of the tool chain used for the kernel and for the portion of nvidia.ko that is built for the target kernel. There have been bugs along the way that we've had to fix, sure, but I don't know that I would characterize it as "fraught".

For RANDSTRUCT: are you referring to CONFIG_GCC_PLUGIN_RANDSTRUCT? My experience with that config option is a few years old, but my understanding is that only certain structures within the kernel, explicitly decorated with "__randomize_layout", get their layouts randomized. By definition, the source that comprises the nv-kernel.o portion of nvidia.ko (i.e., the "OS-agnostic" portion) does not include any Linux kernel header files (it couldn't be agnostic to kernel version/configuration, if it did), so I wouldn't expect nv-kernel.o to be impacted by CONFIG_GCC_PLUGIN_RANDSTRUCT.

It is definitely important that open-gpu-kernel-modules works with RAP, CFI, et al; you don't have to try to sell anyone on that :)

My only point is that making the entirety of open-gpu-kernel-modules kbuild native comes with benefits, but also some tradeoffs. I can't say for certain that kbuild native will be the best solution for this class of problem in the short term.

pageexec commented 1 year ago

Hello Andy,

first of all, thank you for your replies, you raise several good points that I'll reflect on below.

To start with, I have >15y experience with trying to match PaX/grsecurity code with nvidia's (at the time) binary blob and wrapper stuff. I used to post patches for major versions that would make the nvidia kernel modules loadable under PaX/grsecurity kernels, with many of our kernel self-protection features still enabled. Some of these features were compiler independent, some weren't. In any case, there was never a doubt that the binary-only part of the modules weakened the overall security value of these features due to not being covered by a given defense. This is just to say that the current situation is no worse than what we've been dealing with before this open source release existed, BUT this time we may have a chance to fix the situation and I'd like to help with the process where I can.

Next, let's discuss the source/build split. Fundamentally, the problem is that you created the split to achieve the exact opposite of how kernel code is supposed to be built so there will have to be a hard decision on this sooner or later. For the benefits of the split build system you're saying that it's to lessen the burden on end-users installing the driver. For users like Gentoo I think that's a non-issue to begin with since they are used to building from sources. For distro users I don't see what would prevent a distro from pre-compiling the currently non-Kbuild part themselves and have the end users only build the current Kbuild parts (via DKMS I assume) as they're doing it now. This would allow the distro maintainers to even tailor the nvidia kernel packages to their particular kernel versions without losing the benefit of using Kbuild throughout. In other words, distros would have to do a bit more work than just repackage the nvidia provided tarball, but that's nothing unusual for them given how they're the ones to build everything from sources anyway.

As for how to get from the split build to all-Kbuild: I think hunting down and then replicating every single kernel config option and its effect on the compiler/assembler/linker flags will not scale too well and you'd basically just end up reimplementing Kbuild. So I suggest not bothering with that whack-a-mole game and just switching to Kbuild.

Last but not least, some specific issues:

  1. RANDSTRUCT does not only instrument explicitly marked types but also those matching scripts/gcc-plugins/randomize_layout_plugin.c:is_pure_ops_struct so if you have such ops types/objects shared between the two build worlds, they're going to fail on RANDSTRUCT configs. Note that I'm not talking about ops types defined by linux (that the OS-agnostic/non-Kbuild world does not use anyway) but those defined by your own code - if they're shared between the two build worlds, they're broken under RANDSTRUCT. The same considerations apply to other similarly automated transformations, such as CONSTIFY or RAP in case of PaX/grsecurity, or other forward/backward edge CFI solutions that tie caller/callee pairs together via runtime checks.

  2. IIRC, the two types of nv_encode_caching are deemed compatible by C (and thus gcc), just not by RAP (for a good reason), that's why you never saw a warning in your builds (and yes, the .c includes the .h in question).

  3. out of curiosity, how do you manage to build a nv-kernel.o object that works with coverage and sanitizer features enabled on the Kbuild side? Then there's the low-level mitigations against Spectre, lately kCFI, etc that require not just gcc switches but post-processing by objtool as well...

sempervictus commented 1 year ago

In terms of the distro concern - this can be handled pretty easily with an nv-userspace package containing all of the bins and libs, along with the distro-revision-specific object file to accelerate DKMS builds. By pinning the nv-dkms package dependency to the userspace package, you can give "easy-button" users the same functionality currently provided but with the exact ABI of that distribution since it'll be the same exact toolchain with the same exact options. A build-time check on the DKMS module to compare its compilation environment with that of the .o could be used to inform its build process whether or not it needs to rebuild that cumbersome binary. Anyone running a custom kernel, something like Arch, or truly down the rabbit hole like those Gentoo folks will not mind the additional compilation time to get a full set of userspace and kernel binaries built to run properly on their oddball system.

aritger commented 1 year ago

Sorry to let the conversation here languish.

The distribution concern can't be quite so easily dismissed... for better or worse, today a not insignificant portion of our user base installs the driver via the distro-agnostic .run file.

For this:

  1. out of curiosity, how do you manage to build a nv-kernel.o object that works with coverage and sanitizer features enabled on the Kbuild side? Then there's the low-level mitigations against Spectre, lately kCFI, etc that require not just gcc switches but post-processing by objtool as well...

For coverage and sanitizer features, we build the nv-kernel.o with additional CFLAGS specific to the intended build (e.g., -fsanitize=undefined, et al, in the case of ubsan).

Spectre mitigation is unconditionally built into nv-kernel.o.

All that being said, I appreciate that the way we're currently handling nv-kernel.o is not very idiomatic to the kernel, and that leads to a variety of friction. I'll look into adding a kbuild-native build flow. I suspect there are going to be some challenges such as building the C++ files (I wish we didn't have that, but it is there today...). I fear this won't happen quickly, so if you need a solution in the short term, editing CFLAGS in utils.mk like you have above is probably the best thing to do for now.

sempervictus commented 1 year ago

@aritger - thanks for the clarification (it's been ages since i've used the .run file to install, but i bet you've got years of my download logs of them to use for extraction). Would very much appreciate having a native Kbuild path - thank you.

Spectre mitigation is unconditionally built into nv-kernel.o.

Which one(s)? :smile: Unfortunately different versions of the same vendor's CPUs, as well as the other major vendor's, in the same architecture, handle the various mitigation mechanisms differently (with varied results in terms of security posture). It might be somewhat dangerous to presume mitigations are in-effect across the board.

I suspect there are going to be some challenges such as building the C++ files

Eh, rewrite it in Rust, Linux is coming around to it :stuck_out_tongue:. In all seriousness though, there are ways to use C++ in the kernel, especially if the intent is to always be out-of-tree.

@pageexec - happen to have any dirty-hacks in your toolbag to excise current-header-state KCFLAGs to be fed into non-Kbuild CFLAGS during a DKMS build?

sempervictus commented 1 year ago

@aritger: i'm now easily able to reproduce the LLVM16-based kCFI failures as well. If you'd like, i can whip up an Arch PKGBUILD using the LLVM 16 RC or just build a small disk image running 6.1.12 w/ kCFI. Same thing as reported by @timocapa.

The notion of not building drivers for an OS in the way that OS mandates drivers be built technically means that Nvidia doesn't support Linux, as the binaries produced are best-guess approximations of the correct ABI, not actually the appropriate ABI of the OS. The A5000 in my Xeon laptop is losing value daily, no ROI possible, and not really incentivizing me to keep buying high-end nvidia GPUs when i can't actually use them. On a larger scale, if CISA tells USG to start using CFI on Linux tomorrow (LLVM, GCC, or GCC plugin), this will be a bigger issue than some users on github pointing out the problem.

Please implement Kbuild-based driver compilation to support Linux as it is compiled, not as CFLAGs in the Makefile would prefer to have had it compiled.

elsandosgrande commented 1 year ago

@aritger Have there been any updates on this issue? Are users of the proprietary module, such as myself, likely to see any future fixes? I'm willing to try matching the compiler flags between the two worlds, but that seems fickle in the long run, even on Gentoo.

aritger commented 1 year ago

Sorry, I don't have any update. The status is still the same as described earlier in this thread:

elsandosgrande commented 1 year ago

All right.

By the way, @aritger, where can I report compiler warnings and errors regarding the driver? When the non-Kbuild world gets compiled with my global compiler flags, I get a bunch of warnings regarding the code and no errors, but, when I try to match the flags, I get fewer warnings but also an error regarding the code.

aritger commented 1 year ago

I think you can file an Issue here in this repo's Issue Tracker.

elsandosgrande commented 1 year ago

All right, I'll give it a shot. Thanks!

Edit: The error seems to be just a warning turned into an error by the Kbuild flags.

Edit 2: As the new issue dialogue explicitly points me to the forums and the build issue template asks for a commit hash, I'll go to the forums a bit later.

Edit 3: Interestingly enough, 128-character passwords don't work, but 64-character ones do.

Final edit: In case anybody's curious: https://forums.developer.nvidia.com/t/numerous-warnings-when-compiling-the-kernel-driver-package/261594

sempervictus commented 11 months ago

ping @Adam-pi3: as someone familiar with various CFI schemes i think you might be in a fairly unique position to help explain the problem to other NV folks a little bit better.

The sort of function substitution with common prototypes being done here actually is a CFI violation, and the more fine-grained these mechanisms become, the less wiggle room there is to "work around using real kernel code." Nvidia's drivers run in clouds, banks, pharmas, hospitals, universities, military research systems, and other key components which are precluded from functional hardening against out-of-order code (re)use, forms of injection, etc due to the machinations being performed to avoid acting like in-tree code does. ML is currently a sensitive topic given the embargoes/geopolitical machinations afoot, and it all seems moot if the models built by those placing the embargoes were coopted by those being embargoed, because the systems producing them inherently preclude use of certain runtime protections (especially as a licensing gimmick in an effectively dead licensing order). Can we please drop the "GPL condom charade" and use real protection instead?

michael-brade commented 8 months ago

I have the "same" problem, but no RAP, just a CFI-enabled kernel (6.1.77): it is not possible to use CFI in permissive mode because after a suspend/resume cycle the log gets spammed with 11 GB of constant failures. They all look like this:

kernel: CFI failure at nvkms_call_rm+0x5e/0xb0 [nvidia_modeset] (target: rm_kernel_rmapi_op+0x0/0x220 [nvidia]; expected type: 0xba54dd86)
kernel: WARNING: CPU: 21 PID: 2883 at nvkms_call_rm+0x5e/0xb0 [nvidia_modeset]
kernel: Modules linked in: md5 snd_seq snd_seq_device overlay bridge stp llc binfmt_misc nvidia_drm(POE) nvidia_modeset(POE) snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio le>
kernel: CPU: 21 PID: 2883 Comm: Xorg Tainted: P           OE   T  6.1.77 #36
kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ULTRA/X570 AORUS ULTRA, BIOS F32 01/18/2021
kernel: RIP: 0010:nvkms_call_rm+0x5e/0xb0 [nvidia_modeset]
kernel: Code: 53 fc 74 02 0f 0b 2e e8 f0 c2 05 e8 85 c0 75 41 4c 8b 1d a5 8a 08 00 48 8b 3c 24 48 89 de 41 ba 7a 22 ab 45 45 03 53 fc 74 02 <0f> 0b 2e e8 ca c2 05 e8 4c 8b 1d 63 8a 08 0>
kernel: RSP: 0018:ffffb688f63dbc78 EFLAGS: 00010296
kernel: RAX: 0000000000000000 RBX: ffffb688f63dbc98 RCX: 0000000000000000
kernel: RDX: 0000000000000000 RSI: ffffb688f63dbc98 RDI: ffff90cbc2dd8000
kernel: RBP: ffffb688f63dbcd8 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 00000000d611b565 R11: ffffffffc20afcf0 R12: ffff90c6c9472008
kernel: R13: 0000000000000000 R14: ffffb688f63dbd77 R15: ffff90c6c9472008
kernel: FS:  00007fd3caa10ac0(0000) GS:ffff90dd5ef40000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f943d2bd000 CR3: 00000001336c4000 CR4: 0000000000350ee0
kernel: Call Trace:
kernel:  <TASK>
kernel:  ? __warn+0x18c/0x280
kernel:  ? nvkms_call_rm+0x5e/0xb0 [nvidia_modeset]
kernel:  ? report_cfi_failure+0x49/0x70
kernel:  ? nvkms_call_rm+0x52/0xb0 [nvidia_modeset]
kernel:  ? handle_cfi_failure+0x261/0x2d0
kernel:  ? rm_log_gpu_crash+0xe0/0xe0 [nvidia]
kernel:  ? handle_bug+0x4a/0x80
kernel:  ? exc_invalid_op+0x1a/0x50
kernel:  ? asm_exc_invalid_op+0x1a/0x20
kernel:  ? rm_log_gpu_crash+0xe0/0xe0 [nvidia]
kernel:  ? nvkms_call_rm+0x5e/0xb0 [nvidia_modeset]
kernel:  ? nvkms_call_rm+0x40/0xb0 [nvidia_modeset]
kernel:  _nv002607kms+0x42/0x50 [nvidia_modeset]
kernel:  _nv002301kms+0x52/0xb0 [nvidia_modeset]
kernel:  _nv000483kms+0x1c2/0x200 [nvidia_modeset]
kernel:  ? _nv000096kms+0x130/0x130 [nvidia_modeset]
kernel:  nvKmsIoctl+0xf9/0x270 [nvidia_modeset]
kernel:  nvkms_ioctl+0xf8/0x150 [nvidia_modeset]
kernel:  nvidia_frontend_unlocked_ioctl+0x68/0xa0 [nvidia]
kernel:  __x64_sys_ioctl+0x73/0xd0
kernel:  do_syscall_64+0x81/0xc0
kernel:  ? syscall_exit_to_user_mode+0x1a/0x50
kernel:  ? do_syscall_64+0x8d/0xc0
kernel:  ? do_syscall_64+0x8d/0xc0
kernel:  ? do_syscall_64+0x8d/0xc0
kernel:  ? do_syscall_64+0x8d/0xc0
kernel:  ? __irq_exit_rcu+0x71/0x160
kernel:  entry_SYSCALL_64_after_hwframe+0x4c/0xb6
kernel: RIP: 0033:0x7fd3ca71b5cb
kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 6>
kernel: RSP: 002b:00007fffc49fced0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
kernel: RAX: ffffffffffffffda RBX: 00000000c0106d00 RCX: 00007fd3ca71b5cb
kernel: RDX: 00007fffc49fcf30 RSI: 00000000c0106d00 RDI: 0000000000000016
kernel: RBP: 00007fffc49fcf30 R08: 0000000000000001 R09: 0000000000000001
kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000016
kernel: R13: 0000000000000000 R14: 00007fd3c9e13200 R15: 00007fffc49fcf90
kernel:  </TASK>
kernel: ---[ end trace 0000000000000000 ]---

sempervictus commented 8 months ago

To clarify: kCFI and RAP, the CFI schemes blowing up here, work on different paradigms, which proves that the practices in use violate security boundaries and force consumers (including critical-path systems, cloud vendors, etc) to operate at a reduced security posture in order to use the hardware. This is a result of nvidia hijacking kernel ABI by overwriting GPL functions with its own to skirt GPL compliance - a legal trick which attacks ring0 when loading the module, analogous to the well-known function-stomping technique catalogued by MITRE ATT&CK. This is not a bug; it is intentional behavior executed in a manner that MITRE has qualified as an attack mechanism. The driver code needs to be actually open source - no firmware blobs, no shims, no games. Zluda and others are already hijacking CUDA execution, so the opacity play is now just a lose-lose: tools running on NV GPUs can be used to rapidly reverse/approximate their operating semantics and execute what was meant for them on cheaper (or more readily available) gear.