bottlerocket-os / twoliter

A build tool for customizing Bottlerocket

Twoliter's build of krane appears to be linking against libc #403

Closed cbgbt closed 2 weeks ago

cbgbt commented 1 month ago

Twoliter fails to interact with OCI images on hosts whose libc is older than the one in our cross-build environment (defined to some degree here). The build of krane occurs here.

When the error occurs, it looks like this:

[2024-10-21T13:16:37Z INFO  twoliter::project::lock::image] Resolving dependency image dependency 'bottlerocket-core-kit-3.0.0@public.ecr.aws/bottlerocket/bottlerocket-core-kit:v3.0.0'.
Error: Failed to run operation with image tool:
 command: /proc/9302/fd/9 manifest public.ecr.aws/bottlerocket/bottlerocket-core-kit:v3.0.0

We need to:

cbgbt commented 1 month ago

It would be great if we could combine this with https://github.com/bottlerocket-os/twoliter/issues/398, or even provide a Makefile target that executes cross builds locally, so that we can test more easily.

sam-berning commented 3 weeks ago

I hacked together a branch of twoliter that builds krane statically and ran that on my machine that has this issue, and it's still hitting the same problem. So I'm not sure this is a linking issue.

I also tried to improve the error message by including stdout and the exit status as well, but it's still not very helpful:

[2024-10-28T22:44:08Z INFO  twoliter::project::lock::image] Resolving dependency image dependency 'bottlerocket-core-kit-3.0.0@public.ecr.aws/bottlerocket/bottlerocket-core-kit:v3.0.0'.
Error: Failed to run operation with image tool: status: signal: 9 (SIGKILL) stderr:  stdout: 
 command: /proc/2573/fd/9 manifest public.ecr.aws/bottlerocket/bottlerocket-core-kit:v3.0.0

The only bit of info we get from that is that the process exited due to a SIGKILL from somewhere, but that could be a number of things. I'll keep looking into it.
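To illustrate the kind of error reporting described above, here is a minimal Python sketch (not twoliter's actual Rust code) that includes the exit status, stderr, and stdout when a child process fails. The names `describe_exit` and `run_image_tool` are hypothetical. It relies on the POSIX convention that Python's `subprocess` reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
from typing import Sequence


def describe_exit(returncode: int) -> str:
    """Render a subprocess return code in a human-readable form."""
    if returncode < 0:
        # On POSIX, a negative return code means "terminated by signal".
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        return f"signal: {signum} ({name})"
    return f"exit status: {returncode}"


def run_image_tool(argv: Sequence[str]) -> str:
    """Run a command; on failure raise with status, stderr, and stdout."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            "Failed to run operation with image tool: "
            f"status: {describe_exit(proc.returncode)} "
            f"stderr: {proc.stderr.strip()} stdout: {proc.stdout.strip()}\n"
            f" command: {' '.join(argv)}"
        )
    return proc.stdout
```

As in the log above, a SIGKILL leaves both output streams empty, so the status is the only signal available; the sender has to be found by other means (for example strace or kernel logs).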

cbgbt commented 3 weeks ago

This may be due to our use of pentacle to create sealed anonymous files on Linux. I wonder if the kernel headers present at build time may influence us here.

According to this comment, some of the sealing behavior is kernel-version dependent, though pentacle should be resilient to missing features. Here's where the seals are added. You could try running this with the log level set to trace to see if you get anything from this function call.
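For readers unfamiliar with what "resilient to missing features" means for file seals, the following Python sketch shows one way to tolerate a kernel that rejects a newer seal. It is an illustration only, not pentacle's implementation: the function name `add_seals_tolerant`, the split into required and optional seals, and the choice of `F_SEAL_EXEC` as the only optional seal are all assumptions. The numeric constants are the values from the Linux UAPI header `<linux/fcntl.h>`.

```python
import errno
import fcntl

# Values from <linux/fcntl.h>; defined here because older Python
# versions do not export all of them from the fcntl module.
F_ADD_SEALS = 1033  # F_LINUX_SPECIFIC_BASE (1024) + 9
F_GET_SEALS = 1034  # F_LINUX_SPECIFIC_BASE (1024) + 10
F_SEAL_SEAL = 0x01
F_SEAL_SHRINK = 0x02
F_SEAL_GROW = 0x04
F_SEAL_WRITE = 0x08
F_SEAL_EXEC = 0x20  # only understood by newer kernels

# Assumption for this sketch: which seals are mandatory vs. best-effort.
REQUIRED_SEALS = F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE
OPTIONAL_SEALS = (F_SEAL_EXEC,)


def add_seals_tolerant(fd, fcntl_fn=fcntl.fcntl):
    """Apply seals to fd, skipping optional seals the kernel rejects.

    Returns (final seal mask, list of optional seals that were skipped).
    fcntl_fn is injectable so the logic can be exercised without a memfd.
    """
    skipped = []
    # Optional seals go first: once F_SEAL_SEAL is set, no more can be added.
    for seal in OPTIONAL_SEALS:
        try:
            fcntl_fn(fd, F_ADD_SEALS, seal)
        except OSError as err:
            if err.errno != errno.EINVAL:
                raise  # anything other than "unknown seal" is a real error
            skipped.append(seal)
    fcntl_fn(fd, F_ADD_SEALS, REQUIRED_SEALS)
    return fcntl_fn(fd, F_GET_SEALS), skipped
```

On a Linux host this could be tried against a descriptor from `os.memfd_create("demo", os.MFD_ALLOW_SEALING)`. A non-empty `skipped` list would indicate a kernel that does not know the seal.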

sam-berning commented 3 weeks ago

Yeah, it looks like this might be related. If I strace the twoliter update command on a system that it works on, the F_ADD_SEALS syscalls look like this:

fcntl(9, F_ADD_SEALS, F_SEAL_EXEC)      = 0
fcntl(9, F_ADD_SEALS, 0)                = 0
fcntl(9, F_ADD_SEALS, F_SEAL_SEAL|F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE) = 0
fcntl(9, F_GET_SEALS)                   = 0x3f (seals F_SEAL_SEAL|F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_FUTURE_WRITE|F_SEAL_EXEC)

compared to a system that twoliter update fails on:

fcntl(9, F_ADD_SEALS, 0x20 /* F_SEAL_??? */) = -1 EINVAL (Invalid argument)
fcntl(9, F_ADD_SEALS, 0)                = 0
fcntl(9, F_ADD_SEALS, F_SEAL_SEAL|F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE) = 0
fcntl(9, F_GET_SEALS)                   = 0xf (seals F_SEAL_SEAL|F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE)

So the binary never gets sealed with F_SEAL_EXEC, which means that the exec bits can be changed.
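The difference between the two `F_GET_SEALS` results can be made explicit by decoding the masks. The Python sketch below uses the seal bit values from the Linux UAPI header `<linux/fcntl.h>`; the helper names `decode_seals` and `missing_seals` are hypothetical.

```python
# Seal bit values from the Linux UAPI header <linux/fcntl.h>.
SEAL_NAMES = {
    0x01: "F_SEAL_SEAL",
    0x02: "F_SEAL_SHRINK",
    0x04: "F_SEAL_GROW",
    0x08: "F_SEAL_WRITE",
    0x10: "F_SEAL_FUTURE_WRITE",
    0x20: "F_SEAL_EXEC",
}


def decode_seals(mask):
    """Return the names of the seals set in mask, lowest bit first."""
    return [name for bit, name in SEAL_NAMES.items() if mask & bit]


def missing_seals(expected, actual):
    """Return the names of seals present in expected but absent in actual."""
    return decode_seals(expected & ~actual)


# Masks taken from the two strace excerpts above.
working, failing = 0x3F, 0x0F
print(decode_seals(failing))
print(missing_seals(working, failing))
```

For the masks in the traces, the failing host lacks both `F_SEAL_FUTURE_WRITE` and `F_SEAL_EXEC` compared with the working host.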

sam-berning commented 3 weeks ago

I've seen twoliter update fail at different points in the process (sometimes on the first krane manifest; sometimes krane manifest succeeds but krane config fails), but it always exits with SIGKILL.