apache / buildstream

BuildStream, the software integration tool
https://buildstream.build/
Apache License 2.0
85 stars 30 forks source link

Lost file metadata in artifacts and images #38

Open BuildStream-Migration-Bot opened 3 years ago

BuildStream-Migration-Bot commented 3 years ago

See original issue on GitLab In GitLab by [Gitlab user @tristanvb] on Jun 12, 2017, 12:39

Problem

Currently, assuming a linux sandbox (we dont have others currently) everything is committed to the artifacts as:

When files are created with setuid/setgid in the sandbox, those should be recorded in the artifacts, but; when we checkout from ostree in user mode, the setuid/setgid bits are stripped.

This means that in addition to the above, when we create a bootable image, we pack in files that the sandbox sees without setuid/setgid bits.

Solution

Since recently, we now have a fuse subsystem in BuildStream which the sandbox uses, currently we only have one fuse layer which provides a copy-on-write hardlink experience (to solve issue #19).

To solve this I would like to take an approach similar to yocto's pseudo tool, but instead of doing any LD_PRELOAD, we would implement it with a fuse layer.

This will mean essentially the following:

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @tristanvb] on Oct 12, 2017, 09:56

mentioned in merge request !100

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @tristanvb] on Oct 28, 2017, 07:56

mentioned in merge request !128

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @tristanvb] on Oct 28, 2017, 08:11

Marking this critical because it prevents important things from being possible.

It's also a mild security concern because currently suid binaries are created on the real backing filesys, instead of suid-ness being virtualized inside the sandbox root (allowing others to gain access to the builder user's rights via a plausible build of sudo in an active sandbox).

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @tristanvb] on Nov 8, 2017, 09:10

Some notes:

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @juergbi] on Feb 13, 2018, 16:52

We need to consider #232 before expanding the use of FUSE to regular build jobs. I'm also worried about the impact on (parallel) build performance, especially with Pyfuse but potentially even with a FUSE layer in C. I expect that most builds do not need additional metadata and I wouldn't want to accept a noticeable slowdown (assuming tracking that metadata will be enabled by default).

We should consider possible alternatives to FUSE. Some aspects could likely be handled when analyzing the file metadata after the build (before/while committing it to the artifact cache), e.g., hardlinks, permission bits, and extended attributes. owner/group would be more problematic, however, static uid/gid numbers are anyway an issue and we might want to recommend a different solution to this.

Another possible alternative is to use seccomp together with ptrace to intercept certain syscalls. I expect this to be faster than FUSE as file operations such as read/write/mmap would not have to be intercepted at all, however, I don't have any numbers. It would definitely avoid #232. On the other hand it would be Linux-specific and likely require more effort to implement. We would also first need a proof of concept to check whether it's even feasible.

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @juergbi] on Feb 15, 2018, 18:12

FUSE running outside the sandbox can only ever handle UIDs from outside the sandbox. I.e., it cannot be used to record chown() calls from a sandboxed process for UIDs that have no mapping. According to my current understanding on Linux, supporting arbitrary UIDs either requires the use of subuids or a solution within the sandbox environment.

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @samthursfield] on Feb 16, 2018, 16:47

mentioned in merge request !279

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @juergbi] on Feb 26, 2018, 11:17

As some projects will likely use Squashfs, I'll mention here that mksquashfs supports pseudo files as possible workaround.

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @juergbi] on Mar 27, 2018, 09:40

A few weeks ago I performed a few test builds of GTK+ 3 as a simple benchmark for the FUSE overhead. Adding results here for future reference.

Filesystem parallel build -j32 notparallel build
ext4 without FUSE 2:37 (100%) 13:30 (100%)
SafeHardlinks (pyfuse) 13:45 (525%) 32:42 (242%)
bindfs (C libfuse) 6:42 (256%) 22:57 (170%)
BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @juergbi] on Mar 27, 2018, 13:52

I've moved the aspect of preserving hardlink information to #324. The focus of this issue #38 is the permission metadata: UID, GID, lost permission bits such as setuid/setgid, and extended security attributes such as file capabilities and possibly ACL.

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @juergbi] on Apr 4, 2018, 15:55

Looking further into FUSE I've discovered that the tested FUSE filesystems don't make use of various optimization possibilities that FUSE supports. I then wrote my own minimal read-only FUSE filesystem to determine whether FUSE may be fast enough for regular builds despite the earlier disappointing tests.

That's now working and with this enabled, building GTK+ 3 on the same system takes 2:46, i.e., just 6% longer than without FUSE.

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @toscalix] on May 15, 2018, 14:14

marked this issue as related to #324

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @juergbi] on Jul 11, 2018, 12:28

The merge of the CAS artifact cache !337 will result in a small step backwards with regards to permissions. The only permission bit CAS supports is the executable bit for files. I.e., CAS will not store read/write permission bits for user/group/other, they are always 0644 or 0755. I don't expect major issues in practice given that we don't currently support setting UID/GID and thus the permission bits are of limited use. Non-world readable files and directories in /etc may be affected, though.

We will need to support all permission bits when addressing this issue, however, when we have a solution for UID/GID, setuid, and extended attributes, it should be trivial to also cover the missing permission bits.

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @tristanvb] on Jul 27, 2018, 07:39

mentioned in merge request !570

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @tristanvb] on Jul 29, 2018, 19:10

mentioned in merge request !581

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @nanonyme] on Jan 20, 2019, 21:16

[Gitlab user @juergbi] wow, so is that the root cause for why currently in Flatpak any application can write to /etc files that are not shipped by runtime allowing arbitrary configurations unless we ship defaults under /etc?

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @juergbi] on Jan 21, 2019, 09:15

Is there a Flatpak issue open for this with more information?

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @nanonyme] on Jan 21, 2019, 10:03

I was told on IRC this is most likely implementation detail of bubblewrap, not CAS. I'll create a bug into Flatpak anyway so this can be properly audited.

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @abderrahimk] on Feb 1, 2019, 16:47

mentioned in merge request bst-external!69

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @tristanvb] on Jul 17, 2019, 10:10

marked this issue as related to #910

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @juergbi] on Mar 16, 2020, 10:30

mentioned in merge request celduin/crash/jetbot-system-bst!36

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @abderrahimk] on Mar 23, 2020, 15:10

I think this doesn't necessarily need to be implemented all in one go. If the the UID/GID information is hard, we can start by implementing permissions, and possibly xattrs.

So what does this need exactly? Do we need to define an extension for CAS to store this information?

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @juergbi] on Mar 23, 2020, 15:30

I think this doesn't necessarily need to be implemented all in one go. If the the UID/GID information is hard, we can start by implementing permissions, and possibly xattrs.

A question is whether it's useful to store permission bits without UID/GID. There may be some cases where it would be useful but it could also be misleading in other cases.

So what does this need exactly? Do we need to define an extension for CAS to store this information?

The UnixMode node property can be used to store permission bits. No protocol extension is required (some details are expected to change, though: https://github.com/bazelbuild/remote-apis/pull/128), however, support needs to be added to BuildBox components and BuildStream.

It should most likely be optional given that it may have a performance impact (when hardlinking) and especially as it's incomplete for now (no UID/GID). An option for this would probably best fit into the sandbox config.

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @willsalmon] on Mar 24, 2020, 08:43

The way I understand it is that we need a mechanism to deal with meta data, this should be very similar to how we already deal with mtimes. And then we need some configuration options, as like [Gitlab user @juergbi] say we already store some meta so this is very important for some tasks that need extra but not at all for others were we already store enough meta, so if it can be behind configuration then that would be great.

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @tristanvb] on May 28, 2020, 12:58

This is not a blocker item for BuildStream 2.0, removing from the milestone.

As discussed in the BuildStream 2.0 planning announcement (See the "Sandboxing" section), it is intended to work as follows:

These extensions can be added over time without breaking API, as such BuildStream 2.0 does not block on this.

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @tristanvb] on Nov 4, 2020, 10:49

marked this issue as related to #429

BuildStream-Migration-Bot commented 3 years ago

In GitLab by [Gitlab user @BenjaminSchubert] on Dec 4, 2020, 12:46

marked this issue as related to #1413

jjardon commented 2 years ago

buildbox-casd issue related with this: https://gitlab.com/BuildGrid/buildbox/buildbox-casd/-/issues/69

jjardon commented 2 years ago

And the downstream issue in freedesktop-sdk: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/1446 (a plugin is being used to workaround this limitation)

doraskayo commented 2 years ago

I'd also add that bst artifact checkout should be able to preserve file permissions in the resulting target directory, including the SUID bit. Of course, this would likely require the command to be executed in a priviledged context.

juergbi commented 2 years ago

buildbox-casd and buildbox-fuse now support staging files with custom mode bits. See https://gitlab.com/BuildGrid/buildbox/buildbox-casd/-/issues/69#note_1127273556 for limitations of the current support.

I'd also add that bst artifact checkout should be able to preserve file permissions in the resulting target directory, including the SUID bit. Of course, this would likely require the command to be executed in a priviledged context.

Setting the SUID bit doesn't require any special privileges as the file owner. Privileges are required to change the file owner, though.

nanonyme commented 5 months ago

Needs design for sandbox (?) configuration changes. Quote from @juergbi from Matrix

We should probably think about the config structure for possible future additions as well (e.g. xattr, owner uid/gid, file capabilities).
There could potentially be support for different strategies. E.g. for setuid a whitelist co
jjardon commented 2 months ago

buildbox issue: https://gitlab.com/BuildGrid/buildbox/buildbox/-/issues/104

nanonyme commented 2 months ago

@jjardon @juergbi explained to me in theory the buildbox part exists. BuildStream needs to be taught to tell buildbox to capture these extra things. Beginnings of a proposal are in https://github.com/apache/buildstream/pull/1953 but my disk broke down before I got to actually test it. If the metadata is captured, it will be staged automatically.

We weren't entirely certain whether setuid needs special whitelist or not. If we assume unixmodes are only captured in composes (where freedesktop-sdk is really using them), then we have more control over setuid as it can be set through integration commands.

nanonyme commented 2 months ago

Note that file ownership should be regarded a stretch goal as it requires complex uid+gid remapping.

abderrahim commented 2 months ago

FWIW, I gave this a go some time ago. I can try to dig it up.

Requesting that buildbox captures the unix mode is not enough, as buildstream will manipulate the outputs (e.g. to extract artifacts and buildtrees separately) and for this buildstream's CasBasedDirectory also needs to be aware of the unix_mode property and propagate it accordingly.

Even with my changes I couldn't get the extended unix mode to be staged. I might have missed some part that needs to propagate them.

nanonyme commented 2 months ago

Please do, let's link all work back here.