facebook / buck2

Build system, successor to Buck
https://buck2.build/
Apache License 2.0

Change default directory #325

Open kingiler opened 1 year ago

kingiler commented 1 year ago

It would be great if it were possible to change the default directory. My home directory is write-protected, so buck2 fails to execute. Investigating shows that buck2 attempts to create ~/.buck2.

One solution would be an environment variable that changes the default directory. Another would be to follow the XDG Base Directory Specification.

thoughtpolice commented 1 year ago

I think this directory holds lock and state files for the daemon. $XDG_STATE_DIR is in some sense suitable for this, but I don't think these files need to persist across reboots; they only matter while a buck2 daemon is running. So it would probably be best to use $XDG_RUNTIME_DIR on Linux: it's readily available, the files end up under /run and are cleaned on reboot, and it avoids adding to the whole pile of mess under $HOME/.config, etc.

I don't necessarily think anything should change for Windows or macOS, as they have their own logic for where applications should write temporary files. Maybe there's a Rust crate that can abstract them all...

stepancheg commented 1 year ago

my home directory is write protected

This is not a very common setup.

XDG_RUNTIME_DIR

This will lead to problems.

For example, we could use this env var if it is set, and fall back to the home directory if it is unset.

Then we are likely to end up with two lock dirs: for example, one when buck2 is launched from a terminal inside a GUI session, and a different one when we log in to the machine via SSH or switch users with su/sudo, since these rarely handle this variable consistently.
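To make the failure mode concrete, here is a minimal Rust sketch of that "use the variable if set, else fall back to home" rule. It is hypothetical, not buck2's actual code: the function name `daemon_dir` and the `buck2` subdirectory under the runtime dir are assumptions; only ~/.buck2 comes from this thread. The environment values are passed in as parameters so two sessions can be compared side by side.

```rust
use std::path::PathBuf;

/// Hypothetical resolution rule (not buck2's real logic): prefer
/// $XDG_RUNTIME_DIR/buck2 when the variable is set and non-empty,
/// otherwise fall back to $HOME/.buck2.
fn daemon_dir(xdg_runtime_dir: Option<&str>, home: &str) -> PathBuf {
    match xdg_runtime_dir {
        // Treat an empty value the same as an unset one.
        Some(dir) if !dir.is_empty() => PathBuf::from(dir).join("buck2"),
        _ => PathBuf::from(home).join(".buck2"),
    }
}
```

A GUI terminal with XDG_RUNTIME_DIR=/run/user/1000 would get `daemon_dir(Some("/run/user/1000"), "/home/alice")`, i.e. /run/user/1000/buck2, while an SSH or su session without the variable would get `daemon_dir(None, "/home/alice")`, i.e. /home/alice/.buck2. The same user on the same project would then look for the daemon's lock files in two places, which is how two daemons could end up running.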

The easiest workaround is probably just to create ~/.buck2 manually.

If the error message is not helpful, we should fix the error message.

thoughtpolice commented 1 year ago

Being unable to write directly into the home directory is not actually that unusual, and I'm afraid it will probably become more common thanks to the rise of things like systemd-homed, immutable distros, various sandboxes, etc. Not to mention netboot or shared NFS home directory setups, where lock semantics get weird. Ultimately, relying on $HOME being a local writable disk is just as arbitrary as relying on any other directory; it only has a longer history and is "expected" to be +w. But there are a lot of reasons not to want that at all.

More broadly, the modern consensus on stuff like this, in my personal view, is that blind access to anything under $HOME for any application running under your uid is a massive mistake and one of the original sins imposed on us by the Unix permission model. There is a very big desire in much of the Linux community to move away from that; I would not expect this trend to go away, but rather to grow. Many people today will simply flat out say that buck2 is not special or unique enough to warrant the ability to write in the same place they keep their GPG keys, their SSH keys, their S3 configuration variables, their browser data and keychains, etc. Screwing around with stuff in $HOME should be treated with more care these days. And you can be sure you'd be denied +r access too, if that were reasonably possible today.

Regarding the other bits: XDG_RUNTIME_DIR should be handled correctly (at minimum) and consistently in both SSH sessions and desktop sessions as long as you're using logind appropriately, and on the raw framebuffer console too if you have access. Even non-systemd distros tend to use forks like elogind, which do the same thing. If they did not, less and less software would work correctly. Many daemons would break otherwise, e.g. because a user wouldn't be able to find the right socket file when a client application talks to a server, which is exactly what buck wants to do. This is common; I am genuinely not sure several systemd components could even function correctly without it. All KDE and GNOME desktop software is standardized around this too, and applications are expected to have consistent login environments, over SSH or not. You wouldn't get very far without it.

The case of a user changing identity with su/sudo is, I think, not convincing, because it already has a ton of edge cases. For example, they could run sudo -i, in which case their $HOME is different and the whole conversation starts over. I never use anything but sudo -i today, because not inheriting an interactive environment is a great way to accidentally stomp all over the permission bits of files. It's a very fragile scenario that can be broken in ways that support either argument; I don't think that's a good basis for not doing this. As for people using su/sudo with an improperly set-up environment and nothing getting weird: I can't think of a single client-daemon application where that is expected to work without weirdness. I might be misunderstanding your example, though...

I don't think relying on one of the few cross-distro standard layout decisions is fragile at all. If anything it's much better to say "Get your XDG_RUNTIME_DIR setup appropriately and buck2 will use it." This could even be as simple as a single line in .bashrc for those people who have completely ridiculous and austere bespoke artisanal Linux systems.

I do not believe this complaint will go away, and more people will ask about it in the future.

thoughtpolice commented 1 year ago

If anything it's much better to say "Get your XDG_RUNTIME_DIR setup appropriately and buck2 will use it."

Just to be clear, I mean we should literally mandate that XDG_RUNTIME_DIR is set on Linux. At most, falling back to TMPDIR is, I guess, acceptable, but I literally can't think of a single modern distro that hasn't hopped on this train. Fail loudly if it is not set. I don't think this is burdensome at all: it is, for all intents and purposes, a transparent, cross-distro modern Linux standard (one of the few that exist).
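The policy proposed here (require XDG_RUNTIME_DIR, at most fall back to TMPDIR, otherwise fail loudly) can be sketched in a few lines of Rust. This is a hypothetical helper, not buck2 code: the `buck2` subdirectory name and the error text are placeholders, and the environment lookup is a closure so the policy can be exercised without touching the real process environment.

```rust
use std::path::PathBuf;

/// Hypothetical sketch of the "mandate XDG_RUNTIME_DIR" policy.
/// Tries $XDG_RUNTIME_DIR, then $TMPDIR; if neither holds a non-empty
/// value it returns an error instead of silently choosing another path.
fn runtime_dir<F>(get_env: F) -> Result<PathBuf, String>
where
    F: Fn(&str) -> Option<String>,
{
    for var in ["XDG_RUNTIME_DIR", "TMPDIR"] {
        // Skip variables that are unset or set to the empty string.
        if let Some(value) = get_env(var).filter(|v| !v.is_empty()) {
            return Ok(PathBuf::from(value).join("buck2"));
        }
    }
    // Fail loudly: the caller is expected to report this and exit.
    Err("neither XDG_RUNTIME_DIR nor TMPDIR is set; \
         set XDG_RUNTIME_DIR to a private, per-user directory"
        .to_string())
}
```

Against the real environment this would be called as `runtime_dir(|name| std::env::var(name).ok())`. One caveat with the TMPDIR fallback: TMPDIR may point at a directory shared by all users, such as /tmp, so a real implementation would also need a per-user subdirectory there.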

stepancheg commented 1 year ago

I literally can't think of a single modern distro that hasn't hopped on this train

I googled "XDG_RUNTIME_DIR not set" and found a lot of people reporting it. Assuming it is always set is optimistic.

Cargo and most (all?) other build systems do not use these variables and write their state into $HOME.

I also checked $XDG_RUNTIME_DIR on a server I work with daily: it has only one state file that wasn't created by systemd. $XDG_STATE_DIR is not even set.

But we don't have to go far to see how large the issue is: even buck2 does not know about this environment variable and does not propagate it to tests.

https://github.com/facebook/buck2/blob/8cddbf1bb4f7cf159ece1221316aeb4f2404d750/app/buck2_execute/src/execute/environment_inheritance.rs#L19

We can fix this case for buck2, but any other tooling that may invoke buck2 (IDEs, CIs, automation, Docker images) may also forget to set or propagate this variable. There will be a lot of complaints about buck2 just not working.

And defaulting to $HOME is not an option, because it will result in two instances of buck2 running, which is definitely not what users expect.

thoughtpolice commented 1 year ago

I literally can't think of a single modern distro that hasn't hopped on this train

I googled "XDG_RUNTIME_DIR not set" and found a lot of people reporting it. Assuming it is always set is optimistic.

Cargo and most (all?) other build systems do not use these variables and write their state into $HOME.

To be fair, I think that's also bad, and I don't buy Cargo as the standard to go by for a lot of things.[^1] ;) And to be blunt, I don't think "random Linux user on the internet reporting something weird" is unusual. It's a "dog bites man" story, not a breaking-news-at-11pm story.

[^1]: To its credit, Cargo does let you change those directories when the defaults don't work, for good reason! The defaults would never work in a Nix sandbox, for example (which is an empty filesystem with only a /build directory), but we of course use Cargo there just fine, because we're able to control those paths.

To be fair, you could say that this ticket is also a case of that. But this is something there has been real effort to standardize across many systems and many pieces of software. It isn't just arbitrary stuff: I think there's a lot of momentum in the Linux ecosystem toward no longer putting things directly in $HOME, and putting them at minimum somewhere like $HOME/.config (more on that in a second!).

I also checked $XDG_RUNTIME_DIR on a server I work with daily: it has only one state file that wasn't created by systemd.

I have at least five (gnupg, snapd, vscode, wayland, pulseaudio, dbus, etc.), but to be fair, that's because the XDG spec and a lot of related software come out of freedesktop.org, so it would be kind of embarrassing not to. But literally all of "modern" desktop Linux relies on this, I can assure you. (If you think desktop Linux is meaningless, I get that too; I mainly run Windows!)

FWIW, this is also used by pretty much any systemd --user daemon that runs in the user profile and uses directives like RuntimeDirectory=, CacheDirectory=, StateDirectory=, etc. It's very well worn on modern Linux.

$XDG_STATE_DIR is not even set.

That was admittedly a typo; you actually want XDG_STATE_HOME. But even that has a specific piece of wording in the standard that is pretty critical. I'll quote the first three definitions here, highlights by me:

$XDG_DATA_HOME defines the base directory relative to which user-specific data files should be stored. If $XDG_DATA_HOME is either not set or empty, a default equal to $HOME/.local/share should be used.

$XDG_CONFIG_HOME defines the base directory relative to which user-specific configuration files should be stored. If $XDG_CONFIG_HOME is either not set or empty, a default equal to $HOME/.config should be used.

$XDG_STATE_HOME defines the base directory relative to which user-specific state files should be stored. If $XDG_STATE_HOME is either not set or empty, a default equal to $HOME/.local/state should be used.

So, to be clear, the spec prescribes fallback paths to use when these are unset! That's my bad for not putting that up front instead of rambling, if that is actually the main concern :P
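The fallback rule quoted above is small enough to show directly. Below is a minimal Rust sketch as a hypothetical helper `xdg_base_dir` (an illustration of the rule, not code from buck2 or any crate). It also applies one more rule from the same specification: paths in these variables must be absolute, and a relative value should be treated as invalid and ignored.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve an XDG base directory.
/// Uses `var` if it holds an absolute path, otherwise falls back to
/// `$HOME/<default_rel>` (e.g. ".local/state" for XDG_STATE_HOME).
fn xdg_base_dir<F>(get_env: F, var: &str, default_rel: &str) -> Option<PathBuf>
where
    F: Fn(&str) -> Option<String>,
{
    // An empty string is not an absolute path either, so this one check
    // rejects both empty values and relative values.
    if let Some(value) = get_env(var).filter(|v| Path::new(v).is_absolute()) {
        return Some(PathBuf::from(value));
    }
    // No usable value: use the spec's default under $HOME, if HOME is set.
    let home = get_env("HOME").filter(|h| !h.is_empty())?;
    Some(PathBuf::from(home).join(default_rel))
}
```

For example, `xdg_base_dir(|n| std::env::var(n).ok(), "XDG_STATE_HOME", ".local/state")` returns $XDG_STATE_HOME when it is set to an absolute path and ~/.local/state otherwise; passing "XDG_CONFIG_HOME" with ".config" or "XDG_DATA_HOME" with ".local/share" gives the other two defaults quoted above.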

But we don't have to go far to see how large the issue is: even buck2 does not know about this environment variable and does not propagate it to tests.

XDG_RUNTIME_DIR is specifically for intra-application IPC purposes, for things like sockets and named pipes, with mode 0700. It is intended for "asynchronous" applications that perform some form of IPC, more or less (whether or not they are long-lived, really, but most of the time, like 98%, it's a daemon). The spec specifically recommends it for this use case and for lightweight purposes, because it's generally a namespaced tmpfs mount that other users can't even see. So unless your tests are actually building, running, and testing daemons, I don't think any of them would use it anyway.
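Because the spec requires $XDG_RUNTIME_DIR to be owned by the user with Unix access mode 0700, a daemon might verify that before placing its socket or lock files there. The following is a minimal, Unix-only sketch with a hypothetical helper name. It checks only the mode, not ownership, since the Rust standard library alone does not expose the current process's uid to compare against.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Hypothetical check: is `dir` a directory whose permission bits are
/// exactly 0700 (owner rwx, nothing for group or others)?
/// Ownership is deliberately not checked here.
fn is_private_runtime_dir(dir: &Path) -> io::Result<bool> {
    let meta = fs::metadata(dir)?;
    // mode() returns the raw st_mode bits, so mask down to rwxrwxrwx.
    let perms = meta.permissions().mode() & 0o777;
    Ok(meta.is_dir() && perms == 0o700)
}
```

A caller would typically run this on the resolved runtime directory and refuse to start (or warn) when it returns `Ok(false)`, rather than creating a socket that other users might be able to reach.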

Similarly, all of the other variables have default fallbacks, which is how they're generally intended to work. For example, the xdg crate does this in Rust. So tests wouldn't need them to be set at all for things to work correctly, as long as they had permission to write to $HOME anyway (if they couldn't, that would already be a problem!)

Just some stats from my homedir:

austin@GANON:~/src/buck2.jj$ ls -a ~/ | grep "^\." | wc -l
51
austin@GANON:~/src/buck2.jj$ ls ~/.config/ | wc -l
29
austin@GANON:~/src/buck2.jj$ ls ~/.local/share/
applications  direnv  efinity  fish  flatpak  keyrings  meld  nano  nix  recently-used.xbel  streampager  zoxide
austin@GANON:~/src/buck2.jj$ ls ~/.local/state/
nix  rancher-desktop
austin@GANON:~/src/buck2.jj$ ls ~/.local/share/applications/
mimeapps.list
austin@GANON:~/src/buck2.jj$ ls ~/.config/
LibrePCB        broot  buck2-nix-preview.nix-key  enchant  gedit  ghc            gtk-3.0  htop  kdiff3rc  mgba           nushell  pulse   renode   starship.toml  watchman
QtProject.conf  btop   dconf                      fish     gh     google-chrome  helix    jgit  log       mimeapps.list  pgcli    racket  sapling  test
austin@GANON:~/src/buck2.jj$

So I don't know if that's really convincing or not, but that includes at least Chrome, dconf, racket, sapling(!), gtk-3.0, etc., etc.

We can fix this case for buck2, but any other tooling that may invoke buck2 (IDEs, CIs, automation, Docker images) may also forget to set or propagate this variable. There will be a lot of complaints about buck2 just not working.

The other cases don't matter so much, for the reason I gave earlier: you just fall back to the default locations recommended by the XDG spec. So all of those cases will work just fine (well, you'll at least need $HOME set, of course!). I think Docker is the only "maybe" case there, but frankly, tons of software needs tweaks to handle things like a completely empty environment and an empty FHS. buck2 isn't going to be some crazy special outlier there.

The IDE case is a bit more interesting: you don't necessarily control the integration path, because IDEs may invoke buck2 in... fun ways, like cd-ing around behind your back. That could be bad in some edge cases. But in that case, why wouldn't buck2 just offer some way (a CLI flag, a BXL API, whatever) to hand this information to the client application, so it isn't a problem? The IDE runs some hypothetical buck2 get-daemon-dir command from the project root (maybe it would fail if the daemon wasn't already running, or start it if it wasn't), and the command prints the directory on stdout. The IDE remembers this and uses that information for the rest of its lifecycle. Is this not possible? It's completely hypothetical; I'm just spitballing to show that this is definitely not an unsolvable issue.


To be clear, I don't think anything needs to change on macOS or Windows at all, assuming buck2 correctly abides by each operating system's recommendations for temporary/runtime files. And if it doesn't? Then we should change that too. The standards differ, but I think it's more important that buck2 follow them than claim it's so special and unique that it can't. I think the Linux world really is gaining ground here despite some real foot-dragging on other things; we should respect the XDG spec. It's the best thing we've got, really.

steveklabnik commented 12 months ago

Maybe there's a Rust crate that can abstract them all...

cries in open source maintainer

https://poignardazur.github.io/2023/05/23/platform-compliance-in-cargo/

That said, I have used https://crates.io/crates/directories in the past, which should help. But the post I linked above is the most comprehensive investigation into this topic that I know of.

rbtcollins commented 7 months ago

I think this is one of the things to do early rather than late. Right now, the cost is only that some early adopters of a new tool, which has converted basically 0% of its TAM, might face a higher adoption cost.

Later, the problem becomes... well, the situation we're in with rustup and cargo, where even maintainers of those tools (e.g. me, for rustup) who do want to do the right thing don't consider it shallow enough to solve without taking a sabbatical or some such.

FWIW, I don't personally care about extra files in ~; my ~ has files from the mid-nineties, and there's no way that directory is ever going to be clean. But I do appreciate new software that follows the standards; it's much easier to reason about.

And that's perhaps the strongest argument I know of: an uncategorised ~/.buck2 will eventually lead to something that isn't runtime state, but config, getting mixed in. And then you have a real problem, not a maybe-problem.

ndmitchell commented 6 months ago

@JakobDegen - this might be one you want to look at.