Open nalimilan opened 11 years ago
This is a nice and thorough proposal.
Yes, this is a very well-presented proposal. However, we're not going to do this. We only just got .julia in the same place on all systems (Windows was the outlier before). It would be crazy to start doing some different idiosyncratic thing on different systems. This standard doesn't even work on all Linuxes, let alone other UNIXes like OS X, FreeBSD and of course Windows. The only thing that is truly universal is putting a bunch of files in a directory in the user's home directory.
Thanks for the kind words. ;-)
I understand that changing the path to the config directory is not a nice task. That's why it took ten years and all programs haven't moved yet. But what I'm proposing merely amounts to changing the path from ~/.julia to ~/.config/julia. On 90% of systems it will be that way; only a few UNIX hackers will change the path to XDG_CONFIG_HOME, and they know what they are doing. There's no issue of the standard "working" or not: either the variable is defined, and you use it, or it's not, and you use your default path; in both cases you end up with the same result most of the time: ~/.config. This is my definition of "putting a bunch of files in a directory in the user's home directory". So I don't think this introduces a real inconsistency between OSes.
BTW, it's false that FreeBSD does not support this: see http://www.freebsd.org/cgi/query-pr.cgi?pr=133825. I also find it unlikely that any Linux distribution fails to support this, as almost all free desktop environments made the switch some time ago. GLib, for example, provides functions that default to ~/.config if XDG_CONFIG_HOME is not set, and it's used by most GNOME and XFCE apps now.
Julia is still young, so it's not too late to switch. Big and popular cross-platform applications have made or are making the change: Chromium [1], Firefox (although not without debate [2]), VLC [3], LibreOffice, GIMP [4].
Disclaimer: though I'm a member of the GNOME Foundation, I have no shares in Freedesktop.org. :-)
1: https://chromiumcodereview.appspot.com/12386077/
2: https://bugzilla.mozilla.org/show_bug.cgi?id=259356#c81
3: https://trac.videolan.org/vlc/ticket/1267
4: https://mail.gnome.org/archives/gimp-developer-list/2012-October/msg00037.html
Having read through the spec, I'm not sold on this at all. I've got many issues with it.
[...] .julia directory, creating and developing packages. Moving it to some path like ~/.local/share/julia is a major pain, not to mention being pretty clearly semantically wrong – again, installed packages are neither configuration nor user data.
[...] ~/.config/julia, ~/.local/share/julia, /etc/xdg/julia, etc.). If there was some environment variable that allowed people to opt into this – say XDG=true, or something – and this was set by default on standard-conforming systems, then I might consider supporting that, but that's not how the spec works. It assumes everyone has bought into this wholesale, rudely foisting long annoying paths on people who don't give a crap about XDG and just want their Julia stuff in .julia.
Given all these issues and the fact that this causes a fairly major coding headache and fractures the IMO priceless uniformity of our cross-platform behavior, I'm sticking with my original gut reaction, which is to say no.
I realize you mentioned that git "conforms" although it isn't a desktop program. Except that it doesn't really conform – git will look in the XDG locations if they exist, but if it needs to create a config file, it still creates ~/.gitconfig.
It is OK for Julia not to follow XDG; however, it would be nice to let users change the default directories Julia uses, either from a configuration file or when compiling Julia from source.
Like go, julia could have a ~/.config/julia/env file for storing environment variables, so we can choose which directory to use.
Quoting Fredrik in https://github.com/JuliaLang/julia/issues/37503#issuecomment-690036262:
> You can set JULIA_DEPOT_PATH if you are not happy with ~/.julia.
I was kind of harsh in this issue before, but I still think that consistently putting all Julia-related stuff in ~/.julia by default was the right call. The XDG "spec" (if such a tiny, half-baked document can be called a spec) still has all of the issues I called out in 2013. Which is not too surprising since no one has touched it since it was scribbled on the virtual equivalent of the back of a napkin a decade ago.
The "spec" is terribly UNIX-specific and only makes sense on multiuser UNIX systems, which are no longer the common case: these days Linux and macOS dev laptops are the norm and the vast majority of servers are virtualized/containerized single-user systems. This spec really only makes sense on multiuser UNIX systems at research institutions. For the vast majority of users, the complexity of scattering everything around directories that aren't next to each other doesn't gain you anything since those directories are on the same file system on your computer's one and only hard disk. All it does is make it harder to see what's going on and add unnecessary complexity to Julia's Pkg and code-loading logic that works with those directories (which is already too complex).
For those users that do actually work on a multi-user UNIX system where it makes sense to put config, cache and runtime data on different filesystems, it would be simple to create symlinks for the fixed set of subdirectories stored under ~/.julia to the various XDG directories as they feel appropriate. Such a tool could even be open sourced and maintained by people who have need for it! For reference, the directories that are saved under ~/.julia are listed in the glossary entry for "depots" and include:
- artifacts: content-addressed data and binary dependencies
- environments: shared named environments (e.g. v1.0, devtools)
- clones: bare clones of package repositories
- compiled: cached compiled package images (.ji files)
- config: global configuration files (e.g. startup.jl)
- dev: default directory for package development
- logs: log files (e.g. manifest_usage.toml, repl_history.jl)
- packages: installed package versions
- registries: clones of registries (e.g. General)

I'm not entirely sure how to map these onto XDG locations, which I feel is due to lack of clarity of the "spec" since I do understand what these directories are for but the spec doesn't really give clear criteria for what should go where, it just lists a bunch of technical properties of locations by which you're apparently supposed to infer, together with suggestive names, where to put things? I feel like people who want to use the spec can figure this out.
For people who work on a single-user system where all the XDG directories are on the same file system but think that it's distasteful for Julia to clutter up their home directory with its one ~/.julia directory, it should be sufficient to change JULIA_DEPOT_PATH to point somewhere else.
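As a minimal sketch of that suggestion, the lines below could go in a shell startup file. The target directory is an assumption chosen for illustration (here the XDG data directory); any writable path would do.

```shell
# Assumed location for illustration: keep the whole depot under the XDG data
# directory, falling back to ~/.local/share when XDG_DATA_HOME is unset or empty.
export JULIA_DEPOT_PATH="${XDG_DATA_HOME:-$HOME/.local/share}/julia"

# To inspect what Julia actually picked up, one could run:
#   julia -e 'println(DEPOT_PATH)'
echo "Julia depot: $JULIA_DEPOT_PATH"
```

This moves everything at once; it does not split configuration, cache and data into separate locations.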
> it should be sufficient to change JULIA_DEPOT_PATH to point somewhere else.
Indeed it is nice that we can set a custom path for Julia files. Going a bit further would help: currently users can't separate their configuration from generated data like registry clones. For users to be able to apply the spec to their own systems, it would be helpful if JULIA_DEPOT_PATH could be split up into more than one variable, so I can say JULIA_CONFIG_DIR goes to ~/.config/julia, JULIA_CACHE_DIR goes to ~/.cache/julia, JULIA_DATA_DIR goes to ~/.local/share/julia.
> For those users that do actually work on a multi-user UNIX system where it makes sense to put config, cache and runtime data on different filesystems, it would be simple to create symlinks for the fixed set of subdirectories stored under ~/.julia to the various XDG directories as they feel appropriate. Such a tool could even be open sourced and maintained by people who have need for it!
It's unlikely that this will be worked on further in Julia itself. A third-party open source tool sounds like the right answer.
I missed an important post from @StefanKarpinski in the other thread https://github.com/JuliaLang/julia/issues/10016#issuecomment-370254878:
> The "dotfile bloat" part of this has been addressed, so I'm closing. If someone wants to implement the XDG stuff, they're welcome to make a PR.
So it seems it's on those of us who are interested in XDG to write the code but that @StefanKarpinski wouldn't be opposed to it.
If someone wants to work on this, it's fine, but there are questions that need answers. So far the "proposal" here is "Julia should follow the XDG spec". Ok, suppose we were to do that. What, concretely, does that mean?
First and foremost, how does each of the directories I listed in https://github.com/JuliaLang/julia/issues/4630#issuecomment-690349632 map to XDG locations? Someone who has an interest in this needs to spend some time with the XDG spec and classify each of these and how they should relate to all the XDG environment variables. There are a lot of them: XDG_DATA_HOME, XDG_CONFIG_HOME, XDG_CACHE_HOME, XDG_DATA_DIRS, XDG_CONFIG_DIRS, XDG_RUNTIME_DIR. For each subdirectory of ~/.julia, they need to determine where that subdirectory should go, based on the values of all of these XDG environment variables. This is a very concrete question: there should be an expression for each subdirectory I listed that computes a path location based on the values of XDG variables. Without this, no progress can be made.
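To make the shape of such an expression concrete, here is a sketch in shell. The function name `julia_xdg_path` is hypothetical, and the classification it encodes is only one candidate (the one proposed further down this thread, which also uses XDG_STATE_HOME for logs); it is not a settled answer.

```shell
# Hypothetical helper: print the XDG location for one ~/.julia subdirectory.
# The mapping below is an assumed classification, used only for illustration.
julia_xdg_path() {
    case "$1" in
        config|environments)
            base="${XDG_CONFIG_HOME:-$HOME/.config}" ;;
        clones|compiled|registries)
            base="${XDG_CACHE_HOME:-$HOME/.cache}" ;;
        artifacts|dev|packages)
            base="${XDG_DATA_HOME:-$HOME/.local/share}" ;;
        logs)
            base="${XDG_STATE_HOME:-$HOME/.local/state}" ;;
        *)
            echo "unknown depot subdirectory: $1" >&2
            return 1 ;;
    esac
    printf '%s\n' "$base/julia/$1"
}

# Print the computed location for a few subdirectories.
for sub in artifacts compiled config logs; do
    julia_xdg_path "$sub"
done
```

Each branch is a single expression over one XDG variable and its default, which is the level of concreteness asked for above. Where `packages` and `artifacts` belong is exactly the part that remains debatable.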
The next step would be to evaluate whether it makes sense to put this logic into Base and Pkg or not. My guess is that it is not worth the significant added complexity to both Base and Pkg, but if someone can refactor the Base and Pkg code so that it abstracts over the XDG stuff cleanly, then a case could be made for it.
An alternative is to have a utility package that goes through all the paths above and makes sure that ~/.julia/packages, for example, is a symlink to the place that XDG dictates that Julia packages should be stored. If the directory already exists, the tool would first move the contents of that directory to the XDG-specified location and then create the appropriate symlink from ~/.julia/packages to the XDG location. That way ~/.julia always exists and lets you easily see all your Julia-related data, but that data can be stored in XDG-compliant locations. If the XDG variables never change, then this symlink setup only needs to be run once, but if you anticipate changing them often, the tool could be called from ~/.julia/config/startup.jl so that it gets run on each Julia startup. It should probably try to do some kind of advisory locking (see Pidfiles) so that if multiple instances of the tool are run at the same time, they don't try to do this simultaneously.
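The move-then-symlink step described above might look roughly like this in shell. `relocate_subdir` is a hypothetical name, the advisory locking is left out, and an existing symlink is left alone rather than re-pointed, so this is a sketch of the idea rather than a finished tool.

```shell
# Hypothetical helper: make <depot>/<name> a symlink to <target>/<name>,
# moving an existing real directory to the target first.
# Usage: relocate_subdir DEPOT NAME TARGET
relocate_subdir() {
    link="$1/$2"
    dest="$3/$2"

    # Already a symlink: nothing to do, so repeated runs are harmless.
    # (Simplification: the link is not re-pointed if TARGET has changed.)
    if [ -L "$link" ]; then
        return 0
    fi

    mkdir -p "$1" "$3"
    if [ -d "$link" ]; then
        # A real directory exists in the depot: move it, but never clobber
        # something that is already present at the destination.
        if [ -e "$dest" ]; then
            echo "refusing to overwrite $dest" >&2
            return 1
        fi
        mv "$link" "$dest"
    else
        mkdir -p "$dest"
    fi
    ln -s "$dest" "$link"
}

# Example (not run here):
#   relocate_subdir "$HOME/.julia" compiled "${XDG_CACHE_HOME:-$HOME/.cache}/julia"
```

Because ~/.julia keeps an entry for every subdirectory, Julia itself would not need to know that anything moved.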
I'd say, I don't think any of the directories belong in RUNTIME. config goes in CONFIG. Otherwise decide between DATA and CACHE based on the answer to "can the file be recreated from scratch with no observable difference in behavior other than the waiting time to repopulate the cache?".
As a first pass, how about:
Sources:
That's a good start. I would put packages and artifacts in the CACHE category: they are immutable and content addressed and should be completely reconstructible from the upstream servers.
I have reopened the issue as there's active interest in working on it apparently.
I've been following this issue for some time, and I'm under the impression that beyond "some other applications follow the spec" the arguments for doing so, and why Julia should care, have not been spelled out very explicitly. Thus, I'd like to explain why this is important to me.
For context, I'm only a very light user of Julia, but still .julia uses substantial storage for binary artifacts. Currently, this folder amounts to roughly 1GB:
$ du -sh .julia/*
573M .julia/artifacts
65M .julia/compiled
0 .julia/dev
36K .julia/environments
224K .julia/logs
160M .julia/packages
211M .julia/registries
and I suspect that this could grow significantly when using more packages. I don't care very much about the aesthetics of separating things into .config, .data etc. However, I'd appreciate if Julia made an effort to classify cached data. No application should use lots of storage too lightly, and not everyone can afford to just buy more disks when running low. This is particularly important due to two details of my setup:
- I'm backing up the entirety of $HOME with some manually excluded directories; these (incremental) backups are retained for years. To keep backup storage in bounds, it is important to exclude non-essential, possibly volatile data. Backing up the entirety of $HOME is necessary precisely because there is no centralized location for user data and configuration.
- I'm using snapper on the btrfs filesystem to have Time-Machine-like snapshots (by the way, how does Julia behave on "macOS dev laptops"? It seems that applications need to opt-out of backups there, too: https://github.com/rust-lang/cargo/issues/3884). Again, these should not include large cache data (in particular if it changes frequently). Disk usage by snapper is not as critical as for the other backups in my case since snapshots are retained for shorter timespans.
In both cases, there's an easy solution if cached data is centralized in .cache (a single exclude pattern and converting .cache to a btrfs subvolume, respectively). If it's not, manual work is required which doesn't scale at all to many applications. In addition, adding yet another exclude pattern will usually be late because I have to notice first that one more space hog showed up. This is an even larger issue if backup archives are immutable.
Of course, this issue is by no means a problem unique to Julia; many (cross-platform) applications have similar behaviour. As a side note, I think appealing to the "non-desktopness of Julia" is missing the point: Julia is used on Linux desktop systems. I do agree that the XDG spec is lacking, in particular that packages and artifacts are neither clearly data nor cache. However, I don't think that this should preclude at least partially respecting the spec.
There is one more feature I'd like to propose, which might also be a stopgap solution to the request for honoring the XDG spec: There's another non-universally respected specification for tagging cache directories using a CACHEDIR.TAG file. Some backup tools (such as borg) have a setting to respect these.
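For illustration, tagging a directory by hand could look like the sketch below. `tag_cache_dir` is a hypothetical helper; the signature line is the fixed header defined by the Cache Directory Tagging Specification, and which Julia directories deserve the tag is an assumption left to the user.

```shell
# Hypothetical helper: mark a directory as a cache directory.
tag_cache_dir() {
    mkdir -p "$1"
    {
        # The file must begin with exactly this signature line.
        echo 'Signature: 8a477f597d28d172789f06886806bc55'
        # Everything after the signature is free-form commentary.
        echo '# This file is a cache directory tag.'
        echo '# For information about cache directory tags see https://bford.info/cachedir/'
    } > "$1/CACHEDIR.TAG"
}

# Example (not run here), assuming the precompilation cache is safe to skip:
#   tag_cache_dir "$HOME/.julia/compiled"
```

Backup tools that honor the tag then skip the directory without needing a per-application exclude pattern.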
As I've mentioned, I'm not using Julia very much, so I'm not going to work on this myself. I'd like to thank anyone who does in advance!
Another solution, more flexible but less automatic, is to support JULIA_CONFIG_PATH, JULIA_ARTIFACTS_PATH... environment variables the same way there is JULIA_DEPOT_PATH. If the environment variable is unset, we keep the current location. That way:
Can't you do all of that with symlinks?
Ok, I can't resist the urge to chime in here. There seems to be a fair bit of FUD spread about XDG, its relevance to Julia, and the Linux community's attitude towards it in this issue and #10016.
For starters, I'd like to rehash the case for Julia not ignoring XDG.
- ${XDG_CACHE_HOME:-$HOME/.cache}/julia ("user-specific non-essential data files")
  - ~/.julia/clones
  - ~/.julia/compiled
  - ~/.julia/registries
- ${XDG_DATA_HOME:-$HOME/.local/share}/julia ("user-specific data files")
  - ~/.julia/artifacts
  - ~/.julia/dev
  - ~/.julia/packages
- ${XDG_STATE_HOME:-$HOME/.local/state}/julia ("user-specific state files")
  - ~/.julia/logs
- ${XDG_CONFIG_HOME:-$HOME/.config}/julia ("user-specific configuration files")
  - ~/.julia/config
  - ~/.julia/environments (I'd argue that user-created environments are an aspect of user configuration)

Now, onto some of the FUD.
> This is a spec for Linux desktop programs (KDE, Gnome, etc.). Julia does not fit the bill.
The freedesktop spec is not for "desktop as in GUI" it is for desktop as in "the Linux desktop", i.e. a computer I will sit in front of and use, and the programs on it. The fact that "graphical" and "desktop" are separate is made clear on the https://www.freedesktop.org homepage where the first sentence mentions "graphical and desktop systems".
Julia is a desktop program.
Git has previously been mentioned as an example of an XDG-conforming non-graphical program, and there's a large collection of others: alsa, curl, gdb, gnuplot, htop, less, python's pip and poetry, wireshark, ...
> This standard doesn't even work on all Linuxes, let alone other UNIXes like OS X, FreeBSD and of course Windows. The only thing that is truly universal is putting a bunch of files in a directory in the user's home directory.
It's already been mentioned that this does work on all Linuxes and FreeBSD. This concept does also apply to Mac and Windows. See this table taken from https://github.com/OpenPeeDeeP/xdg
| | Linux (and BSD) | Mac | Windows |
|---|---|---|---|
| XDG_DATA_HOME | ~/.local/share | ~/Library/Application Support | %APPDATA% |
| XDG_CONFIG_HOME | ~/.config | ~/Library/Application Support | %APPDATA% |
| XDG_CACHE_HOME | ~/.cache | ~/Library/Caches | %LOCALAPPDATA% |
> I have to say that I doubt that the typical Linux user would actually prefer this complex fractured layout to just having everything Julia-related under ~/.julia with the appropriate names.
This sounded very dodgy to me, so I hopped onto a nearby Linux server and asked. The results: 11 :+1: in favour of XDG, 0 :-1: against, and some extra comments like "xdg pls, can't believe devs are even considering otherwise" and "the typical Linux user will install more than just Julia.".
We can also gauge Linux + Julia users' thoughts by looking at the reactions to this issue and comments.
.julia
.julia
Sure, there's going to be some overlap, but if you tally these reactions you get 56 reactions in favour of XDG and 3 against.
I think we can safely conclude that the Linux + Julia community overwhelmingly wants XDG compliance.
> This spec really only makes sense on multiuser UNIX systems at research institutions. For the vast majority of users, the complexity of scattering everything around directories that aren't next to each other doesn't gain you anything since those directories are on the same file system on your computer's one and only hard disk.
For starters, all modern *nix systems (Linux, BSD, Mac) are multi-user. Furthermore, while the XDG spec may be good for multi-user + research systems, it also makes a lot of sense for single-person desktop setups. @wisp3rwind has raised some points, but I also think an experience of mine may serve as an illustrative counter-example. I run an automated backup system on my computer, as I've been burned too many times by data loss/accidental deletions. To keep the size of my backups manageable, I blacklist certain directories that I know aren't that important. Thanks to XDG, I am able to eliminate a lot of unnecessary wastage by blacklisting ~/.cache and ~/.local/share. A few months after installing Julia I noticed that my backup sizes had rapidly ballooned by ~30-40 GB. The cause? ~/.julia/artifacts primarily. I've now had to add special rules for ~/.julia.
If only Julia didn't seem set on reminding us why the XDG spec was created in the first place...
> Can't you do all of that with symlinks?
To an extent, yes. In fact, here's a little script that makes Julia respect the XDG spec (untested, but should work).
# In bash, a redirection target cannot expand to several files, so append via tee.
echo 'export JULIA_DEPOT_PATH=${XDG_DATA_HOME:-$HOME/.local/share}/julia' | tee -a ~/.{bash,zsh,fish}env > /dev/null
# The depot itself: create it first, because every symlink below lands in it.
mkdir -p "${XDG_DATA_HOME:-$HOME/.local/share}/julia"
mv ~/.julia/{artifacts,dev,packages} "${XDG_DATA_HOME:-$HOME/.local/share}/julia"
# Configuration: move it out, then link it back into the depot.
mkdir -p "${XDG_CONFIG_HOME:-$HOME/.config}/julia"
mkdir -p ~/.julia/config
mv ~/.julia/{config,environments} "${XDG_CONFIG_HOME:-$HOME/.config}/julia"
ln -s "${XDG_CONFIG_HOME:-$HOME/.config}/julia"/* "${XDG_DATA_HOME:-$HOME/.local/share}/julia"
# State: logs go here only, not into the data directory as well.
mkdir -p "${XDG_STATE_HOME:-$HOME/.local/state}/julia"
mv ~/.julia/logs "${XDG_STATE_HOME:-$HOME/.local/state}/julia"
ln -s "${XDG_STATE_HOME:-$HOME/.local/state}/julia"/* "${XDG_DATA_HOME:-$HOME/.local/share}/julia"
# Cache: move it out, then link it back into the depot.
mkdir -p "${XDG_CACHE_HOME:-$HOME/.cache}/julia"
mv ~/.julia/{clones,compiled,registries} "${XDG_CACHE_HOME:-$HOME/.cache}/julia"
ln -s "${XDG_CACHE_HOME:-$HOME/.cache}/julia"/* "${XDG_DATA_HOME:-$HOME/.local/share}/julia"
However, I think "Can't you do all of that with symlinks?" is actually asking the wrong question. Why should Linux users have to manually wrangle Julia into (mostly) doing what it should do by default?
I think we've arrived at the point where someone just needs to do it. See https://github.com/JuliaLang/julia/issues/4630#issuecomment-761265058 and https://github.com/JuliaLang/julia/issues/4630#issuecomment-761603914 for the desired directory assignments.
> I think we've arrived at the point where someone just needs to do it.
Yup. Long posts and upvotes need to be converted into pull requests.
Regarding what I posted above:
> I would put packages and artifacts in the CACHE category: they are immutable and content addressed and should be completely reconstructible from the upstream servers.
I think a potential issue with this classification is that CACHE files may be deleted automatically: while these files are perfectly reproducible from upstream servers, programs that use them will stop working if they're deleted. So depending on how systems interpret the meaning of CACHE, it might not be appropriate.
I agree. I think that, potentially, the precompilation cache (~/.julia/compiled) could go in the CACHE category. But I agree that packages and artifacts should probably not go into the CACHE category.
Also, interestingly, packages and artifacts actually aren't user-specific: they are immutable and can be safely shared between all users. But they are typically installed by Julia running as the user, who lacks sufficient permission to install somewhere shared. So user data is probably the right classification. The compiled directory can definitely go in the cache category, and registries also, since that will get re-downloaded if it's missing.
I agree with the idea; I suppose it will be welcome for anyone. For those who are trying to get rid of ~/.julia/ while Julia does not adopt the XDG directories by default, you can try this set of instructions. I do not know if it still works, as it has been a while since the last time I tried it. But at least it is an attempt.
It's probably worth mentioning here that I've just put together a package to make working with the XDG directories easier, with support for cross-platform equivalents.
https://github.com/tecosaur/XDG.jl
You can find the rationale behind the decisions made in the docs.
Just from taking a quick look, it seems like there are a number of packages that are currently dealing with this incorrectly, such as FreeTypeAbstraction.jl on Windows and Linux, DataDeps.jl, then the various packages using ad-hoc cache, and the other packages beginning to clutter .julia (conda, datadeps, makie, makiegallery, pluto_notebooks, symbolstorev2-lsp-julia, etc.).
Over the last 10 years most Freedesktop environments (GNOME, KDE, XFCE...) and applications around them have moved to store configuration in the XDG config directory (by default ~/.config) instead of creating a hidden directory directly into the home folder. A well-known non-GUI application that does the same is git: https://github.com/git/git/blob/master/Documentation/RelNotes/1.7.12.txt#L18-23
The Freedesktop specification is here: http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html
and a rationale for this specification can be found here: https://wiki.gnome.org/GnomeGoals/XDGConfigFolders
I think Julia should follow it too. The idea is very simple, you just need to read an environment variable:
Thus, in case $XDG_CONFIG_HOME is not defined, settings would be stored in the default ~/.config/julia.
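A minimal sketch of that lookup in shell, assuming a hypothetical helper named `julia_config_dir` (the name and the `julia` subdirectory are illustrative, not something Julia defines):

```shell
# Hypothetical helper: print the directory where Julia's settings would live
# under the XDG Base Directory rules.
julia_config_dir() {
    # ${VAR:-default} falls back when XDG_CONFIG_HOME is unset or empty.
    printf '%s\n' "${XDG_CONFIG_HOME:-$HOME/.config}/julia"
}

julia_config_dir
```

The whole mechanism is one variable read with a fixed fallback; the same pattern applies to $XDG_DATA_HOME with ~/.local/share as the default.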
Packages should likely go to $XDG_DATA_HOME, which defaults to .local/share/ (the equivalent of /usr/share/). One rule of thumb is that configuration is more like text files with limited size and that do not change that much (some people even keep it under a VCS). OTOH $XDG_DATA_HOME is usually conceived as a place to store valuable user data, which needs backup, while packages can easily be downloaded again. After much reading and thinking I still don't know so this probably means it does not really matter.
So the changes would be pretty limited actually. In the documentation, it would be sensible to speak about ~/.config/julia instead of ~/.julia, without mentioning the environment variables trick (users who change the defaults will know).
One of the potential gains of following that XDG spec is that it also defines $XDG_CONFIG_DIRS and $XDG_DATA_DIRS, which could be used by administrators to provide a system-wide package library, and default settings for new users. Julia would read config from there, but only save user config and packages in the user home directory.
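The lookup order this implies can be sketched as follows. `julia_config_search_path` is a hypothetical name; the defaults (~/.config for the user directory, /etc/xdg for the system-wide list) are the ones given in the Freedesktop specification.

```shell
# Hypothetical helper: list, in precedence order, every directory in which a
# Julia config file could be looked up. The function body runs in a subshell
# so that changing IFS and disabling globbing do not leak into the caller.
julia_config_search_path() (
    # The user's own directory takes precedence and is the only one written to.
    printf '%s\n' "${XDG_CONFIG_HOME:-$HOME/.config}/julia"

    # XDG_CONFIG_DIRS is a colon-separated list that defaults to /etc/xdg.
    IFS=:
    set -f
    for dir in ${XDG_CONFIG_DIRS:-/etc/xdg}; do
        if [ -n "$dir" ]; then
            printf '%s\n' "$dir/julia"
        fi
    done
)

julia_config_search_path
```

An administrator could then drop default settings into one of the system-wide directories, while anything a user saves still lands in the first entry.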