JuliaPy / Conda.jl

Conda managing Julia binary dependencies

install into a single global directory? #123

Closed: stevengj closed this issue 5 years ago

stevengj commented 5 years ago

Currently we install into the deps directory. However, in Julia 1.0 packages are no longer updated in place. This means that every time you update Conda.jl, it has to reinstall Anaconda, which is wasteful and breaks things like PyCall that expect libpython to stay in one place.

My thinking is that we should put the root environment into ~/.julia/$JLENV/conda$(CONDA_JL_VERSION) instead, where JLENV is the current Julia package environment. This way, we will only have a single Anaconda installation (per Python major version) that persists across updates.
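To make the proposed layout concrete, here is a minimal sketch of how such a path could be computed. It assumes that `CONDA_JL_VERSION` holds the Python major version (2 or 3), as the "per Python major version" remark suggests, and that a Julia environment is identified by the name of the directory containing its `Project.toml`. The helper `conda_root_per_env` is hypothetical, not part of Conda.jl.

```julia
# Sketch only: `conda_root_per_env` is a hypothetical helper, not Conda.jl API.
function conda_root_per_env(python_major::Integer;
                            depot::AbstractString = first(DEPOT_PATH),
                            project::Union{Nothing,AbstractString} = Base.active_project())
    # Assumption: a Julia environment is named after the directory holding
    # its Project.toml, e.g. ".../environments/v1.0/Project.toml" -> "v1.0".
    jlenv = project === nothing ? "default" : basename(dirname(project))
    # Corresponds to ~/.julia/$JLENV/conda$(CONDA_JL_VERSION)
    return joinpath(depot, jlenv, "conda$(python_major)")
end

println(conda_root_per_env(3))
```

Because the path is keyed on the environment rather than on the package directory, updating Conda.jl would leave it unchanged. It also means one full Anaconda install per environment, which is the cost questioned in the next comment.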

However, I must admit that I don't fully grok how Pkg environments work. From reading the docs, I guess there is a stack of environments. @StefanKarpinski, is there a way to tell which environment Conda.jl was installed into? Am I thinking about this in the right way?

tkf commented 5 years ago

Isn't it wasteful to have the base conda environment for each Julia environment? I'd suggest:

Notes: The reason the above scheme (i.e., sharing ~/.julia/data/Conda/base across all Julia versions and environments) is useful is that conda has some basic de-duplication mechanisms. For example, it tries to use hard links in the conda environments when it can. The conda installation directory (~/.julia/data/Conda/base above; usually ~/miniconda3 etc.) has some global caches, such as pkgs, which contains the downloaded package archives and their decompressed directories. Using such mechanisms to save disk space requires sharing the base installation.
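A minimal sketch of the shared-base layout described in these notes, under stated assumptions: the helper names are hypothetical, and each Julia environment is assumed to map to one named conda environment. The only conda behavior relied on is that, by default, named environments live under `<base>/envs/<name>` and the package cache under `<base>/pkgs`.

```julia
# Hypothetical layout: one shared conda installation for all Julia versions
# and environments, with one named conda environment per Julia environment.
shared_base(depot::AbstractString = first(DEPOT_PATH)) =
    joinpath(depot, "data", "Conda", "base")

# By default conda places named environments under <base>/envs/<name>.
conda_env_dir(name::AbstractString; base::AbstractString = shared_base()) =
    joinpath(base, "envs", name)

# Downloaded archives and their extracted copies live once in <base>/pkgs;
# environments created from the same base can hard-link files from there.
pkgs_cache(base::AbstractString = shared_base()) = joinpath(base, "pkgs")

println(conda_env_dir("v1.0"))
```

Keeping every environment inside the same base is what lets them share one `pkgs` cache. With separate installations, each would download and extract its own copies.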

Some discussion points:

Edit: swap "data" and "share"
Edit2: Add "Notes"

stevengj commented 5 years ago

I agree it's fine for different Pkg environments to correspond to different conda virtualenvs.

share in Unix refers to a directory for architecture-independent data. I think ~/.julia/conda{2,3} is fine here and will be convenient — the Conda.jl package is kind of in a special situation because it would be unwise for other packages to install whole Anaconda distros.

I don't think we should use ~/miniconda — the whole point of Conda.jl is to install a Julia-specific Anaconda distro so that it doesn't get messed up by whatever other stuff the user might have installed.

tkf commented 5 years ago

I thought it'd be nice to have a scheme for a package-specific data directory. For example, a dataset library (like RDatasets.jl) could use such a location across all environments. But this is more of a Pkg.jl enhancement idea.

I understand that the aim of Conda.jl is to isolate it from the user's ~/{mini,ana}conda. My point was that you only need a single conda installation for Julia-specific usage. Everything else can be a conda (virtual) environment. Those environments are isolated from the base environment (in principle, unless conda has some critical bugs). You can even create a Python 2 conda environment with Miniconda3. Furthermore, you can use ~/miniconda to "bootstrap" Conda.jl's main environment by installing the conda package into it. This way, you don't need to touch the ~/miniconda base environment. But this last point was probably too eager and was not the main point.

stevengj commented 5 years ago

I'm just worried about user-maintained ~/miniconda base environments being broken in some way — I've seen too many bit-rotted Python installations to trust something we find in a non-Julia directory.

stevengj commented 5 years ago

(See https://github.com/oxinabox/DataDeps.jl for other kinds of data.)

tkf commented 5 years ago

I'm just worried about user-maintained ~/miniconda base environments being broken in some way

Sure, I understand the worry. How about reusing the same miniconda installation for all Julia versions and all Julia environments? (They can, of course, have different conda environments.)

DataDeps.jl

It looks like they have a similar discussion too: https://github.com/oxinabox/DataDeps.jl/issues/48

stevengj commented 5 years ago

The lack of persistent package options (JuliaLang/Juleps#38) is a problem here too, because we currently have no way of "remembering" whether the user selected Python 2 or Python 3 or some custom environment.

It's pretty urgent that we get some fix here, even if it is suboptimal. Upgrading Conda.jl currently takes forever, breaks PyCall (because the libpython path changes), and wastes gigabytes of space.

tkf commented 5 years ago

Why not use ~/.julia/data/Conda/envs/v$(VERSION.major).$(VERSION.minor)? This would be forward-compatible with what I suggested in https://github.com/JuliaLang/Pkg.jl/issues/777#issuecomment-428058336. Or maybe even ~/.julia/environments/v$(VERSION.major).$(VERSION.minor)/condajl?
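A minimal sketch of the first version-keyed location, assuming `~/.julia` corresponds to `first(DEPOT_PATH)`. The helper name is hypothetical. It shows which Julia versions would share a directory under this scheme.

```julia
# Sketch of ~/.julia/data/Conda/envs/v$(VERSION.major).$(VERSION.minor);
# the helper name is hypothetical. Only major and minor are used.
conda_env_for_julia(v::VersionNumber = VERSION;
                    depot::AbstractString = first(DEPOT_PATH)) =
    joinpath(depot, "data", "Conda", "envs", "v$(v.major).$(v.minor)")

println(conda_env_for_julia())
# Patch releases share a directory, but 1.0 and 1.1 do not:
@show conda_env_for_julia(v"1.0.3") == conda_env_for_julia(v"1.0.0")
@show conda_env_for_julia(v"1.1.0") == conda_env_for_julia(v"1.0.3")
```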

Isn't it simple to fix once the location is decided?

StefanKarpinski commented 5 years ago

The Julia version doesn’t seem necessary or sufficient for isolating Conda setups.

tkf commented 5 years ago

I agree. But I don't think it's reasonable to install miniconda for each Julia environment, since that consumes much more space. That's why I suggested a hybrid approach based on a private_env package option: https://github.com/JuliaLang/Pkg.jl/issues/777#issuecomment-428058336 (added: read https://github.com/JuliaPy/Conda.jl/issues/123#issuecomment-423376252 first)

Furthermore, to obtain enough de-duplication to make it reasonable to use a separate conda environment for some Julia environments, we need https://github.com/JuliaLang/Pkg.jl/issues/777

stevengj commented 5 years ago

Using the Julia version will also lead to too many Conda installations; there is no strong reason not to share conda installations between Julia 1.0 and Julia 1.1, for example, or for that matter with Julia 0.7.

In the short run, I'm starting to feel like it will be better to just install conda in ~/.julia/conda (shared between all Julia versions and environments) and worry later about adding the option to let a Julia project/environment install its own conda virtualenv with a given set of packages. The latter seems like it depends on Pkg option support anyway.
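A minimal sketch of this short-term plan, assuming `~/.julia` is the first entry of `DEPOT_PATH` (the default when `JULIA_DEPOT_PATH` is not set). `shared_conda_root` is a hypothetical name, not Conda.jl's actual API.

```julia
# Sketch of the short-term plan; `shared_conda_root` is a hypothetical name.
# The path depends only on the depot, not on VERSION or on the active Pkg
# environment, so every Julia version and environment resolves to the same
# conda installation, and the path does not change when Conda.jl is upgraded.
shared_conda_root(; depot::AbstractString = first(DEPOT_PATH)) =
    joinpath(depot, "conda")

println(shared_conda_root())
```

A per-project conda environment could later be layered on top, for example under `<root>/envs/<name>`, once Pkg supports persistent package options.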

StefanKarpinski commented 5 years ago

Yes, that seems pretty reasonable to me.