JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Make stdlibs use the artifact system #33973

Open KristofferC opened 4 years ago

KristofferC commented 4 years ago

It would be nice if the stdlibs started using the artifact system to declare which libraries they depend on and how to obtain them on each platform. That would make the stdlibs easier to move out of the julia repo, and in cases where one doesn't want to bundle all stdlibs into a sysimage (e.g. in an "app"), it would also be clear which libraries can be excluded from bundling.

StefanKarpinski commented 4 years ago

@staticfloat, you seem like the prime candidate for this 😁

staticfloat commented 4 years ago

So here's the thinking that Stefan and I have briefly discussed:

We should firm up some of the implicit laziness that the stdlibs have relied upon with respect to binary dependencies, and simultaneously use this as an opportunity to take a step towards decoupling stdlibs from the Julia build system both at build time and at run time.

Properties we want

Implementation strategy

To represent stdlib binary dependencies through JLL packages and Artifacts, Stefan and I think the best way is to start shipping a read-only depot with Julia, added to the default list of depots, that contains all of our stdlibs, their JLL packages, and their Artifacts. This would clear the majority of the libraries out of <prefix>/lib/julia; instead, we would rely on some hoops we jump through to load them from <prefix>/share/julia/stdlib/vX.Y.Z/artifacts/. It will be a fun challenge to make this work for everything, including LLVM. I'm unsure if we can get there, but we'll give it a good shot. Once these stdlib packages are baked into the system image, we would have a list of packages that the resolver shouldn't touch, so that it doesn't accidentally install a new version of e.g. OpenBLAS_jll, which would just confuse everyone.
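
A minimal sketch of the depot idea, assuming the bundled read-only depot lives at <prefix>/share/julia. The helper names `stdlib_depot` and `with_bundled_depot` are hypothetical, not Julia's actual startup code. Appending the bundled depot after the user's depots keeps a writable depot such as ~/.julia in first position, since Pkg installs into the first entry of DEPOT_PATH.

```julia
# Sketch: treat <prefix>/share/julia as a read-only depot that holds the
# stdlibs, their JLL packages and their artifacts (assumed layout).
stdlib_depot(prefix::AbstractString) = joinpath(prefix, "share", "julia")

# Return a new depot list with the bundled depot appended (at most once).
# User depots stay first, so DEPOT_PATH[1] remains the install target.
function with_bundled_depot(depots::AbstractVector{<:AbstractString}, prefix::AbstractString)
    bundled = stdlib_depot(prefix)
    result = String.(depots)
    bundled in result || push!(result, bundled)
    return result
end

depots = with_bundled_depot(["/home/user/.julia"], "/opt/julia")
println(depots)
```

At startup, the result could be written back with something like `copy!(DEPOT_PATH, with_bundled_depot(DEPOT_PATH, prefix))`.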

Possible different strategies

staticfloat commented 4 years ago

Thinking more about things like Julia needing to be able to find libLLVM at dynamic-link time, it will be sufficient on non-Windows platforms to bake in RPATHs that look in $ORIGIN/$(datarootdir_rel)/stdlib/vX.Y.Z/artifacts/<LLVM_jll tree hash>. The only snag here is Windows; we can bake in a call to AddDllDirectory() within init.c, but that's a little unsatisfactory. The reason I'm thinking about this is that I'd like to make it as straightforward as possible for Julia to truly use JLL packages for stdlibs, so that eventual rebuilds of Julia system images with newer versions of stdlibs can use their binaries as naturally as possible.
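
The RPATH entry described above can be sketched as a plain string computation. This assumes `datarootdir_rel` is the data root relative to the directory of the binary that carries the RPATH; the function name `artifact_rpath` and the example values are hypothetical, and Windows would still need the AddDllDirectory() route instead.

```julia
# Sketch: RPATH entry pointing the dynamic linker at a stdlib artifact.
# `$ORIGIN` must reach the linker literally, so the `$` is escaped here.
function artifact_rpath(datarootdir_rel::AbstractString, julia_version::VersionNumber,
                        tree_hash::AbstractString)
    ver = "v$(julia_version.major).$(julia_version.minor).$(julia_version.patch)"
    parts = (datarootdir_rel, "stdlib", ver, "artifacts", tree_hash)
    return "\$ORIGIN/" * join(parts, "/")
end

# Hypothetical values; the tree hash would come from LLVM_jll's Artifacts.toml.
println(artifact_rpath("../share/julia", v"1.6.0", "0123abcd"))
# prints $ORIGIN/../share/julia/stdlib/v1.6.0/artifacts/0123abcd
```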

We also need a plan for dealing with from-source builds. Assuming that we still want from-source Julia builds to work, we're going to have to tell some benign lies: when we build libopenblas, we'll have to bundle it up as a "fake" OpenBLAS_jll product. It's not too hard; we basically just install OpenBLAS like we always do (we're already careful to keep the OpenBLAS recipe in Yggdrasil as close as possible to the from-source build in JuliaLang/julia), but then put it into the share/julia/stdlib/vX.Y.Z/artifacts/<tree hash> directory that we know it needs to go into from parsing the OpenBLAS_jll/Artifacts.toml file.
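
A sketch of how a from-source build could find that destination directory: parse OpenBLAS_jll's Artifacts.toml, pick the entry for the target platform, and join its git-tree-sha1 onto the stdlib artifacts directory. It assumes the TOML stdlib (Julia 1.6+) and matches only the `os` and `arch` keys; the real Artifacts machinery (e.g. `Artifacts.artifact_hash`) also considers keys such as `libgfortran_version`. The helper names are hypothetical and the hashes in the sample are placeholders.

```julia
using TOML

# Return the git-tree-sha1 of the artifact `name` built for `os`/`arch`.
# Only `os` and `arch` are compared; real platform matching looks at more keys.
function artifact_tree_hash(toml_text::AbstractString, name::AbstractString;
                            os::AbstractString, arch::AbstractString)
    entries = TOML.parse(toml_text)[name]
    # Platform-independent artifacts are a single table, not an array of tables.
    entries isa AbstractVector || (entries = [entries])
    for entry in entries
        if get(entry, "os", nothing) == os && get(entry, "arch", nothing) == arch
            return entry["git-tree-sha1"]
        end
    end
    error("no $name artifact for $os/$arch")
end

# Destination inside the bundled stdlib depot (layout from the comment above).
stdlib_artifact_dir(prefix, stdlib_version, tree_hash) =
    joinpath(prefix, "share", "julia", "stdlib", stdlib_version, "artifacts", tree_hash)

# Trimmed-down Artifacts.toml with placeholder hashes and no download sections.
const SAMPLE_TOML = """
[[OpenBLAS]]
arch = "x86_64"
git-tree-sha1 = "1111111111111111111111111111111111111111"
os = "linux"

[[OpenBLAS]]
arch = "aarch64"
git-tree-sha1 = "2222222222222222222222222222222222222222"
os = "linux"
"""

h = artifact_tree_hash(SAMPLE_TOML, "OpenBLAS"; os = "linux", arch = "x86_64")
println(stdlib_artifact_dir("/opt/julia", "v1.6", h))
```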

Some pros/cons of what I've considered so far:

staticfloat commented 4 years ago

Digging into this over the past few days, I've come up with a few difficulties that may take some calm thinking to untangle properly:

First off, there's a philosophical decision to be made: do we want the actual binaries themselves to live in an $prefix/share/julia/artifacts/<tree hash>/lib/libfoo.so location, or do we want them to continue living in $prefix/lib/julia? For some binaries it doesn't matter that much, but for others it matters quite a bit.

Personally, I would like to push as much as possible for stdlibs and even the basic requirements for Julia (like LLVM, MPFR, GMP) to use artifacts. This is doable with enough scaffolding construction such that Julia can find things, but we need to answer if the necessary scaffolding is worthwhile:

Let's remind ourselves as to why we're doing this; with this kind of a system, it makes system image building much more modular and easy to understand; the distance between binaries users install and the binaries that ship with Julia shrinks. The resolver can see that LLVM_jll already exists on the user's machine and is of a particular version; attempts to Pkg.add("OpenBLAS_jll") naturally succeed immediately, as it's an stdlib, and using it is blazingly fast, as we would expect.
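
The "resolver shouldn't touch it" rule could look roughly like the hypothetical check below: a table of JLLs baked into the sysimage (names and versions are placeholders) short-circuits requests that the shipped version satisfies and rejects requests for any other version. This is only an illustration, not how Pkg's resolver is actually structured.

```julia
# Hypothetical check: JLLs baked into the sysimage are pinned to the version
# that ships with this Julia build, and the resolver never replaces them.
function resolve_request(name::AbstractString,
                         requested::Union{Nothing,VersionNumber},
                         bundled::AbstractDict{String,VersionNumber})
    # Not a bundled stdlib: defer to the normal registry-based resolution.
    haskey(bundled, name) || return (:registry, requested)
    shipped = bundled[name]
    # An unversioned request, or one for exactly the shipped version,
    # succeeds immediately without downloading anything.
    if requested === nothing || requested == shipped
        return (:stdlib, shipped)
    end
    error("$name is pinned to v$shipped by this Julia build; cannot install v$requested")
end

# Placeholder versions, not the versions any Julia release actually ships.
bundled = Dict("OpenBLAS_jll" => v"1.0.0", "LibLLVM_jll" => v"2.0.0")
println(resolve_request("OpenBLAS_jll", nothing, bundled))
```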

I don't have a concrete solution in mind yet; this is the third time I've written out this comment, because I keep experimenting with different things and finding new problems. The good news is that I have artifact downloading implemented in Make/Python, and putting JLL packages and artifacts into the share/julia folder works, but these bootstrapping issues are thorny.

ViralBShah commented 4 years ago

I would be fine with them living as artifacts. Making the system image more modular will allow us to build smaller system images for deployment, so that's the right direction, imo.

staticfloat commented 4 years ago

I've made great strides in this on my branch. I've converted everything that it makes sense to, excepting LLVM. LLVM is a special case that I will address after this. First, the changelog:

Changelog

Splitting up LLVM_jll

It seems to me that we have an issue: we want to provide libLLVM alongside Julia in a JLL, such that when users ask for a handle to libLLVM in a Pkg-informed way (e.g. through Pkg.add("LLVM_jll")), they are locked to the version that ships with their Julia. However, LLVM_jll provides a lot more than what Julia itself ships; it contains nice things like clang and opt and whatnot. I therefore don't think we should start shipping clang with Julia, rather the opposite.

I think we should split LLVM_jll up into multiple packages; perhaps having a LibLLVM_jll and then have LLVM_jll depend on LibLLVM_jll, and only LibLLVM_jll is shipped with Julia. @maleadt and @vchuravy I am very interested in both of your thoughts on this.

maleadt commented 4 years ago

I think we should split LLVM_jll up into multiple packages; perhaps having a LibLLVM_jll and then have LLVM_jll depend on LibLLVM_jll, and only LibLLVM_jll is shipped with Julia.

Sounds good to me. The CUDA compiler really only needs libllvm, plus some additional API calls from this source file. Maybe those should also be provided by the LibLLVM_jll? For other LLVM-based WIP I also need the headers and binaries, but that's just to build a tool, so it would be fine to put those in an LLVM_jll package that only gets installed as part of a build_tarballs.jl.

It's not entirely clear to me, though, how we would version this thing (e.g., with multiple builds of the aforementioned tool, one per LLVM version, where I just want to install whichever one is compatible with the user-provided LLVM while maintaining semver of the tool), but that's orthogonal to this refactor.

staticfloat commented 4 years ago

The CUDA compiler really only needs libllvm, plus some additional API calls from this source file. Maybe those should also be provided by the LibLLVM_jll?

Won't the symbols in the file you linked be a part of libjulia? Those symbols will then always be available, right?

maleadt commented 4 years ago

Won't the symbols in the file you linked be a part of libjulia? Those symbols will then always be available, right?

Sure, but since they are essentially an extension of libllvm's C API, it might make sense to put them there?

staticfloat commented 4 years ago

Ah, I see what you mean; these aren't used by the rest of Julia, they're only for the benefit of LLVM.jl.

Since we need to still support users building LLVM from source, I think we should probably keep it as a part of Julia's source.

vchuravy commented 4 years ago

Regarding LLVM_jll, the right approach is probably to follow what Linux distros have been doing and break it up into LLVM_jll (with opt/llc/llvm-*) and Clang_jll (for Cxx.jl and CxxWrap.jl).

staticfloat commented 4 years ago

My branch now works on Linux; macOS support is pending a new OpenBLAS JLL (as macOS is more sensitive than Linux to things like dylib IDs), and Windows comes after that. The great triumph is that a default build (i.e. one with no USE_BINARYBUILDER_XYZ=0 settings) has only the following libraries outside of the main package depot's artifacts directory, with the vast majority being served from artifacts:

julia> using Libdl; filter(l -> !occursin("artifacts", l), Libdl.dllist())
9-element Array{String,1}:
 "linux-vdso.so.1"
 "/home/sabae/src/julia-jllstdlibs/usr/bin/../lib/libjulia.so.1"
 "/lib/x86_64-linux-gnu/libdl.so.2"
 "/lib/x86_64-linux-gnu/librt.so.1"
 "/lib/x86_64-linux-gnu/libpthread.so.0"
 "/lib/x86_64-linux-gnu/libc.so.6"
 "/lib/x86_64-linux-gnu/libm.so.6"
 "/lib64/ld-linux-x86-64.so.2"
 "/home/sabae/src/julia-jllstdlibs/usr/lib/julia/sys.so"

julia> length(filter(l -> occursin("artifacts", l), Libdl.dllist()))
32

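
A slightly more robust version of the check in the REPL session above, as a sketch: it tests for an `artifacts` path component rather than a substring, so a directory that merely contains the word (e.g. `myartifacts`) is not miscounted. `split_by_artifacts` is a hypothetical helper name.

```julia
using Libdl

# Split library paths into those served from an `artifacts` directory and the
# rest, matching a whole path component instead of any substring.
function split_by_artifacts(libs::AbstractVector{<:AbstractString})
    in_artifacts(path) = "artifacts" in splitpath(path)
    return filter(in_artifacts, libs), filter(!in_artifacts, libs)
end

arts, others = split_by_artifacts(Libdl.dllist())
println(length(arts), " libraries from artifacts, ", length(others), " from elsewhere")
```
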
ViralBShah commented 4 years ago

Can the system image eventually be served as an artifact - so that I can then have many different system images for different projects?

staticfloat commented 4 years ago

I think the piece that needs to be solved is getting Julia to load a project-specific sysimage. Right now you need to pass -J, which isn't very user-friendly. It would be nice to have something similar to --project and JULIA_PROJECT. I'm thinking something like --project=<path> could imply -J<path>/.sysimages/sys.$(triplet).$(dlext) or something. It's a little tricky because we would need to do all of this without running Julia code. It would be a lot easier for this to happen in the context of an editor rather than the REPL, as an editor can automatically pass options and whatnot easily.
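
The proposed convention could be sketched as below. As noted in the comment, the real lookup would have to happen before any Julia code runs, so this Julia version only illustrates the path rule. `project_sysimage_path` and `choose_sysimage` are hypothetical names, `Base.BinaryPlatforms` assumes Julia 1.6 or later, and the host triplet it produces may carry extra tags (e.g. libgfortran or C++ ABI) beyond the bare architecture-OS triplet.

```julia
using Libdl
using Base.BinaryPlatforms: HostPlatform, triplet

# Hypothetical rule from the comment above:
#   --project=<path>  implies  -J<path>/.sysimages/sys.<triplet>.<dlext>
function project_sysimage_path(project::AbstractString;
                               plat_triplet::AbstractString = triplet(HostPlatform()),
                               dlext::AbstractString = Libdl.dlext)
    return joinpath(project, ".sysimages", "sys.$(plat_triplet).$(dlext)")
end

# Use the project's sysimage if one exists, otherwise fall back to the default.
function choose_sysimage(project::AbstractString, default_sysimage::AbstractString; kwargs...)
    candidate = project_sysimage_path(project; kwargs...)
    return isfile(candidate) ? candidate : default_sysimage
end

println(project_sysimage_path("/path/to/MyProject"))
```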

ViralBShah commented 4 years ago

Well then we could even do optimized system images by architecture!

staticfloat commented 4 years ago

We already do that; we have images by architecture (e.g. x86_64, i686, etc...) and then within an image, we compile functions multiple times such that newer processors have versions of functions with expanded instruction sets.

tkf commented 4 years ago

I'm thinking something like --project=<path> could imply -J<path>/.sysimages/sys.$(triplet).$(dlext) or something. It's a little tricky because we would need to do all of this without running Julia code. It would be a lot easier for this to happen in the context of an editor rather than the REPL, as an editor can automatically pass options and whatnot easily.

Wouldn't it require various tools to agree on where to look for the system image? For example, you may want to use the same sysimage in your editor and in stand-alone scripts.

Maybe the UI/API in Pkg.jl or PackageCompiler.jl could include something that creates a simple text file, say <path>/.sysimages/sys.$(triplet).link, containing the path to the actual sys.$(triplet).$(dlext) file? I guess that would then be easy enough to handle within libjulia? You could also use a sysimage downloaded into ~/.julia/artifacts this way. It'd be nice if relocatable sysimages with non-stdlib packages could be distributed and used in different projects.
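
A minimal sketch of the `.link` indirection suggested here, using a hypothetical `resolve_sysimage_link` helper. It assumes the file holds a single path and, as an additional assumption, treats a relative path as relative to the link file's directory; a missing or dangling link yields `nothing`, so the caller can fall back to the default sysimage.

```julia
# Hypothetical indirection: <path>/.sysimages/sys.<triplet>.link contains the
# path of the real sysimage, e.g. one stored under ~/.julia/artifacts.
# Returns `nothing` so callers can fall back to the default sysimage.
function resolve_sysimage_link(link_file::AbstractString)
    isfile(link_file) || return nothing
    target = String(strip(read(link_file, String)))
    isempty(target) && return nothing
    # Assumption: a relative target is relative to the link file's directory.
    isabspath(target) || (target = joinpath(dirname(link_file), target))
    return isfile(target) ? target : nothing
end

mktempdir() do dir
    sysimg = joinpath(dir, "sys.so")
    write(sysimg, "")                      # stand-in for a real sysimage
    link = joinpath(dir, "sys.x86_64-linux-gnu.link")
    write(link, "sys.so\n")
    println(resolve_sysimage_link(link) == sysimg)  # prints true
end
```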

davidanthoff commented 4 years ago

I'm thinking something like --project=<path> could imply -J<path>/.sysimages/sys.$(triplet).$(dlext) or something. It's a little tricky because we would need to do all of this without running Julia code. It would be a lot easier for this to happen in the context of an editor rather than the REPL, as an editor can automatically pass options and whatnot easily.

The VS Code Julia extension has been shipping with exactly something like that for more than a year: https://www.julia-vscode.org/docs/dev/userguide/compilesysimage/.

tkf commented 4 years ago

@davidanthoff See #35794 that adds it to Julia.

KristofferC commented 4 years ago

I don't think this is a release blocker for 1.6 so removing milestone. @staticfloat please put it back if you see fit.

StefanKarpinski commented 3 years ago

It isn't going to make it for 1.6 but will be in 1.7.

ViralBShah commented 2 years ago

Did this make it into 1.7, and have we done enough work to close this?

staticfloat commented 2 years ago

Sadly no, it did not. There is still some significant work to be done, but some smaller pieces have made it in.