Open giordano opened 4 years ago
maybe also a jll for the original source directory such that we can download it automatically in the debugger if necessary.
I agree that this is desirable. I'm not entirely sure that the right way to do it is to create multiple JLL packages; or at least, not necessarily the user-facing way.
Here are my thoughts:
For some projects, we have the genuine desire to split a JLL into multiple independent packages; `Clang_jll` and `LLVM_jll` really don't have anything to do with each other; sometimes you may want `Clang_jll` and not `LLVM_jll`, and vice versa. The fact that they both stem from the same build process is more or less an implementation detail. (Oh, and they both rely upon `LibLLVM_jll`, but that's fine). To save on build time/duplicated effort, it would be nice to be able to split a single `build_tarballs()`'s output into multiple, independent JLL packages.
For most of the projects in existence, we have a mixture of files; some things are generally essential (dynamic libraries, executables), some things are nice to have (include files, external debugging symbols), and some things are almost never needed (static libraries). It would be nice to be able to split a single `build_tarballs()`'s output into different "configurations", making use of https://github.com/JuliaLang/Pkg.jl/issues/1780
First and foremost, for this to work nicely, I think we're going to need to break up `build_tarballs()` a bit; right now we have everything built with the very deep assumption that we can flow smoothly from sources to JLLs, but that breaks down in a few places such as IntelOpenMP, CUDA, LLVM, MKL, etc. I think we need to split `build_tarballs()` up into two separate pieces: the piece where we call `autobuild()` as many times as we need, generating unpacked prefixes of build products, then the piece where we carve those prefixes up into JLLs. We can, of course, continue to expose a `build_tarballs()` that does all that automatically, but we need to have a re-think of the underlying mechanisms to make this effortless.
I envision having a function `build_binaries!()` that we call in a similar manner to `build_tarballs()`, but it doesn't take in `name`, `version` or `products`; all it does is build unpacked prefixes, and return the meta information about that build, coalesced into a single `meta` object:
meta = BuildMetadata()
build_binaries!(meta, ARGS, sources, script, platforms, dependencies)
This `meta` object will contain all the information we're used to having in e.g. the JSON object (and in fact will be what we serialize with `--json-meta` in the future; this will make it much easier to understand how we mock out parts of the BB pipeline when running on Yggdrasil), and is what we will use when we perform the second step, which is extraction and JLL construction:
# This would be defined by default, but just explicitly make it for illustration's sake
everything_extractor = raw"""
mv ${srcdir}/* ${prefix}/
"""
build_jll!(meta, name, version, platforms, dependencies, everything_extractor)
This gives us the flexibility to do an awful lot:
meta = BuildMetadata()
build_binaries!(meta, ARGS, sources, script, filter(p -> !Sys.iswindows(p), platforms), dependencies)
build_binaries!(meta, ARGS, win_sources, win_script, filter(p -> Sys.iswindows(p), platforms), dependencies)
build_jll!(meta, name, version, platforms, dependencies, everything_extractor)
Note that if we're going through the trouble of rewriting this stuff, we can probably get rid of `should_build_platform()` in fancy toys by doing that automatically inside of `build_binaries!()`; e.g. if a platform is given within `ARGS`, use that to filter the passed-in `platforms` objects, and if there's nothing left, return early.
meta = BuildMetadata()
build_binaries!(meta, ARGS, sources, script, platforms, dependencies)
LibLLVM_extractor = raw"""
# Copy over `llvm-config`, `libLLVM` and `include`, specifically.
mkdir -p ${prefix}/include ${prefix}/tools ${libdir} ${prefix}/lib
mv -v ${srcdir}/include/llvm* ${prefix}/include/
mv -v ${srcdir}/tools/llvm-config* ${prefix}/tools/
mv -v ${srcdir}/$(basename ${libdir})/*LLVM*.${dlext}* ${libdir}/
mv -v ${srcdir}/lib/*LLVM*.a ${prefix}/lib
"""
build_jll!(meta, "LibLLVM_jll", version, platforms, dependencies, LibLLVM_extractor)
Clang_extractor = raw"""
...
"""
build_jll!(meta, "Clang_jll", version, platforms, dependencies, Clang_extractor)
...
# Build once with `-O2` and extract it into a default variant, as well as a "build" variant:
meta = BuildMetadata()
build_binaries!(meta, ARGS, sources, script, platforms, dependencies)
base_extractor = raw"""
for f in $(find_binary_objects ${srcdir}); do
    rp=$(relpath ${srcdir} ${f})
    # Create the destination directory inside the prefix before moving the file there
    mkdir -p ${prefix}/$(dirname ${rp})
    mv ${f} ${prefix}/${rp}
done
"""
build_jll!(meta, name, version, platforms, dependencies, base_extractor)
build_jll!(meta, "$(name)+build", version, platforms, dependencies, everything_extractor)
# Build once with `-O1 -g` and bundle it into the "debug" variant:
debug_meta = BuildMetadata()
build_binaries!(debug_meta, ARGS, sources, debug_script, platforms, dependencies)
build_jll!(debug_meta, "$(name)+debug", version, platforms, dependencies, everything_extractor)
The "variants" would all be put into the same JLL release as artifacts with names that have the `+` postpended (as that's not a valid JLL name, of course), and we'd have ways for the user to request which artifacts get installed on their system through things like https://github.com/JuliaLang/Pkg.jl/issues/1780. We could arbitrarily decide that BB itself always installs the `+build` variant (if available) into the prefix when building. (Or we could even allow for `Dependency()` objects to provide a `variant` kwarg.)
What do you guys think?
That sounds fantastic! I would call them `build!` and `generate!`/`package!`.
If we had "partial artifact" download support, we could simplify this a bit, in that we could generate only a single tarball that has everything: binaries, headers, and separate debug files. We could then work some Pkg server magic to allow requesting a union of subtrees rather than always the entire content tree. This would allow us to, for instance, request the union of subtrees that corresponds to just the shared libraries within `lib/` and the binaries within `bin`. The PkgServer would then generate a cut-down tarball containing those resources and pass it down to us. This would be the "minimal" artifact variant, while the "build" variant would include things like headers, static libraries, etc. Finally, the "full" variant would include external debug symbols that were stripped out from the executables during build.
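As a rough local illustration of the "union of subtrees" idea, the sketch below packs three variant tarballs out of one content tree with plain `tar`. It only emulates what a PkgServer might do on request; the directory layout (`bin`, `lib`, `include`, `debug`), the file names, and the `pack_subtrees` helper are all assumptions made for this example.

```shell
set -eu

# Toy content tree; every path below is invented for illustration.
tree=$(mktemp -d)
mkdir -p "$tree/bin" "$tree/lib" "$tree/include" "$tree/debug"
touch "$tree/bin/foo" "$tree/lib/libfoo.so" \
      "$tree/include/foo.h" "$tree/debug/libfoo.so.debug"

# pack_subtrees TREE OUT SUBTREE...
# Pack only the listed top-level subtrees of TREE into the tarball OUT.
pack_subtrees() {
    pack_tree=$1
    pack_out=$2
    shift 2
    tar -czf "$pack_out" -C "$pack_tree" "$@"
}

out=$(mktemp -d)
pack_subtrees "$tree" "$out/minimal.tar.gz" bin lib
pack_subtrees "$tree" "$out/build.tar.gz"   bin lib include
pack_subtrees "$tree" "$out/full.tar.gz"    bin lib include debug

# List what the "minimal" variant ended up containing.
tar -tzf "$out/minimal.tar.gz"
```

Each variant is a superset of the previous one, so a client that already holds "minimal" would in principle only need the extra subtrees to upgrade.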
To strip out debug symbols into external files, we can use the following tools:
objcopy --only-keep-debug $file $debug_file
strip --strip-debug --strip-unneeded $file
objcopy --add-gnu-debuglink $debug_file $file
Assuming we are able to work our PkgServer magic above, we will be able to stream down content trees where these files exist on-disk side-by-side, which makes the whole thing much easier. If we must keep the files separate, this becomes more difficult: we'd probably have to modify files on-disk to get relative pathing correct, or force debuggers to do the searching themselves (this is easier if we embed build ids, see below).
On Darwin, we can use `dsymutil` to create `.dSYM` bundles (or files, if we want, by passing in `-flat`):
dsymutil $file
Note that we probably want to start adding `--build-id=sha1` to our `LDFLAGS` to aid in debugging efforts, as that allows for easier matching of files.
We can force `-g` into all compiler invocations via our compiler wrappers, and invoke `dsymutil` upon all executables at the end of the build if we're running on Darwin. It really should be that simple. :)
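A possible shape for the ELF side of such an end-of-build step is sketched below: walk a prefix, pick out ELF files by their magic bytes, and apply the three `objcopy`/`strip` commands from above to each. The helper names (`is_elf`, `split_debug`, `split_all_debug`) and the `.debug` suffix are assumptions for illustration, and the Darwin `dsymutil` branch is left out.

```shell
set -eu

# is_elf FILE: succeed if FILE starts with the ELF magic bytes 0x7f 'E' 'L' 'F'.
is_elf() {
    [ "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" = "7f454c46" ]
}

# split_debug FILE: move the debug info of FILE into FILE.debug and link the two.
split_debug() {
    objcopy --only-keep-debug "$1" "$1.debug"
    strip --strip-debug --strip-unneeded "$1"
    objcopy --add-gnu-debuglink="$1.debug" "$1"
}

# split_all_debug PREFIX: apply split_debug to every ELF file under PREFIX,
# skipping debug files produced by an earlier run.
split_all_debug() {
    find "$1" -type f ! -name '*.debug' | while IFS= read -r f; do
        if is_elf "$f"; then
            split_debug "$f"
        fi
    done
}
```

It would be invoked as `split_all_debug "$prefix"` once the build has finished. Static archives are skipped because they start with the `ar` magic rather than the ELF one.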
Oh, I also just thought to myself it would be cool to switch between e.g. debug and non-debug versions through `Preferences`, so a JLL package would default to installing a `minimal` variant, but it can be opted-in to a higher variant by setting a Preference in the overall `Project.toml` that is using the JLL.
Thinking about this again, it would also be really sweet for debug versions of JLLs to include all source files referenced by the DWARF files, stored in a predictable place (like `<$artifact_path>/src`) so that we can use `source-map` to get `lldb`/`gdb` to find the source when we're debugging an artifact.
We can add a post-processing step that inspects all DWARF files, finds all referenced source files (even autogenerated ones) and stores them in the appropriate location within a `$destdir/src` directory. Then we just need a convenient way to map `/workspace/srcdir => $artifact_path/src` within `lldb`/`gdb` and we'll have a really slick debugging experience for our users.
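One low-tech way to get that mapping is to generate a small init file per debugger. The sketch below uses gdb's `set substitute-path` and lldb's `settings set target.source-map`; the `write_source_maps` helper, the output file names, and the temporary directory standing in for an artifact path are assumptions for illustration, and paths containing spaces are not handled.

```shell
set -eu

# Build-time source root, as named in the discussion above.
build_srcdir=/workspace/srcdir

# write_source_maps ARTIFACT_PATH OUTDIR
# Emit one init file per debugger that redirects the build-time source root
# to ARTIFACT_PATH/src.
write_source_maps() {
    printf 'set substitute-path %s %s/src\n' "$build_srcdir" "$1" > "$2/gdbinit"
    printf 'settings set target.source-map %s %s/src\n' "$build_srcdir" "$1" > "$2/lldbinit"
}

artifact_path=$(mktemp -d)   # stand-in for a real artifact directory
mapdir=$(mktemp -d)
write_source_maps "$artifact_path" "$mapdir"
cat "$mapdir/gdbinit" "$mapdir/lldbinit"
```

The files can then be loaded with `gdb -x "$mapdir/gdbinit" ...` or `lldb -s "$mapdir/lldbinit" ...`.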
I think most of our compilers support split DWARF info? https://gcc.gnu.org/wiki/DebugFissionDWP
In the last few weeks I've been thinking about this issue again, and coming up with beautiful ideas like using `Preferences.jl` to install debug versions of packages, just to realise that Elliot already proposed it :disappointed:
Another idea that just came to my mind is to have dev/debug tarballs as lazy artifacts of the same JLL package, instead of their own packages, but Elliot anticipated me again:
The "variants" would all be put into the same JLL release as artifacts with names that have the + postpended
I like this idea! In particular, I'm thinking about also splitting the logs into their own tarballs. A nice benefit is that this could make the runtime tarball reproducible across multiple identical rebuilds.
One additional thing to mention is that now that we have `JLLWrappers.jl` we can automatically generate functions to download the additional artifacts, without having to change anything in the packages.
Just wanted to add that for JLLs which link against libjulia, it would be nice to have debug variants which link against libjulia-debug (this is orthogonal to the question of debug symbols and how to handle them). Right now, I am debugging a Julia package (Oscar.jl) involving four JLLs linking against libjulia (libcxxwrap-julia, libsingular-julia, libpolymake-julia, GAP) and it isn't exactly fun.
So perhaps there could be another "variant marker" indicating "download this instead of the default if this is a Julia debug build"
I just came across `debuginfod` which seems very complementary. It allows for gdb and others to auto-fetch debuginfo!
From time to time I look to the package managers of Linux distributions to see if we can pick up some interesting ideas. One thing that I think would be really cool to have here is to be able to generate multiple JLL packages with a single builder: the result of a build doesn't go into a single tarball, but it might be split into many of them.
One fancy application is to be able to generate:
- `Libfoo_jll`: contains only `bin/` and `lib/`, this is the runtime part, what the Julia packages will use;
- `Libfoo_dev_jll`: contains `include/`, header files are generally useless for Julia packages and they mostly clutter `~/.julia/artifacts/` with dozens of small files. Ideally, this would be automatically installed, if available, when `Libfoo_jll` is used as dependency in a build;
- `Libfoo_dbg_jll`: contains the debug symbols of the shared library, that users can optionally install to get more useful debug information about crashes or errors. Based on an idea by @keno.

Also, I think that `LLVM_full_jll` is currently "wrong": IMO it should simply be an empty metapackage binding all the other pieces. Instead now it's a monster package containing the same data as its pieces, which means that if we use both `LLVM_full_jll` and `libLLVM_jll` in a build, they would step on each other's toes. Having a single builder that produces all other subpackages would probably make @vchuravy happy, too.
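To make the `Libfoo_jll` / `Libfoo_dev_jll` / `Libfoo_dbg_jll` split concrete, here is a minimal sketch that partitions one build prefix into three disjoint directories, one per would-be package. The `split_prefix` helper, the `.debug` suffix for debug symbol files, and the toy file names are assumptions for illustration; a real builder would presumably drive this from metadata rather than hard-coded paths.

```shell
set -eu

# Toy build prefix; file names are invented for illustration.
prefix=$(mktemp -d)
mkdir -p "$prefix/bin" "$prefix/lib" "$prefix/include"
touch "$prefix/bin/foo" "$prefix/lib/libfoo.so" \
      "$prefix/lib/libfoo.so.debug" "$prefix/include/foo.h"

# split_prefix PREFIX DEST
# Partition PREFIX into DEST/Libfoo_jll, DEST/Libfoo_dev_jll and DEST/Libfoo_dbg_jll.
split_prefix() {
    mkdir -p "$2/Libfoo_jll" "$2/Libfoo_dev_jll" "$2/Libfoo_dbg_jll/lib"
    # Debug symbols go first, so they do not travel with lib/ into the runtime package.
    find "$1/lib" -name '*.debug' -exec mv {} "$2/Libfoo_dbg_jll/lib/" \;
    mv "$1/bin" "$1/lib" "$2/Libfoo_jll/"
    mv "$1/include" "$2/Libfoo_dev_jll/"
}

dest=$(mktemp -d)
split_prefix "$prefix" "$dest"
find "$dest" -type f | sort
```

Because the three directories are disjoint, installing any combination of them into one prefix cannot produce conflicting files, which is the property the `LLVM_full_jll` layout currently lacks.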