(Issue closed by staticfloat 5 years ago)
Okay, let's get started on the first bullet point of this list: defining a `BinaryArtifact` type within Pkg. We need to create a new datatype within Pkg that represents not a Julia package, but a `BinaryArtifact`, which is distinct in the following ways:

- `BinaryArtifact`s are chosen not only by version, but also by runtime-reflected properties (CPU architecture, OS, libgfortran version, etc.)
- Julia packages can declare `BinaryArtifact`s as something they require, complete with version bounds.
- We need a way for `BinaryArtifact`s to either "export code" or "bundle metadata". Things like "`LibFoo.jll` exports the abspath location of `libfoo.so`", or a wrapper function that sets environment variables before invoking `Git.jll`'s bundled `git.exe`.

I guess we can create an `AbstractDependency` type with `PackageSpec` and `BinaryArtifact` as subtypes? Then we replace most current occurrences of `PackageSpec` with `AbstractDependency`.
Is the idea to download a `BinaryArtifact` and then key into it with runtime information to determine what tarballs should be downloaded? Or is a `BinaryArtifact` the tarball itself?
How about just calling it `Dependency`, since we're not going to have `Dependency <: AbstractDependency`; we're going to have `PackageSpec, BinaryArtifact <: Dependency`.
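The hierarchy being discussed might be sketched like this. A minimal sketch with hypothetical stand-in types, not actual Pkg internals; the fields are assumptions for illustration:

```julia
# Hypothetical sketch of the proposed type hierarchy; these are stand-ins,
# not actual Pkg types.
abstract type Dependency end

# A normal Julia package dependency (stand-in for Pkg's real PackageSpec)
struct PackageSpec <: Dependency
    name::String
    version::VersionNumber
end

# A binary dependency, selected by version *and* runtime-reflected
# platform properties (CPU architecture, OS, libgfortran version, ...)
struct BinaryArtifact <: Dependency
    name::String
    version::VersionNumber
    triplet::String  # e.g. "arm-linux-gnueabihf"
end

# Both kinds of nodes can then live in one dependency graph:
deps = Dependency[
    PackageSpec("Example", v"1.2.3"),
    BinaryArtifact("LibFoo_jll", v"2.0.1", "arm-linux-gnueabihf"),
]
```

With this arrangement, most resolver code that currently takes a `PackageSpec` could accept a `Dependency` instead, and only the install step needs to dispatch on the concrete subtype.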
Ok, and these types of nodes will be mostly indistinguishable until we hit what is currently `build_versions`. At that point, we key into them with runtime information (i.e. `choose_download`) to determine the exact tarball which needs to be set up. Is that roughly the plan?
Sounds reasonable to me; I'd be happy to discuss this further and nail down more of an implementation plan during the Pkg call tomorrow?
Version constraints are against the version of the library, not the version of the thing that builds the library. But you want to be able to lock down a specific build of a library. But a specific build is completely platform-specific. There are some layers of versioning:
Is this correct and complete? The artifact identity should be completely determined by some "system properties" tuple that captures all the things that determine which artifact generated by a build script one needs. The end user mostly only needs to care about the library version, which is what determines its API and therefore usage. There might, however, be situations where one needs compatibility constraints on both the library version and the build script version: e.g. an older build was configured in some way that makes the resulting artifact unusable in certain ways.
Does a given version of a build script always produce just a single version of a given library?
How would this work with packages that use BinaryProvider but fall back to compiling from source if a working binary is not available (typically for less-popular Linux distros)? e.g. ZMQ or Blosc IIRC. You need some kind of optional-dependency support, it seems, or support for a source “platform”.
For building from source, we will support it manually by allowing users to `dev` a `jll` package; then they just need to copy their `.so` files into that directory. This is analogous to allowing users to modify their `.jl` files within a `dev`'ed Julia package.
I do not think we should ever build from source automatically. Looking at ZMQ, it looks like you have full platform coverage; under what circumstances are you compiling?
Another example to add to Steven's list is SpecialFunctions, which falls back to BinDeps when a binary isn't available from BinaryProvider. Once upon a time that was used on FreeBSD, before we had FreeBSD support in BinaryProvider, but now I don't know when it's used aside from on demand on CI.
Looking at ZMQ, it looks like you have full platform coverage; under what circumstances are you compiling?
We needed it on CentOS, for example (JuliaInterop/ZMQ.jl#176), because of JuliaPackaging/BinaryBuilder.jl#230.
There are an awful lot of Unix flavors out there, and it's nice to have a compilation fallback.
Regardless of the many UNIX variations, the only things you really need are the right executable format and the right libc, which we can pretty much cover at this point.
And the right `libstdc++`, which is apparently harder to cover.
(This was why I had to enable source builds for ZMQ and Blosc. Are we confident that this is fixed, or are we happy to go back to breaking installs for any package that calls a C++ library?)
I think our `libstdc++` problems should be largely solved now that https://github.com/JuliaPackaging/BinaryBuilder.jl/issues/253 has been merged. We now build with GCC 4.8.5 by default, using a `libstdc++` version of 3.4.18, so we are guaranteed to work with anything at least that new. I'm not entirely sure it's possible to build Julia with GCC earlier than 4.8 at the moment (the Julia README still says GCC 4.7+, but I'm pretty sure LLVM requires GCC 4.8+), so this seems like a pretty safe bet to me. I would be eager to hear how users are running Julia with a version of `libstdc++` older than 3.4.18.
Should https://github.com/JuliaPackaging/BinaryBuilder.jl/issues/230 be closed then?
Yes I think so.
I'm very supportive of Pkg managing the binary artifacts. I'd just like to point out that the implementation of library loading should be flexible enough to include some strategy for AOT compilation and deployment (to a different computer). The app deployed to a different computer will have to load libraries from different locations, and the hardcoding of paths in `deps.jl` makes this pretty difficult; see JuliaPackaging/BinaryProvider.jl#140. The best way would be to either not have `deps.jl` at all, or to not store the absolute path to the library.
Yes, that's the plan: you declare what you need, referring to it by platform-independent identity instead of generating it explicitly and then hardcoding its location, allowing Pkg to figure out the best way to get you what you need and to tell you where it is.
Progress! There is some code behind this post, and other things remain vaporware, with the aspiration of striking up some discussion on whether these are the aesthetics we want.
First up: the `Artifact.toml` file. These currently look something like this:

```toml
name = "JpegTurbo_jll"
uuid = "7e164b9a-ae9a-5a84-973f-661589e6cf70"
version = "2.0.1"

[artifacts.arm-linux-gnueabihf]
hash = "45674d19e63e562be8a794249825566f004ea194de337de615cb5cab059e9737"
url = "https://github.com/JuliaPackaging/Yggdrasil/releases/download/JpegTurbo-v2.0.1/JpegTurbo.v2.0.1.arm-linux-gnueabihf.tar.gz"

[artifacts.arm-linux-gnueabihf.products]
djpeg = "bin/djpeg"
libjpeg = "lib/libjpeg.so"
libturbojpeg = "lib/libturbojpeg.so"
jpegtran = "bin/jpegtran"
cjpeg = "bin/cjpeg"

[artifacts.i686-w64-mingw32]
hash = "c2911c98f9cadf3afe84224dfc509b9e483a61fd4095ace529f3ae18d2e68858"
url = "https://github.com/JuliaPackaging/Yggdrasil/releases/download/JpegTurbo-v2.0.1/JpegTurbo.v2.0.1.i686-w64-mingw32.tar.gz"

[artifacts.i686-w64-mingw32.products]
djpeg = "bin/djpeg.exe"
libjpeg = "bin/libjpeg-62.dll"
libturbojpeg = "bin/libturbojpeg.dll"
jpegtran = "bin/jpegtran.exe"
cjpeg = "bin/cjpeg.exe"
```

...
My plan is to embed this file into the Registry in the same way that `Project.toml` files are embedded right now. Artifacts will be analogous to `Project.toml` files, with the following similarities/differences:

- They will have `Compat.toml`, `Deps.toml` and `Versions.toml` entries, which will function exactly the same as a normal Registry entry, except that the downstream DAG of Artifacts can only contain other Artifacts; an Artifact cannot depend on a general Julia package, so in that sense the dependency links are restricted somewhat.
- They will have no `Manifest.toml`, `Project.toml` or `Package.toml`, only the afore-mentioned `Artifact.toml`. This is mostly for simplicity; I don't see why we need these, but I am aware that I may not be thinking this through completely.
- `Pkg` is now binary platform-aware, by essentially gutting code from `BinaryProvider` to instead live inside of `Pkg`. This allows me to ask things like "what is the ABI-aware triplet of the currently-running host?" (you now get that by calling `Pkg.triplet(Pkg.platform_abi_key())`).
When the user expresses a dependency on one of these Artifact objects (e.g. through `Pkg.add("LibFoo_jll")`) it will get added to the dependency graph as usual, but when being concretized into a URL to be downloaded, an extra step of indirection is applied by reaching into the `Artifact.toml`'s dictionary, finding `dict["artifacts"][triplet(platform_abi_key())]` and using the embedded entries as the `url` and `hash` to download and unpack into a directory somewhere.
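That indirection step could be sketched concretely as follows. The dict below hand-mirrors the parsed JpegTurbo_jll `Artifact.toml` shown earlier; the host triplet is an assumption standing in for `triplet(platform_abi_key())`:

```julia
# Sketch of keying into the parsed Artifact.toml with the host's ABI-aware
# triplet to pick a concrete url/hash pair to download. The dict mirrors the
# JpegTurbo_jll example above; the host triplet is hard-coded for illustration.
artifact_toml = Dict(
    "name" => "JpegTurbo_jll",
    "artifacts" => Dict(
        "arm-linux-gnueabihf" => Dict(
            "hash" => "45674d19e63e562be8a794249825566f004ea194de337de615cb5cab059e9737",
            "url"  => "https://github.com/JuliaPackaging/Yggdrasil/releases/download/JpegTurbo-v2.0.1/JpegTurbo.v2.0.1.arm-linux-gnueabihf.tar.gz",
        ),
        "i686-w64-mingw32" => Dict(
            "hash" => "c2911c98f9cadf3afe84224dfc509b9e483a61fd4095ace529f3ae18d2e68858",
            "url"  => "https://github.com/JuliaPackaging/Yggdrasil/releases/download/JpegTurbo-v2.0.1/JpegTurbo.v2.0.1.i686-w64-mingw32.tar.gz",
        ),
    ),
)

# Stand-in for triplet(platform_abi_key()) on an ARM Linux host:
host = "arm-linux-gnueabihf"

entry = artifact_toml["artifacts"][host]
url, hash = entry["url"], entry["hash"]
# Pkg would now download `url`, verify it against `hash`, and unpack it.
```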
After downloading and unpacking the binaries, `Pkg` will generate a wrapper Julia package that exposes an API to "get at" these files, so that client code (such as `LibFoo.jl`, the fictitious Julia-code side of things) can use it in as natural a way as possible. Example generated Julia code:
```julia
# LibFoo_jll/src/LibFoo_jll.jl
# Autogenerated code, do not modify
module LibFoo_jll

using Libdl
# Chain other dependent jll packages here, as necessary
using LibBar_jll

# This is just the `artifacts` -> platform_key() -> `products` mappings embedded in `Artifact.toml` above
const libfoo = abspath(joinpath(@__DIR__, "..", "deps", "usr", "lib", "libfoo.so"))
const fooifier = abspath(joinpath(@__DIR__, "..", "deps", "usr", "bin", "fooifier"))

# This is critical, as it allows a dependency that `libfoo.so` has on `libbar.so` to be satisfied.
# It does mean that we pretty much never dlclose() things though.
handles = []

function __init__()
    # Explicitly link in library products so that we can construct a necessary dependency tree
    for lib_product in (libfoo,)
        push!(handles, Libdl.dlopen(lib_product))
    end
end

end # module
```
Example Julia package client code:
```julia
# LibFoo.jl/src/LibFoo.jl
import LibFoo_jll

function fooify(a, b)
    return ccall((:fooify, LibFoo_jll.libfoo), Cint, (Cint, Cint), a, b)
end
```

...
I like it in general. I'll have to think for a bit about the structure of the artifacts file. There's a consistent compression scheme used by `Deps.toml` and `Compat.toml`; we'll want to use the same compression scheme for the artifact data in the registry, which somewhat informs how you want to structure the data in the file as well.
I think we'll eventually want to teach `ccall` about libraries so that we can just write `ccall(:libfoo, ...)` and have it know to find the `LibFoo` shared library. That seems like the nicest interface to this possible: just declare the dependency in your project file, ccall it with the right name, and everything just works.
> That seems like the nicest interface to this possible: just declare the dependency in your project file, ccall it with the right name, and everything just works.
I am actively shying away from teaching Pkg/Base too much about dynamic libraries; it's a deep rabbit hole. In this proposal I'm not even baking in the platform-specific library-searching awareness (e.g. "look for libraries in `bin` on Windows, `lib` elsewhere"). I want to keep Pkg as simple as possible.
On the other hand, I would like it if `dlopen()` was able to tell me, for instance, that trying to use `libqt` on a Linux system that doesn't have X11 installed already isn't going to work. It would know this because it would try to `dlopen("libqt.so")` and fail, and it would inspect the dependency tree and notice that `libx11.so` was not findable. This is all possible with not much new code written, but it does mean that we need to bring in things like `ObjectFile.jl` into `Base`, and that's a lot of code.
It would be nice if we could do things like search for packages that contain `libfoo.so`. That's actually one advantage to listing everything out in the `Artifact.toml` within the registry like that.
> There's a consistent compression scheme used by `Deps.toml` and `Compat.toml`
I'm not entirely sure what you mean by this, but I will await your instruction. I have no strong opinions about the `Artifact.toml` organization, except for the vague feeling that I want to make it as small as possible, to avoid bloating the registry and making things slow to download/install/parse/search.
> After downloading and unpacking the binaries, Pkg will generate a wrapper Julia package that exposes an API to "get at" these files, so that client code (such as LibFoo.jl, the fictitious julia-code side of things) can use it in as natural a way as possible. Example generated Julia code:
>
> ```julia
> const libfoo = abspath(joinpath(@__DIR__, "..", "deps", "usr", "lib", "libfoo.so"))
> const fooifier = abspath(joinpath(@__DIR__, "..", "deps", "usr", "bin", "fooifier"))
> ```
This automatic wrapper generation, with `const` assigning the absolute path, is exactly the thing that prevents AOT with deployment to a different computer. So during AOT, PackageCompiler will need to modify every single artifact wrapper package to get rid of the baked-in absolute path.
If the code is auto-generated, why can't this functionality be part of some function or macro call that would open the handles and generate the const paths on the fly? In that case PackageCompiler could just pre-collect all the artifacts into a "deployment depot" and let the `dlopen` reach for this "configurable" path. Or it would redefine this const-path generator for the AOT build.
And is the constness of the lib path really necessary for efficient `ccall`?
Is there any idea for how to integrate non-BP artifacts/dependencies? e.g. Conda.jl, or software which requires separate installers?
Similarly, what about providing a mechanism for overriding BP choices, e.g. the infamous Arpack issue, or cluster-specific MPI implementations?
For overriding choices, like for Arpack, I think doing `dev Arpack_jll`, then just installing/copying/linking whatever libraries you want into `~/.julia/dev/Arpack_jll`, is the right solution.
I think it would be awesome if one could say something like `pkg> use_system_libs Arpack_jll`, and then wouldn't need to link anything manually into `~/.julia/dev/Arpack_jll`, but somehow it would just pick up whatever version of the binary dependency is installed on the system.
I want to second what @phlavenk said about generating these hardcoded paths being precisely why packages that require a build step are currently non-relocatable, so let's avoid that.
> And is the constness of the lib path really necessary for efficient `ccall`?
According to Jameson, yes, it really is. Fortunately, we can work around this: by manually `dlopen()`'ing everything in `__init__()`, we don't have to pass the absolute path in, we just need to pass in the `SONAME` of each library. To deal with this, I've added code in BB to ensure that the SONAME of a library (e.g. `libjpeg.so.62` on Linux) is always openable (e.g. by ensuring that symlinks exist), then always recording the SONAME as the name of the library in the `Artifacts.toml`, and using that as the `const` value that we pass into `ccall`. We don't need to worry about ambiguity errors here, because we will have already `dlopen()`'ed the correct value within `__init__()`, which we can do at runtime with dynamically calculated paths.
@staticfloat and I came up with a plan that we then ran by @KristofferC and everyone is on board with. It's one of those stupid simple designs that seems totally obvious and like the first thing we should have come up with, but that's how design works, so 🤷♂. Here goes explaining it.
The core addition is an `Artifacts.toml` file which lives next to the `Project.toml` and `Manifest.toml` files. When installing a project (usually a package, but it would make sense for apps too) which has an artifacts file, Pkg will look through the file and install any artifacts which are relevant to the current platform. (There should probably also be a way to install for other platforms, for cases where one is using Pkg to set up pre-installed package setups in shared directories for multiple platforms.)

The format of the `Artifacts.toml` file is as follows:
```toml
[dataset-A]
hash.sha256 = "b2ebe09298004f91b988e35d633668226d71995a84fbd12fea2b08c1201d427f"
url = "https://somedomain.com/path/to/dataset.csv"

[nlp-model-1]
hash.sha256 = "5dc925ffbda11f7e87f866351bf859ee7cbe8c0c7698c4201999c40085b4b980"
url = "https://server.com/nlp-model-1.onnx"

[[libfoo]]
hash.sha256 = "19e7370ab1819d45c6126d5017ba0889bd64869e1593f826c6075899fb1c0a38"
url = "https://server.com/libfoo/Linux-armv7l/libfoo-1.2.3.tar.gz"
sys.os = "Linux"
sys.arch = "armv7l"

[[libfoo]]
hash.sha256 = "95683bb088e35743966d1ea8b242c2694b57155c8084a406b29aecd81b4b6c92"
url = "https://server.com/libfoo/Windows-i686/libfoo-1.2.3.tar.gz"
sys.os = "Windows"
sys.arch = "i686"

[[libfoo]]
hash.sha256 = "b65f08c0e4d454e2ff9298c5529e512b1081d0eebf46ad6e3364574e0ca7a783"
url = "https://server.com/libfoo/macOS-x86_64/libfoo-1.2.3.tar.gz"
sys.os = "macOS"
sys.arch = "x86_64"
```
What this means is:

- For a platform-independent artifact like `dataset-A`, the `url` value describes where to download the artifact from, and `hash` is a dict of hash algorithms to hash values of the downloaded file.
- For a platform-specific artifact like `libfoo`, each `[[libfoo]]` stanza describes one variant: the `sys` entry determines which systems the variant applies to, via keys such as `os`, `arch`, `libc`, `libstd++`, etc.; the `url` value describes where to download the artifact variant from, and `hash` is again a dict of hash algorithms to hash values of the downloaded file.

So, for example, when a package with this `Artifacts.toml` file in its root is installed, Pkg will look at this file after installation and download three additional files into the `~/.julia/artifacts` directory:

- `dataset-A`
- `nlp-model-1`
- one of the `libfoo` variants, based on the current OS and architecture

If there is no variant of some artifact that matches the current platform, then there is a package installation error, much as if downloading the package itself had failed. Inside of a package which has an artifacts file, one will be able to write something like `artifact"dataset-A"` to get a path to the downloaded `dataset-A` artifact. Similarly, `artifact"libfoo"` will provide the location of the variant of the `libfoo` artifact which matches the current platform.
Note that the `url` entries in the artifacts file should be considered "advisory", not permanent: they give a location where the artifact may be found, but if it has moved, then the artifact may be found by some other means via its hashes. This is similar to how I've proposed adding advisory repo locations in manifest files in the discussion on https://github.com/JuliaLang/Pkg.jl/issues/635 (contrary to the original desire there to put the URLs in the project file, which I don't think we should do).
BinaryBuilder will generate `libfoo` packages which provide the API to load and use the `libfoo` binary dependency. These are normal Julia packages, except that they are generated rather than written by hand. They are versioned and registered like normal Julia packages, and it is these versions which the package resolver reasons about. The resolver does not know or care about specific variants of artifacts: it just picks a version of `libfoo` from the registry and then installs it. Once a chosen version is installed, Pkg looks at the installed `Artifacts.toml` file inside of `libfoo` and will see a set of `[[libfoo]]` stanzas for all the variants of the artifact which are provided by this version of the `libfoo` package. It will install the first one that matches the current platform. The source of the `libfoo` package will use the `artifact"libfoo"` API to find the location of the library and load it. The end user is presented with a simple API where they just write `using libfoo` to load and use the `libfoo` library.
In this design the manifest file remains platform-independent: it contains an entry for the `libfoo` package, which is platform-independent. The `libfoo` package is the only place that needs to concern itself with variants of the `libfoo` artifacts and where to find them. This also avoids putting all the platform-specific information about artifact variants into the manifest file, which would lead to a lot of bloat, especially since it would be repeated in each manifest that depends on a platform-specific artifact. Instead, this design avoids repeating that information at all: it all lives in one place, in the package which uses the artifact. We may, however, want to allow the resolver to reason about which platforms a particular BB package version supports. This could be exposed in the registry, to allow the resolver to pick a version that supports the desired (usually current) platform. What does not need to go into the registry, however, is the details of the platform-specific artifacts: the registry only needs to know which platforms a version of a package supports. Once a version is chosen, it is only in the install phase that Pkg needs to know where to get artifact variants.
Other things we might want to support:
One thing that came up during the design discussion is: why have a separate `Artifacts.toml` file? Instead, one could have an `[artifacts]` section in the `Project.toml` file. There are a few reasons; among them, with a combined file every artifact section would have to be written as `[artifacts.nlp-model-1]` rather than just `[nlp-model-1]`. In a previous design iteration, the platform for artifact variants was in the section header, which made this more of an issue; in this iteration, the header is just the name of the artifact.

We could potentially support having an `[artifacts]` section in the project file OR a separate `Artifacts.toml` file. Maybe BinaryBuilder-generated packages will be the only ones that use platform-specific variants, in which case having this in the project file wouldn't be so bad, since those would be machine-generated and not often looked at or modified by humans, whereas packages that people actually write would tend to have platform-independent artifacts that aren't so verbose.
This sounds awesome!
Couple of random thoughts:
I think this kind of design would work for many more scenarios than just BinaryBuilder stuff, right? For example, I think I could entirely get rid of the `build.jl` scripts in cases like this or this? If that was the case, it would be fantastic.

Where would artifacts (and extracted artifacts) be stored? Ideally not in the package folder, right? But in something like `.julia/artifacts`? That way, if a package gets updated but still needs the same artifact, the artifact wouldn't have to be redownloaded/extracted, right?
I like `Artifacts.toml`, and I wouldn't allow that stuff to also appear in `Project.toml`. I generally think for something like that it is better to not offer choices: it just gets confusing, and then one also needs to support all these different options in all the tools, and I just don't think it is worth the extra effort.
Could there be a "fallback" script option for an artifact that is invoked if there is no binary for the current platform? This could be optional. But one could imagine something like:

```toml
[[libfoo]]
build_script = "debs/build.jl"
```

If a section like that is present, and there is no binary for the current platform for `libfoo`, then this script runs. And then folks could still try to compile for exotic platforms in that script, or do something else.
And one question: I assume artifact acquisition would just happen during `build`?
> I think this kind of design would work for many more scenarios than just BinaryBuilder stuff, right? For example, I think I could entirely get rid of the `build.jl` scripts in cases like this or this? If that was the case, it would be fantastic.
Yes, this isn't BinaryBuilder-specific at all and would be perfect for doing that kind of thing without a build step. Our long-term goal is to get rid of `deps/build.jl` altogether and make packages completely immutable: install them and never change them. Of course artifacts will also be content-addressed and immutable. There's a theme here 😁
> Where would artifacts (and extracted artifacts) be stored? Ideally not in the package folder, right? But in something like `.julia/artifacts`? That way, if a package gets updated but still needs the same artifact, the artifact wouldn't have to be redownloaded/extracted, right?
Yes, I mentioned `~/.julia/artifacts` above but forgot to elaborate. It should be content-addressed and immutable, so the simplest version of this would be that an artifact with hash `95683bb088e35743966d1ea8b242c2694b57155c8084a406b29aecd81b4b6c92` would get installed at `~/.julia/artifacts/95683bb088e35743966d1ea8b242c2694b57155c8084a406b29aecd81b4b6c92`. However, there are a few issues with that:
So the obvious solution is to use the name of the artifact with a slug derived from its content hash, so something like this:

`~/.julia/artifacts/libfoo/Z94Fh`

There are a few problems with that though; one of them could be addressed by supporting a `name` entry that can be different from the artifact section name and determines the part that goes before the slug. Neither issue seems fatal, so I think that's probably what we should do.
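A sketch of what such a slug might look like. The encoding below is illustrative only (base-62 over the leading bits of the hash), not Pkg's actual slug algorithm:

```julia
# Illustrative slug: fold the leading 64 bits of the content hash into a
# short base-62 string, giving paths like ~/.julia/artifacts/libfoo/Z94Fh.
# This is NOT Pkg's real algorithm, just the general idea.
const SLUG_CHARS = ['A':'Z'; 'a':'z'; '0':'9']  # 62 characters

function slug(hash_hex::AbstractString, n::Integer = 5)
    x = parse(UInt64, hash_hex[1:16]; base = 16)  # first 8 bytes of the hash
    buf = Char[]
    for _ in 1:n
        push!(buf, SLUG_CHARS[Int(x % 62) + 1])
        x ÷= 62
    end
    return String(buf)
end

h = "95683bb088e35743966d1ea8b242c2694b57155c8084a406b29aecd81b4b6c92"
artifact_dir = joinpath(homedir(), ".julia", "artifacts", "libfoo", slug(h))
```

The slug is deterministic in the content hash, so two installs of the same artifact always land in the same directory, while distinct contents almost certainly get distinct slugs.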
> I like `Artifacts.toml`, and I wouldn't allow that stuff to also appear in `Project.toml`. I generally think for something like that it is better to not offer choices: it just gets confusing, and then one also needs to support all these different options in all the tools, and I just don't think it is worth the extra effort.
I think you're probably right. There's one other reason for a separate file that just occurred to me—this file needs to be parsed at runtime in order to find artifacts, so we want it to be pretty simple and regular and not have to look at a lot of different options or variations. This is similar to how code loading needs to parse through the manifest file, so the scheme for finding code needs to be fast and simple for that. Similarly, artifact finding needs to be fast and simple and I think that suggests a separate file.
> Could there be a "fallback" script option for an artifact that is invoked if there is no binary for the current platform?
That's certainly a possibility. I'd prefer to only do this if it turns out to be necessary, but it might.
> I assume artifact acquisition would just happen during `build`?
I think it would happen after installation and before build. After all, it shouldn't depend on the build at all, since it's just a matter of downloading some things and putting them in the right place; and that way, if there is a build step, it can rely on artifacts already being present.
Just to spell this out, the artifact loading process is that when Julia sees `artifact"dataset-A"` in the code of a package, it looks for `Artifacts.toml` in that package root and looks for a top-level stanza named `dataset-A`. Assuming that this is a table, i.e. `[dataset-A]`, it then looks for the hash of the artifact in that table, then looks for `$depot/artifacts/dataset-A/$(slug(hash))` and returns that path. For `artifact"libfoo"`, where presumably one finds a series of `[[libfoo]]` stanzas instead, the process is that one keeps looking through these until one finds one where the `sys.os` and `sys.arch` and such selectors match the current system. Once a matching stanza is found, one again looks up `$depot/artifacts/libfoo/$(slug(hash))`.
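The lookup just described could be sketched as follows. The parsed `Artifacts.toml` is mocked as plain dicts (reusing two of the `libfoo` variants from the example above), and `find_stanza` is a hypothetical helper name:

```julia
# Sketch of artifact"..." resolution: a lone table is platform-independent;
# a list of [[name]] stanzas is scanned until the sys.* selectors match.
artifacts = Dict{String,Any}(
    "dataset-A" => Dict(
        "hash" => Dict("sha256" => "b2ebe09298004f91b988e35d633668226d71995a84fbd12fea2b08c1201d427f"),
    ),
    "libfoo" => [
        Dict("hash" => Dict("sha256" => "19e7370ab1819d45c6126d5017ba0889bd64869e1593f826c6075899fb1c0a38"),
             "sys"  => Dict("os" => "Linux", "arch" => "armv7l")),
        Dict("hash" => Dict("sha256" => "95683bb088e35743966d1ea8b242c2694b57155c8084a406b29aecd81b4b6c92"),
             "sys"  => Dict("os" => "Windows", "arch" => "i686")),
    ],
)

function find_stanza(artifacts, name; os, arch)
    entry = artifacts[name]
    entry isa Dict && return entry      # single, platform-independent artifact
    for variant in entry                # [[name]] stanzas: first match wins
        sys = variant["sys"]
        if get(sys, "os", os) == os && get(sys, "arch", arch) == arch
            return variant
        end
    end
    return nothing                      # no match: installation would error
end

stanza = find_stanza(artifacts, "libfoo"; os = "Windows", arch = "i686")
sha = stanza["hash"]["sha256"]
# The artifact path would then be $depot/artifacts/libfoo/$(slug(sha)).
```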
What if one wants to build a binary locally instead of using the one provided by BinaryBuilder? Would they need to edit the `Artifacts.toml` in order to be able to load their binaries?
> Could there be a "fallback" script option for an artifact that is invoked if there is no binary for the current platform?
I don't like this, because it implicitly makes wherever this artifact should live mutable, and we don't like that. I think mutable package state should be something else; perhaps some kind of "workspace" API that could define lifecycles for data that are longer than just the lifecycle of a project.
For most of these kinds of projects, what you truly want is a DSL to express data-processing flows that allows for arbitrary steps (e.g. if input files `A`, `B` or `C` change, then regenerate `D`, which would then also cause `E` to be regenerated, etc.), similar to a `Makefile`. That would be best provided by a package that builds on top of a filesystem organizational system; perhaps provided by `Pkg`, or perhaps not, and just manually set up by you. I think the need for those kinds of processes is outside the scope of `Pkg`.
It also occurs to me that unlike packages, which are fairly self-documenting, artifacts are just blobs of data, so we may want to add an extra layer of information identifying artifacts. Maybe this:

```
~/.julia/artifacts/
    libfoo/
        Z94Fh/
            Artifact.toml   # info about the artifact: full hash, origin URL, system info
            content/
                # actual files go here
```
> For most of these kinds of projects, what you truly want is a DSL to express data processing flows that allows for arbitrary steps (e.g. if input files A, B or C change, then regenerate D, which would then also cause E to be regenerated, etc...) similar to a Makefile.
We already have one half-baked, poorly documented `make` replacement in BinDeps; please let's not add another. I just want an "if a binary isn't available, please run `myfallback(installpath)`…"
> it implicitly makes wherever this artifact should live mutable
Why can't the fallback install to the same location?
> Why can't the fallback install to the same location?
Because we're trying to make this immutable and content-addressable. In particular, think about having platform-specific artifacts living together on a shared file system.
Any kind of "run arbitrary code as a fallback to binaries not being available" is, in my mind, a step backwards. The reason I say that this introduces mutable state is because if all we are doing is downloading and unpacking a tarball, that's a one-step process. Excluding the small possibility that something goes wrong mid-extraction (not something I have seen very often), the files are either there, or they are not. With an arbitrary code fallback, the state of the build directory very often causes problems when the build tries to be run a second time. I have to address an issue with `SpecialFunctions.jl` about once every ten days, where a user contacts me because something isn't working, and it is always, without fail, due to a previous fallback invocation messing something up for future fallback invocations. Even worse, the number of things that can go wrong in that case is many, many times larger than the number of things that can go wrong when downloading and extracting something.
I am also not eager to support arbitrary execution fallbacks. The whole point of this BinaryBuilder endeavor is to make it so that all you ever have to do anywhere is unpack some files.
I think the fallback would clearly be hardly used. Presumably almost all uses of `build.jl` would just disappear. But there are platforms where there is no binary, and without a fallback there is currently no good story for those cases, as far as I can tell. So I have a hard time seeing this as "a step backwards", because I'm imagining that this would only kick in if the normal artifact procedure didn't work for a rare platform. At that point nothing works, so it is difficult to see how a fallback could make things worse.
> But there are platforms where there is no binary, and without a fallback there is currently no good story for those cases, as far as I can tell.
Specifically, what kinds of platforms are you speaking of?
> At that point nothing works, so it is difficult to see how a fallback could make things worse.
In the SpecialFunctions case (not to harp on that package in particular, but just because it's the first example that comes to mind), all the complaints I get are from users who should be using the BB-built tarballs but have somehow managed to force themselves to use the fallback. My most common piece of advice is to just delete the entire `deps/usr` directory, and when they try to `build` again, it all just works.
I think when you give package authors the ability to embed arbitrary Julia code into their build process, it is very difficult for them to avoid the temptation to use it to solve minor problems, which then transform into major problems. I don't blame them for trying to solve problems; I blame us for giving them inadequate tools.
Even after merging this, `Pkg.build()` will still work. If you try to install a package with an `Artifact.toml` that does not include your platform, there will simply be no artifact installed. You could write your own `deps/build.jl` to detect that situation and run whatever script you want then.
> But there are platforms where there is no binary, and without a fallback there is currently no good story for those cases, as far as I can tell.
The story is: add BinaryBuilder support for that platform.
> At that point nothing works, so it is difficult to see how a fallback could make things worse.
Because now we have to support a complex, hardly used fallback mechanism...
Maybe we could do something like: if no platform variant exists, just look for `~/.julia/artifacts/libfoo/fallback`, and if that exists, use that instead of failing. I would want to leave it entirely up to the end user in such situations to figure out how to put something there that works, though.
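A minimal sketch of that fallback lookup, assuming a hypothetical `artifact_path` helper and a dictionary of per-platform directories (none of these names are actual Pkg API):

```julia
# Hypothetical sketch of the fallback behavior described above: prefer a
# platform-specific artifact directory, otherwise look for a user-managed
# "fallback" directory and use it instead of failing outright.
function artifact_path(name::String, platform_dirs::Dict{String,String}, host::String;
                       depot::String = joinpath(homedir(), ".julia", "artifacts"))
    # Prefer an exact platform match recorded for this artifact
    haskey(platform_dirs, host) && return platform_dirs[host]
    # Otherwise, accept a fallback directory the end user has set up themselves
    fallback = joinpath(depot, name, "fallback")
    isdir(fallback) && return fallback
    return nothing  # no artifact available for this platform
end
```

As discussed above, the contents of the fallback directory would be entirely the user's responsibility; Pkg would merely check for its existence.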
> Maybe we could do something like if no platform variant exists, just look for...
I think that will tie in with our answer to @giordano's question above as well:
> What if one wants to build a binary locally instead of using the one provided by BinaryBuilder? Would they need to edit the Artifacts.toml in order to be able to load their binaries?
I think ideally no; what I would want is for you to do something like say `pkg> dev LibFoo_jll`, then go to `~/.julia/dev/LibFoo_jll/deps/usr`, plop your libraries into the `lib` directory and call it good. But this is, of course, breaking the "warranty is void if broken" sticker; at that point you're on your own if the libraries do or don't work.
I mostly remember users from obscure large cluster systems?
> Maybe we could do something like if no platform variant exists, just look for `~/.julia/artifacts/libfoo/fallback`
I like that! I think that is easier to handle than `dev`ing things and then copying stuff over. Why require the extra `dev` step?
> I like that! I think that is easier to handle than deving things and then copying stuff over. Why require the extra dev step?
The `dev` step is necessary to denote to the resolver that you don't want this package to participate in things like `pkg> upgrade` events. You want the rest of Pkg to just ignore it and not touch it (at least for this environment; perhaps a different environment should use the BB-sourced binaries).
Additionally, we don't really want people adding/removing things from `~/.julia/packages` or `~/.julia/artifacts`, as those are "managed" by Pkg and should be considered read-only.
I have a great feeling about the direction Julia dependencies (immutability, central management by Pkg) are evolving. And I'm keeping my fingers crossed for Tom Short's shot at slimmed-down AOT static compilation (JuliaLang/Julia#32273). My concern now is: if we have all this infrastructure in Pkg and a "no-sysimg, use-shared-libs" static compilation mode, will it be only a matter of defining a new "static" architecture (like "i686-so") and rerunning BB to generate static libraries? So that if I change my project to a "static-library" architecture, the Pkg resolver will download all the "static-library" artifacts?
Key points from the discussion on Slack:
- Artifact objects are only really understood by their matching `_jll` packages. Their identity is the SHA256 hash used to fetch them (see below).
- It is a strong assumption of the system that any file being distributed by the artifact system was handcrafted expressly for this purpose. This is probably good for keeping scope constrained (and it isn't like e.g. DataDeps or BinDeps is going to stop working, so...). Thus there is no support for arbitrary post-processing.
- It is either extract (which will likely only support `tar.gz`) or do not extract. Do not extract is the default, but for anything created by BinaryBuilder, extract will be used.
- There is no support for hashes other than SHA256 (so one can't use, say, MD5 hashes provided by others).
- The hash is also used for the identity of the artifact object from within the `_jll` package. From outside the `_jll` package, the `_jll` package itself is the identity of the object.
- Thus when talking about versions of a binary dependency (or data versions), one is talking about versions (and UUIDs) of `_jll` packages. But when a `_jll` package is sorting out downloading data, it is saying to the server "give me the object with this SHA256 hash".
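The content-addressing scheme described in these points can be sketched as follows; `artifact_id` and `verify_artifact` are hypothetical helpers (not Pkg API), built on Julia's stdlib `SHA` package:

```julia
using SHA  # stdlib

# Sketch of content addressing: within a _jll package an artifact is identified
# purely by its SHA256 hash, so a download amounts to "give me the object with
# this hash", and the result can be verified locally against that same hash.
artifact_id(data::Vector{UInt8}) = bytes2hex(sha256(data))

function verify_artifact(data::Vector{UInt8}, expected_hash::String)
    artifact_id(data) == expected_hash || error("artifact hash mismatch")
    return data
end
```

Note how no URL participates in the identity at all, which is what makes the "advisory URL" / mirroring point below possible.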
- Supporting multiple URLs for mirroring purposes may be a thing. URLs are intended as "advisory" and are only used for unregistered packages. During registration @StefanKarpinski wants to actually rehost the files somewhere else, at least for the General registry.
Let's talk about the possible merging of BinaryProvider and Pkg, to integrate the binary installation story to unheard-of levels. Whereas:
I suggest that we do away with the weird indirection we currently have with packages using `build.jl` files to download tarballs, and instead integrate these downloads into Pkg completely. This implies that we:

- Create a new concept within Pkg, that of a Binary Artifact. The main difference between a Binary Artifact and a Package is that Packages are platform-independent, while Binary Artifacts are necessarily not so. We would need to port over the same kind of platform-matching code as is in BP right now, e.g. dynamically choosing the most specific matching tarball based on the currently running Julia. (See `choose_download()` within BP for more.)
- Modify BinaryBuilder output to generate Binary Artifacts that are then directly imported into the General Registry. The Binary Artifacts contain within them a small amount of Julia code; things like setting environment variables, mappings from `LibraryProduct` to actual `.so` file, functions to run an `ExecutableProduct`, etc... This is all auto-generated by BinaryBuilder.
- Change client packages to simply declare a dependency upon these Binary Artifacts when they require a library. E.g. `FLAC.jl` would declare a dependency upon `FLAC_jll`, which itself declares a dependency upon `Ogg_jll`, and so on and so forth.
- Eliminate the `Pkg.build()` step for these packages, as the build will be completed by the end of the download step. (We can actually just bake the `deps.jl` file into the Binary Artifact, as we are using relative paths anyway.)

Please discuss.
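As a rough illustration of the `choose_download()`-style platform matching mentioned above: given a set of tarball entries keyed by platform triplet, pick the most specific entry compatible with the host. The `choose_tarball` helper and the triplet-prefix compatibility rule here are purely illustrative, not BP's actual implementation:

```julia
# Illustrative sketch (not BinaryProvider's real logic): choose the most
# specific available tarball entry compatible with the host triplet.
function choose_tarball(available::Vector{String}, host::String)
    # Exact triplet match wins outright
    host in available && return host
    # Otherwise accept entries that are prefix-compatible generalizations of
    # the host, e.g. "x86_64-linux-gnu" matching an
    # "x86_64-linux-gnu-libgfortran4" host
    matches = filter(t -> startswith(host, t), available)
    isempty(matches) && return nothing
    # Most specific = longest compatible entry
    return sort(matches; by=length)[end]
end
```

In the real system the triplet would be derived from runtime-reflected properties (CPU architecture, OS, libgfortran version, etc.), per the first comment in this thread.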