JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Review BinDeps: Handling binary dependencies in Pkg #3088

Closed · ViralBShah closed this issue 11 years ago

ViralBShah commented 11 years ago

We should review the BinDeps approach, as it has turned out to be quite complex - even though many of the troubles people face are probably related to Pkg rather than BinDeps.

One possibility that @StefanKarpinski has mentioned is that we support only two options: download a pre-compiled binary, or use a system-provided one (through the various package managers on Linux and Mac). Users should not be expected to build libraries from source.

In such a scenario, we still need the ability to build libraries on all platforms, and perhaps a simple Makefile with common targets (similar to what we have in deps) would suffice.

ViralBShah commented 11 years ago

Cc: @loladiro

Keno commented 11 years ago

We should discuss this in the upcoming IRC sprint. I have a couple of ideas.

vtjnash commented 11 years ago

I also have some ideas to discuss tomorrow (someone want to send out an announcement for that today?)

mlubin commented 11 years ago

There are many smaller libraries that aren't provided by package managers, or that for some reason need to be patched to be used in Julia. Building from source is really the only option, and it's too much to expect all package maintainers to build binaries for all platforms without providing any infrastructure to make this easy.

vtjnash commented 11 years ago

Will discuss more over IRC tomorrow. However, my summary is that BinDeps should be kept but improved for better reliability, and that we should start a parallel package for interfacing with the system package manager so that packages can also easily declare all their system dependencies.

vharavy commented 11 years ago

Just as an idea: what if we reimplement GYP in Julia? Package developers would be able to define how to build a package and its dependencies in a common way. During installation we would generate an appropriate build scheme (GNU Makefiles, NMake Makefiles, Visual Studio project files) and install dependencies in a way native to the OS (APT, RPM, MSI packages). It is not an easy task, but I believe it could solve all the problems with package installation. Also, GYP is under the New BSD license, so we can use its code as a starting point.

StefanKarpinski commented 11 years ago

@mlubin: what's an example of a library that needs to be patched? To expand on my proposal of system-installed or precompiled binary as the only two options, I think that precompiled binaries are the real solution. However, various distributions/people seem to insist on being able to use their own versions of things, rather than some precompiled binary that Just Works™. In that case, the onus is on the distribution/person insisting on using their own version – they either need to compile from source or package things so that they work. This business of having a complex system like BinDeps that never actually seems to work is not feasible.

For what it's worth, while the current Pkg is somewhat fragile (I've been trying to focus on making the next incarnation better, rather than bandaiding the older one), it works fine for source-only packages. It's the BinDeps/Pkg interaction that causes most of the problems at this point. Rather than trying to solve an intractable problem – supporting automatic interaction with every single way of installing every possible library ever – my proposal is to not do that.

JeffBezanson commented 11 years ago

Many users (especially as we move out of the early-adopter phase) do not have compilers installed, and are unwilling to do more than 1 or 2 steps to install something. Thus if some package genuinely requires building something from source, it is out of their reach unless we build it for them.

One of the most realistic things we can do, as @mlubin says, is set up infrastructure to automatically build package dependencies and make binaries available. Although that takes work, it is at least possible, unlike other things we might do like require people to have compilers.

ViralBShah commented 11 years ago

Users need to be able to get packages and precompiled libraries without fuss.

However, in order to make precompiled libraries available for users, we need to provide a way for package developers to specify how to build the libraries on Mac, Linux, and Windows. This is also essential for package developers to get other contributors for their packages.

mlubin commented 11 years ago

@StefanKarpinski: I've needed to add some patches to Clp and CoinMP for various reasons, and it will be a while before the fixes show up in versions packaged by Ubuntu. You guys are right that precompiled libraries are the right way to go in general, but it should be easy for users to plug in their own version of a library that's either manually compiled or installed from an external package manager.

Many optimization codes, for example, have compile-time options to link against advanced sparse linear algebra routines that cannot be distributed because of licensing issues. It should be easy for users who already know how to compile these codes to have Julia use the manually compiled versions, without kludges like setting package-defined environment variables. Perhaps asking the user to create a symlink to the library in some subdirectory of the Julia package would work.
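
If the convention were a well-known override location, the check could be trivial. A minimal sketch in Julia, assuming a hypothetical deps/usr/lib layout and using Clp as the example library:

using Libdl

# Hypothetical override convention: if the user has dropped (or symlinked)
# their own build of the library into the package's deps/usr/lib, prefer it;
# otherwise let dlopen search the normal system paths.
function find_libclp(pkgdir::AbstractString)
    override = joinpath(pkgdir, "deps", "usr", "lib", "libClp." * Libdl.dlext)
    ispath(override) && return override   # user-provided build wins
    return "libClp"                       # fall back to the system search path
end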

ViralBShah commented 11 years ago

Yes, we should certainly let the user provide libraries. We should just not expect all users to build libraries.

staticfloat commented 11 years ago

This can be pretty complex.

On one side of the spectrum, you have things like Homebrew's bottles, which are essentially tar'ed versions of whatever gets thrown into /usr/local/ by a formula. There is virtually no documentation, or even a standard that these objects conform to; you just untar it into /usr/local/ and run with it. I'm contrasting this with the process of creating a Debian package, which has everything from the licenses of every file in the package to the symbols defined in any shared libraries built, all documented alongside the source in the debian/ directory.

Homebrew bottles are compiled for very select formulae by the Homebrew devs, and uploaded to sourceforge. Debian packages are compiled on buildd servers according to the rules defined in the debian/ directory, and then uploaded to a PPA/repository.

I think the amount of metadata we ask users to include is pretty flexible. (E.g. do we always make the users include directions for building a package from source? Do we include names of system-provided equivalent libraries when the user does not wish to compile zlib again? etc....) What is less easily done is creating a streamlined process for compiling and distributing packages. Building even a shadow of a buildd analogue would be a major undertaking.

In short, I think it's going to be a major timesink to have "us" (where "us" means any computers owned by the Julia project) compile packages submitted by the community, especially when you have packages that can be compiled against system-provided libraries.

IMO, the simplest thing for us to do is to attempt to support system-provided libraries first and foremost. Instructions for linking against custom-compiled libraries/packages can be documented, but most users will likely want something that can "just work" as quickly as possible, and using apt-get (for Debian systems), brew or port for OSX, and otherwise attempting to push as much work as we can off onto other projects that are already tackling this problem is the easiest way to do that. If a package cannot be used with a system-provided library (e.g. the difficulties we've had with Julia needing newer versions of LLVM, OpenBLAS, etc.), we can tackle solutions on a case-by-case basis (getting newer versions into the proper package managers, moving to a pure-Julia solution (haha, easy for me to say), requiring compilation from source, etc.).

EDIT:

@StefanKarpinski: I've needed to add some patches to Clp and CoinMP for various reasons, and it will be a while before the fixes show up in versions packaged by Ubuntu.

@mlubin: Inclusion into Ubuntu is desirable, but definitely not required for ease of use. If we need to, we can set up a julia-package-dependencies PPA for Ubuntu users. This only addresses Debian-based systems, however.

StefanKarpinski commented 11 years ago

IMO, the simplest thing for us to do is to attempt to support system-provided libraries first and foremost. Instructions for linking against custom-compiled libraries/packages can be documented, but most users will likely want something that can "just work" as quickly as possible, and using apt-get (for Debian systems), brew or port for OSX, and otherwise attempting to push as much work as we can off onto other projects that are already tackling this problem is the easiest way to do that.

In my experience, this is definitely not the easiest way to do this. In fact, my experience suggests that making this work for everyone everywhere is basically impossible. On the other hand, providing pre-compiled binaries for important libraries on the half-dozen systems that we support is pretty straight-forward.

JeffBezanson commented 11 years ago

If there are existing pre-built things that can be installed on any Mac (e.g. by downloading a tarball), then we can use those. Other than that, most users don't have brew or port either, and have no idea what they are.

mlubin commented 11 years ago

I think we can separate the core libraries such as graphics stuff from small scientific libraries. Do we need the same strategy for both? Most users don't have compilers, but it's not so crazy to require them for more niche libraries, which is what I believe both R and pip do.

johnmyleswhite commented 11 years ago

To my knowledge, R does not require the user to install a compiler: highly generic binaries are built on a standard server and distributed with R packages.

vharavy commented 11 years ago

@johnmyleswhite Yes, R does not require the user to install a compiler. It downloads precompiled binaries from CRAN's /windows/contrib or /macos/contrib folders. But you can download Rtools (about 30 megabytes for both 32-bit and 64-bit Windows) to install the headers and libraries that let you compile R itself from source on Windows (much more easily than Julia) and develop your own packages with binary dependencies (in a very simple way).

MATLAB ships precompiled binaries for 32-bit and 64-bit Windows, Linux and MacOS with their packages. Third-party toolboxes do the same - doing so is simple with MATLAB.

Keno commented 11 years ago

There are really two questions here that need to be addressed. One is a question of policy: what will we distribute, and what do we expect users to have installed on their machines? While this is certainly an important question to address, I don't think we can find a comprehensive answer for all use cases. Binary distribution is certainly an option, and we are already doing that quite successfully on Windows. Actually building the binaries can easily be automated, and since packages have started using Travis for testing purposes anyway, we might be able to leverage that to build binaries (I have code to add Windows support to Travis, which I'll be working with the Travis people to get up and running).

The other question, which we need to address and which is currently causing the most problems, is that of mechanism. The original idea behind the whole BinDeps thing was to separate the building of binary dependencies from the package manager itself, so that users can use whatever policy they want for installing their packages (since we will never be able to predict all the use cases). The problem with the way this worked in 0.1 is more on the implementation side than on the conceptual side. What we did back then was simply include the build.jl file for each package, which works and is simple, but as has become painfully obvious over the last couple of months, this interface is not sufficient.

I still believe that this policy/mechanism separation is important and should be kept, but there's a question of how best to implement it. In particular, we need to answer the following:

  1. How does the package manager interact with the build script provided by the package?
  2. Once we figure out 1, how can we allow the build script to fail gracefully without breaking the entire package system? Do we provide some kind of rollback? Do we leave the package in the state in which it failed and somehow let the package manager know about it? (This is really the issue that caused us the most trouble.)
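
For concreteness, a hedged sketch of the declarative build.jl style under discussion; it follows the BinDeps look-and-feel, but the exact macros were still in flux at this point, and the package name and URL are placeholders:

# deps/build.jl -- illustrative only; the dependency name and URL are placeholders
using BinDeps

@BinDeps.setup

libglpk = library_dependency("libglpk")   # declare what the package needs

provides(AptGet, "libglpk-dev", libglpk)  # system package manager option
provides(Sources, URI("https://example.org/glpk-4.52.tar.gz"), libglpk)  # source fallback

@BinDeps.install
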
JeffBezanson commented 11 years ago

There is indeed a conceptual problem with letting users pick how to install things, which is that they don't know and don't care which option to pick.

StefanKarpinski commented 11 years ago

What I'm advocating is basically this: when installing a package that requires some shared library, if the library loads and works as-is, just use the system version; otherwise download a pre-compiled version into ~/.julia/lib/. End of story.
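
A minimal sketch of that policy in Julia (using today's Libdl and Downloads stdlibs for illustration; the helper name and URL are assumptions):

using Libdl, Downloads

# Try the system library first; otherwise fetch a prebuilt copy into ~/.julia/lib.
function ensure_library(name::AbstractString, binary_url::AbstractString)
    Libdl.dlopen_e(name) != C_NULL && return name   # system copy loads and works
    libdir = joinpath(homedir(), ".julia", "lib")
    mkpath(libdir)
    dest = joinpath(libdir, basename(binary_url))
    isfile(dest) || Downloads.download(binary_url, dest)
    return dest
end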

JeffBezanson commented 11 years ago

I agree with Stefan 100%.

If people want to add hooks to make a package work with a custom environment, that's fine as long as it doesn't interfere with normal operation. Examples might be adding a search path for detecting installed libraries on more systems, or providing some build scripts that can optionally be used to build your own version of a binary dep. All we need is for the package manager not to fail due to missing binary deps.

vtjnash commented 11 years ago

I thought we were going to discuss over IRC tomorrow. Anyways, I'll go on the record as saying I'm also pretty much in agreement with Stefan.

However, it's also really easy to ask apt-get, yum, and similar Linux package managers "which package would I need to install to ensure that I have a working copy of libsomething.so on my system?". I like my odds of getting something working quickly with that interface better than trying to set up our own build server. (Plus, I found that the OpenSUSE build service does the same for 32- and 64-bit Windows for over 250 libraries.) All that's missing then is a good way to get it working well on Mac.
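
That query is easy to script. A hedged sketch for Debian-style systems, assuming the apt-file tool is installed and its cache is current:

# Which package would provide a given shared library?
function providing_package(libname::AbstractString)
    out = try
        read(`apt-file search $libname`, String)
    catch
        return nothing                    # apt-file missing or search failed
    end
    isempty(out) && return nothing
    # output lines look like "zlib1g: /usr/lib/x86_64-linux-gnu/libz.so.1"
    return first(split(first(split(out, '\n')), ':'))
end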

mlubin commented 11 years ago

I agree with this, but there should be better (built-in) support for library search paths and for handling the different possible names of an installed library.

JeffBezanson commented 11 years ago

Using existing sources of binary dependencies is probably OK, since it fits the suggested scheme fine, just with people other than us hosting the binaries. Dealing with Debian/Ubuntu or Fedora is easy, but we need something to fall back on for the million other distros, FreeBSD, Mac, etc. There may also be cases where we patch libraries, or require a specific version that a distro doesn't provide.

@mlubin you're right but I'm not sure how to proceed. Is issue #1842 relevant? I'd really like to fix it but I ran out of ideas.

ViralBShah commented 11 years ago

I have thought about this for a while, and it is not difficult to support a cornucopia of distros - you need the install/delete commands and the name of the package, and this does not even have to be complete for all supported distros. BinDeps already has preliminary support for this, and we should extend it.

This should just not be part of the Pkg install steps, and failures should not affect Pkg dependencies. We can discuss further on IRC.
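
The per-distro data could be as small as a lookup table. A minimal sketch; manager detection is omitted, and the commands shown are assumptions:

# Map each supported package manager to its install command.
const INSTALL_CMD = Dict(
    :aptget => name -> `apt-get install -y $name`,
    :yum    => name -> `yum install -y $name`,
    :pacman => name -> `pacman -S --noconfirm $name`,
)

install_system_package(mgr::Symbol, name::AbstractString) = run(INSTALL_CMD[mgr](name))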

staticfloat commented 11 years ago

What I'm advocating is basically this: when installing a package that requires some shared library, if the library loads and works as-is, just use the system version; otherwise download a pre-compiled version into ~/.julia/lib/. End of story.

This seems very reasonable to me, and is a good example of what I was trying to advocate in the first place, sorry if I confused the message somewhat with my long-winded response.

All that's missing then is a good way to get it working well on Mac.

I agree that the lack of a decent binary packaging system on Mac is a big problem. My favorite substitute is brew, but I can't objectively compare it with others as I don't have experience with them. The one feature of brew that could possibly shine here is that one could install a Homebrew installation to ~/.julia/brew or wherever, and create formulae to download bottles (i.e. binary packages). Excluding the building and uploading of bottles, this is actually pretty easy to do; however, it may be easier to simply download libraries straight to ~/.julia/lib if we're going through all this trouble anyway.

vtjnash commented 11 years ago

~ isn't a valid path in the installation of a package, so it gets changed to the absolute path /Users/You, which doesn't work so hot for redistributing binaries (e.g. bottles only work if the path stays the same; MacPorts has the same issue, although it never gave its binaries a fancy name).

I find that brew and MacPorts are fine at being managed source-code installation tools. However, they are both quite bad at doing binaries, and at answering the question "what do I need to install to get it to work?"

StefanKarpinski commented 11 years ago

I think it's pretty obvious that we're all using ~ as a shorthand here.

staticfloat commented 11 years ago

which doesn't work so hot for redistributing binaries (e.g. bottles only work if the path stays the same. MacPorts has the same issue, although it never gave its binaries a fancy name)

Indeed; however, there are bottles that do not require a certain Cellar (I had previously stated the opposite, so if you are quoting me, I'm sorry for misleading you; gmp, libmpc and pidof are all cellar :any formulae), and we certainly ship a Julia that can be unzipped anywhere and works just fine. The main problem is that anything we do not compile ourselves and manually set the RPATH for will likely have absolute paths baked in. Whether or not we use brew or any other packaging system, we need to be aware of this.

Our solution could be to run install_name_tool to point the linker paths at the appropriate installation directory (that would actually be pretty cool if it could be automated, but it's too late in the night for me to think it through properly), or to modify the library build processes to use @rpath/ paths all the time. It's really up to us, but this isn't a package-manager-specific problem; it's an OSX shared-library problem, and it just requires a little extra care on our part.
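
The install_name_tool step is mechanical once the offending path is known. A macOS-only sketch; the example paths are illustrative assumptions:

# Rewrite one absolute dependency path inside a dylib to be @rpath-relative.
function repoint_dependency(dylib::AbstractString, oldpath::AbstractString)
    newpath = "@rpath/" * basename(oldpath)
    run(`install_name_tool -change $oldpath $newpath $dylib`)
end

# e.g. repoint_dependency("libcairo.2.dylib", "/usr/local/lib/libpixman-1.0.dylib")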

vtjnash commented 11 years ago

This is probably a crazy idea, but would it be possible to fix the absolute-paths issue using chroot?

# make the chroot environment, and provide access to all the normal stuff
mkdir ~/.julia/chroot
ln -s / ~/.julia/chroot/root
ln -s /etc ~/.julia/chroot/etc
ln -s /usr ~/.julia/chroot/usr
ln -s /Users ~/.julia/chroot/Users
# and so on
# install and configure macports into our chroot folder (homebrew doesn't sufficiently support quartz output)
# install stuff using macports binaries
chroot ~/.julia/chroot ~/.julia/chroot/opt/bin/port install cairo etc.
# run julia
chroot ~/.julia/chroot julia

This only half helps though, since MacPorts sometimes needs Xcode to configure a binary after unpacking it.

I was going off the documentation for Homebrew, which is apparently a few months out of date now. The cellar :any feature appears to be only about 3 months old (https://github.com/mxcl/homebrew/commit/82ecb69a53166c8c2fa0e507ac32c86241b66343), and looking at gmp as an example, it still has /usr/local/lib hardcoded into the dynamic library.

staticfloat commented 11 years ago

That is an interesting idea... but would the julia process then need to live in a chroot'ed environment? (Because libraries loaded directly by julia via dlopen could have second-tier libraries that are loaded by the dynamic linker.) That would preclude being able to access the outside world, no?

The other option is to just throw all our libraries into ~/.julia/lib or wherever, then tack $(echo ~)/.julia/lib onto DYLD_FALLBACK_LIBRARY_PATH for OSX binaries. I don't have a Mac right now to test on, but I believe that full-path libraries that cannot be found will be searched for in the fallback path.

vtjnash commented 11 years ago

I think we generally want to avoid setting any of the DYLD_FALLBACK_LIBRARY_PATH variables, since they can cause strange issues (it's come up 2 or 3 times in the Julia issues list as breaking everything in mysterious ways). Also, that solves the dylib lookup issue (which is probably more easily solved via install_name_tool, as you suggested originally), but not other absolute paths that may have been encoded into the binaries.

It wouldn't be able to access the outside world directly, but I was suggesting providing symlinks to all of the interesting parts of the outside world (/usr, /etc, /dev, /tmp, /sbin, /Volumes, /Applications, /System, /Library, /Users, and a few more) so it would look pretty much identical (but whitelisted by folder), with one extra folder dropped in (/opt/julia). On Linux, I would have tried doing this with mount --bind / chroot_folder, but I don't think that is possible on Mac.

mlubin commented 11 years ago

Has there been any further resolution to this? There's also #3342 which is preventing GLPK from working with multiple libraries installed in different places.

mlubin commented 11 years ago

Another question: what's the procedure for running build scripts with Pkg2?

StefanKarpinski commented 11 years ago

There is none yet. I'm not sure there should be one. We'll see.

mlubin commented 11 years ago

Why not at least have a runbuildscript that can be called manually? Otherwise I don't know how source-based packages can work at all.
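
Such a hook might look like this; a sketch assuming the old Pkg.dir layout with build scripts at deps/build.jl (runbuildscript is mlubin's proposed name, not an existing API):

function runbuildscript(pkg::AbstractString)
    script = joinpath(Pkg.dir(pkg), "deps", "build.jl")
    isfile(script) || error("$pkg has no build script")
    # run in deps/ so the script's relative paths resolve; failures here
    # should not corrupt Pkg's own state
    cd(() -> include(script), dirname(script))
end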

Keno commented 11 years ago

I'll try to stub something out and put it in a pull request.

Keno commented 11 years ago

I feel like this is sufficiently addressed with the new BinDeps and Pkg2 integration. Any further issues and features can be addressed directly in BinDeps.

mlubin commented 11 years ago

The base julia issues are resolved with Pkg2, but I think that functionality like being able to update binary dependencies (loladiro/BinDeps.jl#26) should be blocking for 0.2.

StefanKarpinski commented 11 years ago

That seems like a reasonable requirement. @loladiro, any thoughts on that?

Keno commented 11 years ago

The nice thing about having BinDeps in a package is that the release of new features does not have to correspond to Julia releases. I do however agree that we should have this for 0.2, so I'll work on it.