JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.72k stars 5.48k forks source link

move sparse, glpk & linprog into base #1518

Closed StefanKarpinski closed 11 years ago

StefanKarpinski commented 11 years ago

The major blocker here is sparse because I don't want to screw up the (re)merge for all of @ViralBShah and @dmbates work on sparse matrices. These modules need to go into base and get loaded by default so that the Pkg module, which relies on them, can get loaded by default so that people can add and remove packages without having to do anything. Or maybe that's too much stuff to load by default?

johnmyleswhite commented 11 years ago

I personally think both modules should be loaded by default.

ViralBShah commented 11 years ago

I don't think that is too much to load by default. I would love to have a completely MIT/BSD licensed system (ignoring the issue of readline for the time being) - and for that purpose, FFTW and GLPK would end up being packages. We can probably try to use the linprog written by @mlubin for the package manager.

It should be fairly easy to move sparse.jl into base, and keep the rest of the suitesparse stuff as a package.

@dmbates I can move sparse.jl to base once you give a green signal.

ViralBShah commented 11 years ago

BTW, it should be easy to implement a naive simplex method for the use of the package manager. It would be nice if the package manager was self-contained, and did not depend on external libraries.

Here is one possible implementation that we can translate quickly, assuming it meets all of the requirements of Pkg: http://kom.aau.dk/~borre/matlab/strang/linprog.m

dmbates commented 11 years ago

@ViralBShah Moving sparse.jl into base is fine with me. For the most part I am working on suitesparse.jl and arpack.jl

mlubin commented 11 years ago

I would recommend against a naive simplex implementation. It really takes a lot of work to make the implementation numerically stable. The particular code you linked to has a number of issues (for example, explicitly maintaining the (dense) inverse of a (sparse) matrix!). @IainNZ and I are considering writing a real interface to Clp in the near future, which should really be used for the default linprog implementation.

IainNZ commented 11 years ago

I second that notion. Making a "toy" implementation is easy, but it gets nasty quick...

StefanKarpinski commented 11 years ago

CLP is problematic as a standard component of Julia base because the EPL license is incompatible with the GPL (according to Wikipedia's EPL entry). Although it wouldn't affect the core libjulia functionality, shipping the repl code or binary, which is GPL because of readline in a way that depends on CLP might not be legal.

StefanKarpinski commented 11 years ago

Would anyone be interested in doing a high-quality, pure-Julia simplex implementation?

mlubin commented 11 years ago

Where can I look to get a sense of the size and structure of the LPs that need to be solved here?

StefanKarpinski commented 11 years ago

It might be significantly easier to get a simplex implementation working that is good enough for usage only by the package manager since the class of problems it generates is quite limited. That at least decouples the base Julia distribution from any external linear programming package. It only needs to be good enough for this. Note that this uses dense matrices although it might eventually be better to use sparse ones.

StefanKarpinski commented 11 years ago

The relevant dimension sizes in that code are:

These can likely all be pared down massively by restricting the problem only to packages and versions that are reachable somehow by the allowable root set of versions that satisfy the explicitly requested set of packages.

mlubin commented 11 years ago

Could you give an example of how to call resolve() with typical data so I can play around with a real problem?

It sounds like the problem solved on line 178 could have O(deps) = a million constraints. Is that correct?

mlubin commented 11 years ago

The second problem looks suspiciously like a network flow problem, for which there exists a specialization of the simplex method which is a good 50x more efficient. It's hard to tell though. Could you explain a bit more about what the constraints and variables represent?

StefanKarpinski commented 11 years ago

In the realistic near future, these are way smaller than that, but yes, eventually deps could get pretty big and you'd want it to be sparse and to have a pretty scalable algorithm. Currently this is what we have:

julia> pkgs = Pkg.Metadata.packages()
4-element String Array:
 "ArgParse"
 "Example" 
 "Options" 
 "TextWrap"

julia> vers = Pkg.Metadata.versions(pkgs)
4-element Version Array:
 Version("ArgParse",v"0.0.0")
 Version("Example",v"0.1.0") 
 Version("Options",v"0.0.0") 
 Version("TextWrap",v"0.0.0")

julia> deps = Pkg.Metadata.dependencies(pkgs,vers)
3-element (Version,VersionSet) Array:
 (Version("ArgParse",v"0.0.0"),VersionSet("Options",[])) 
 (Version("ArgParse",v"0.0.0"),VersionSet("TextWrap",[]))
 (Version("TextWrap",v"0.0.0"),VersionSet("Options",[])) 

You can run the above your self after doing require("pkg.jl"); Pkg.init() to setup a package repo for yourself.

StefanKarpinski commented 11 years ago

I'll explain the problem a bit more later. Gotta get lunch now...

mlubin commented 11 years ago

From a fresh pull:

julia> require("pkg.jl")

julia> Pkg.init()
Initialized empty Git repository in /home/mlubin/.julia/.git/
Cloning into 'METADATA'...
remote: Counting objects: 35, done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 35 (delta 2), reused 35 (delta 2)
Unpacking objects: 100% (35/35), done.
[master (root-commit) 2f59b70] [jul] METADATA
 2 files changed, 4 insertions(+)
 create mode 100644 .gitmodules
 create mode 160000 METADATA
 create mode 100644 REQUIRE

julia> pkgs = Pkg.Metadata.packages()
fatal: Not a git repository: 'METADATA/.git'
0-element String Array
carlobaldassi commented 11 years ago

Tangentially related: I suppose GLPK should become a module (whatever its fate is going to be). Not sure about linprog (which btw I should document...). Opinions?

Also, I do have a preliminary Clp wrapper in a branch (it's just a couple of commits, it misses work on the Makefile and a patch which is necessary to make it work from Julia), but if the license is even more problematic than GPL it's probably not worth considering it relative to the present issue (anyone interested in completing it can play with it or contact me though).

toivoh commented 11 years ago

@mlubin: I seem to recall that the package manager LP:s contain only zeros and ones in the A matrix. (Or perhaps twos or threes too) It seems to me that it would be hard to pose a really ill-conditioned problem with that kind of quantization. Would there still be the same need to get it numerically stable?

From what I remember when I looked at the code, if I interpreted it correctly:

I'm not sure if that can be cast as a flow problem, but I agree that it looks pretty close.

ViralBShah commented 11 years ago

Eventually, we will want to find the connected components of the dependency graph and restrict the solution to only the relevant subgraph.

StefanKarpinski commented 11 years ago

Yes, @toivoh has it correct. (Well, close enough: required packages where the requirement is satisfied by any of x1 ... xn have x1 + ... + xn >= 1, which forces at least one of them to be installed.) The weights for the optimization function need to be chosen carefully so that fractionally solutions are never optimal. That's what the first linear programming problem is for. I agree that it seems unlikely for problems constructed like this to be ill-conditioned.

ViralBShah commented 11 years ago

@carlobaldassi The GLPK interface should be in a module for sure, and linprog can have its own module - making it easy to do things such as replacing linprog-glpk with linprog-clp. We can decide the fate of GLPK and various other libraries when we do the package reorganization and cleaning of Base.

mlubin commented 11 years ago

Is the purpose of the objective entirely to force the LP solution to be integer? (So actually you're interested in any integer solution to the 2nd LP?) What do the constraints in the 1st LP represent?

StefanKarpinski commented 11 years ago

The weight optimization (i.e. 1st LP) enforces two constraints:

This favors installing packages that are depended on by a lot of other packages, and newer package versions over older ones. The doubling of cost for each older version seems to be necessary to ensure that the solution to the 2nd LP problem is integral. It would be nice to get rid of the doubling, since it can fairly rapidly run out of the Float64 range, but this was the best I could manage. I figured we could figure something clever out at some point later if it came to that.

StefanKarpinski commented 11 years ago

From a fresh pull...

Oh, sorry, yeah, Pkg.Metadata isn't really an externally facing API: you have to manually cd into ~/.julia before running those commands.

mlubin commented 11 years ago

Large objective values (1e10) can cause instability even if all of the coefficients of the constraints are nice, so that's slightly concerning, but it sounds like the versions considered can be cut down a lot with some preprocessing. I'll look at writing a linprog interface to my jlSimplex code.

toivoh commented 11 years ago

@StefanKarpinski: When you say that fractional solutions should not be optimal, is it enough that the solution is a vertex? (Because I think simplex always gives you one) Or do you mean that there could be vertices at fractional coordinates that become optimal with the wrong weight?

carlobaldassi commented 11 years ago

I'm not sure it would solve all problems, but note that in linprog.jl there's a mixintprog function which can be used to restrict the space of solutions to integers.

(I'll work on modularizing GLPK and linprog.)

carlobaldassi commented 11 years ago

In case anyone's interested, I've just created a stand-alone repo for the Clp wrapper, which will eventually become a package at some point (hopefully).

mlubin commented 11 years ago

We're been focused on the simplex algorithm, but actually it's much easier to write a reasonable implementation of the primal-dual interior-point method.

vtjnash commented 11 years ago

It looks like sparse was moved to base. The others aren't needed by Pkg anymore, IIRC, and should eventually be made into their own packages? So can this be closed as resolved?

ViralBShah commented 11 years ago

@carlobaldassi @mlubin Could you guys move GLPK to a package too, like you did with CLP? We could then close this one.

carlobaldassi commented 11 years ago

I started moving GLPK to a package. It works, but I'm not sure how to test it on OSX and Windows. I'll probably ask for help in a short while.

Keno commented 11 years ago

@carlobaldassi I can get you remote access to a windows VM

staticfloat commented 11 years ago

@carlobaldassi, there's many people here with OSX machines, myself included. I can run any tests you need, if that would help you.

carlobaldassi commented 11 years ago

@loladiro , @staticfloat : thanks, I'll open a thread on the dev list. A Windows VM would probably help.

StefanKarpinski commented 11 years ago

They have been moved directly out of the julia repo into packages.