Closed StefanKarpinski closed 11 years ago
I personally think both modules should be loaded by default.
I don't think that is too much to load by default. I would love to have a completely MIT/BSD licensed system (ignoring the issue of readline for the time being) - and for that purpose, FFTW and GLPK would end up being packages. We can probably try to use the linprog written by @mlubin for the package manager.
It should be fairly easy to move sparse.jl into base, and keep the rest of the suitesparse stuff as a package.
@dmbates I can move sparse.jl to base once you give a green signal.
BTW, it should be easy to implement a naive simplex method for the use of the package manager. It would be nice if the package manager was self-contained, and did not depend on external libraries.
Here is one possible implementation that we can translate quickly, assuming it meets all of the requirements of Pkg
:
http://kom.aau.dk/~borre/matlab/strang/linprog.m
@ViralBShah Moving sparse.jl into base is fine with me. For the most part I am working on suitesparse.jl and arpack.jl
I would recommend against a naive simplex implementation. It really takes a lot of work to make the implementation numerically stable. The particular code you linked to has a number of issues (for example, explicitly maintaining the (dense) inverse of a (sparse) matrix!). @IainNZ and I are considering writing a real interface to Clp in the near future, which should really be used for the default linprog implementation.
I second that notion. Making a "toy" implementation is easy, but it gets nasty quick...
CLP is problematic as a standard component of Julia base because the EPL license is incompatible with the GPL (according to Wikipedia's EPL entry). Although it wouldn't affect the core libjulia
functionality, shipping the repl code or binary, which is GPL because of readline in a way that depends on CLP might not be legal.
Would anyone be interested in doing a high-quality, pure-Julia simplex implementation?
Where can I look to get a sense of the size and structure of the LPs that need to be solved here?
It might be significantly easier to get a simplex implementation working that is good enough for usage only by the package manager since the class of problems it generates is quite limited. That at least decouples the base Julia distribution from any external linear programming package. It only needs to be good enough for this. Note that this uses dense matrices although it might eventually be better to use sparse ones.
The relevant dimension sizes in that code are:
reqs
: the required package versions explicitly provided in the REQUIRE
file – no more than hundredspkgs
: the available package names – in the near future, hundreds, eventually possibly tens of thousandsvers
: all non-obsolete versions of all packages – an order of magnitude bigger than pkgs
probablydeps
: dependency edges between package versions – an order of magnitude bigger than vers
probablyThese can likely all be pared down massively by restricting the problem only to packages and versions that are reachable somehow by the allowable root set of versions that satisfy the explicitly requested set of packages.
Could you give an example of how to call resolve() with typical data so I can play around with a real problem?
It sounds like the problem solved on line 178 could have O(deps) = a million constraints. Is that correct?
The second problem looks suspiciously like a network flow problem, for which there exists a specialization of the simplex method which is a good 50x more efficient. It's hard to tell though. Could you explain a bit more about what the constraints and variables represent?
In the realistic near future, these are way smaller than that, but yes, eventually deps could get pretty big and you'd want it to be sparse and to have a pretty scalable algorithm. Currently this is what we have:
julia> pkgs = Pkg.Metadata.packages()
4-element String Array:
"ArgParse"
"Example"
"Options"
"TextWrap"
julia> vers = Pkg.Metadata.versions(pkgs)
4-element Version Array:
Version("ArgParse",v"0.0.0")
Version("Example",v"0.1.0")
Version("Options",v"0.0.0")
Version("TextWrap",v"0.0.0")
julia> deps = Pkg.Metadata.dependencies(pkgs,vers)
3-element (Version,VersionSet) Array:
(Version("ArgParse",v"0.0.0"),VersionSet("Options",[]))
(Version("ArgParse",v"0.0.0"),VersionSet("TextWrap",[]))
(Version("TextWrap",v"0.0.0"),VersionSet("Options",[]))
You can run the above your self after doing require("pkg.jl"); Pkg.init()
to setup a package repo for yourself.
I'll explain the problem a bit more later. Gotta get lunch now...
From a fresh pull:
julia> require("pkg.jl")
julia> Pkg.init()
Initialized empty Git repository in /home/mlubin/.julia/.git/
Cloning into 'METADATA'...
remote: Counting objects: 35, done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 35 (delta 2), reused 35 (delta 2)
Unpacking objects: 100% (35/35), done.
[master (root-commit) 2f59b70] [jul] METADATA
2 files changed, 4 insertions(+)
create mode 100644 .gitmodules
create mode 160000 METADATA
create mode 100644 REQUIRE
julia> pkgs = Pkg.Metadata.packages()
fatal: Not a git repository: 'METADATA/.git'
0-element String Array
Tangentially related: I suppose GLPK should become a module (whatever its fate is going to be). Not sure about linprog (which btw I should document...). Opinions?
Also, I do have a preliminary Clp wrapper in a branch (it's just a couple of commits, it misses work on the Makefile and a patch which is necessary to make it work from Julia), but if the license is even more problematic than GPL it's probably not worth considering it relative to the present issue (anyone interested in completing it can play with it or contact me though).
@mlubin: I seem to recall that the package manager LP:s contain only zeros and ones in the A matrix. (Or perhaps twos or threes too) It seems to me that it would be hard to pose a really ill-conditioned problem with that kind of quantization. Would there still be the same need to get it numerically stable?
From what I remember when I looked at the code, if I interpreted it correctly:
x <= y1 + ... + yn
.x1 + ... + xn <= 1
to use only one of them.x = 1
.I'm not sure if that can be cast as a flow problem, but I agree that it looks pretty close.
Eventually, we will want to find the connected components of the dependency graph and restrict the solution to only the relevant subgraph.
Yes, @toivoh has it correct. (Well, close enough: required packages where the requirement is satisfied by any of x1 ... xn have x1 + ... + xn >= 1
, which forces at least one of them to be installed.) The weights for the optimization function need to be chosen carefully so that fractionally solutions are never optimal. That's what the first linear programming problem is for. I agree that it seems unlikely for problems constructed like this to be ill-conditioned.
@carlobaldassi The GLPK interface should be in a module for sure, and linprog can have its own module - making it easy to do things such as replacing linprog-glpk with linprog-clp. We can decide the fate of GLPK and various other libraries when we do the package reorganization and cleaning of Base.
Is the purpose of the objective entirely to force the LP solution to be integer? (So actually you're interested in any integer solution to the 2nd LP?) What do the constraints in the 1st LP represent?
The weight optimization (i.e. 1st LP) enforces two constraints:
This favors installing packages that are depended on by a lot of other packages, and newer package versions over older ones. The doubling of cost for each older version seems to be necessary to ensure that the solution to the 2nd LP problem is integral. It would be nice to get rid of the doubling, since it can fairly rapidly run out of the Float64
range, but this was the best I could manage. I figured we could figure something clever out at some point later if it came to that.
From a fresh pull...
Oh, sorry, yeah, Pkg.Metadata
isn't really an externally facing API: you have to manually cd into ~/.julia
before running those commands.
Large objective values (1e10) can cause instability even if all of the coefficients of the constraints are nice, so that's slightly concerning, but it sounds like the versions considered can be cut down a lot with some preprocessing. I'll look at writing a linprog interface to my jlSimplex code.
@StefanKarpinski: When you say that fractional solutions should not be optimal, is it enough that the solution is a vertex? (Because I think simplex always gives you one) Or do you mean that there could be vertices at fractional coordinates that become optimal with the wrong weight?
I'm not sure it would solve all problems, but note that in linprog.jl there's a mixintprog
function which can be used to restrict the space of solutions to integers.
(I'll work on modularizing GLPK and linprog.)
In case anyone's interested, I've just created a stand-alone repo for the Clp wrapper, which will eventually become a package at some point (hopefully).
We're been focused on the simplex algorithm, but actually it's much easier to write a reasonable implementation of the primal-dual interior-point method.
It looks like sparse was moved to base. The others aren't needed by Pkg anymore, IIRC, and should eventually be made into their own packages? So can this be closed as resolved?
@carlobaldassi @mlubin Could you guys move GLPK to a package too, like you did with CLP? We could then close this one.
I started moving GLPK to a package. It works, but I'm not sure how to test it on OSX and Windows. I'll probably ask for help in a short while.
@carlobaldassi I can get you remote access to a windows VM
@carlobaldassi, there's many people here with OSX machines, myself included. I can run any tests you need, if that would help you.
@loladiro , @staticfloat : thanks, I'll open a thread on the dev list. A Windows VM would probably help.
They have been moved directly out of the julia repo into packages.
The major blocker here is sparse because I don't want to screw up the (re)merge for all of @ViralBShah and @dmbates work on sparse matrices. These modules need to go into base and get loaded by default so that the Pkg module, which relies on them, can get loaded by default so that people can add and remove packages without having to do anything. Or maybe that's too much stuff to load by default?