JuliaML / DensityRatioEstimation.jl

Density ratio estimation in Julia
MIT License
33 stars 3 forks source link

Dependency on JuMP and external solvers #11

Closed juliohm closed 4 years ago

juliohm commented 5 years ago

Before jumping into an interface discussion as proposed in #10, I would like to raise another issue regarding package dependencies. I noticed that you are relying on JuMP at the moment to build and solve optimisation problems. I know some of these density ratio estimation methods can be implemented with simple optimisation methods that could be implemented by hand. I wonder if we could implement all these methods without requiring JuMP and a solver like Ipopt? The problem with depending on external solvers is that we loose the ability to do automatic differentiation and other things that could be interesting in the future for other downstream projects like MLJ.jl, Flux.jl, etc.

xukai92 commented 5 years ago

Yes but I think we should support both. For example, for the current MMD/KMM method, the numeircal version via Ipopt is not AD friendly but the analytical version is AD friendly. I also prefer to be pure Julia as much as possible :)

juliohm commented 5 years ago

In terms of dependencies it is really important to be strict about the plans because, for example, I had a very lightweight package in mind with no dependency on JuMP. Could we provide the analytical version only? Is it equivalent? I bought Sigiyama's book but it didn't arrive yet.

juliohm commented 5 years ago

What we could do is make JuMP and Ipot optional dependencies via the Requires.jl package. In this case we provide a set of methods that are pure Julia by default, but when the user loads JuMP, another set of methods becomes available. This could be a good tradeoff, what do you think?

I did this before with some geostatistical solvers in GeoStats.jl, I am familiar with the Requires.jl workflow.

xukai92 commented 5 years ago

They are not equivalent. Both solve the same optimization problem but the analytical one is for an unconstrained version while the numerical one solves the same optimization problem by constraining the ratio to be postive and to be normalised to 1.

I'm happy to use Requires.jl - just a little bit concerned about the potential slow down on loading if that's an issue.

juliohm commented 5 years ago

The loading of DensityRatioEstimation.jl? It shouldn't be affected as far as I know, perhaps it will be faster even becasue JuMP and Ipopt won't be loaded by default? The idea behind Requires.jl is that users control which parts of the package are loaded. If the user starts a session with just DensityRatioEstimation loaded, it has fast load and access to analytical or implemented-by-hand methods. If it later on loads JuMP and a solver, it gets access to other methods that are loaded on the fly.

My point is that we should avoid as much as possible a heavy black box like a convex solver, and implement some methods by hand. It will be easier to maintain, debug, and integrate with other packages in the ecosystem.

xukai92 commented 5 years ago

The loading of DensityRatioEstimation.jl? It shouldn't be affected as far as I know, perhaps it will be faster even becasue JuMP and Ipopt won't be loaded by default?

Yes. I've seen discussions on this some time ago on the use of Requires.jl and speed of package loading. But I agree we don't need to worry about this for now.

The idea behind Requires.jl is that users control which parts of the package are loaded. If the user starts a session with just DensityRatioEstimation loaded, it has fast load and access to analytical or implemented-by-hand methods. If it later on loads JuMP and a solver, it gets access to other methods that are loaded on the fly.

Yes I understand how Requires.jl works. I also used it for several projects.

My point is that we should avoid as much as possible a heavy black box like a convex solver, and implement some methods by hand. It will be easier to maintain, debug, and integrate with other packages in the ecosystem.

I completely agree.


I will create a PR to make solvers optionally dependent.

juliohm commented 5 years ago

I will create a PR to make solvers optionally dependent.

That is great.

JuMP.jl and solvers could be imported on the files implementing the methods. For example, the KMM method could be implemented in a file kmm.jl and JuMP and Ipopt would be loaded there with Requires.jl syntax.

I will try to get back to you over the week or next week after doing some review of the theory. Implementing some of these solvers by hand is an interesting and challenging exercise.

juliohm commented 5 years ago

An update: I have two working implementations of KLIEP, one depending on Optim.jl and another depending on Convex.jl. I will try to find time to restructure the package to handle these optional dependencies with Requires.jl. We could have many methods implemented with different optimisation stacks.

xukai92 commented 5 years ago

JuMP.jl and solvers could be imported on the files implementing the methods. For example, the KMM method could be implemented in a file kmm.jl and JuMP and Ipopt would be loaded there with Requires.jl syntax.

KMM has both dependency free solution and solution based on JuMP.jl. Therefore, I alwasy include kmm.jl itself which contains the dependency free solution. I guess maybe it was what's confusing in our discussion in the PR?

An update: I have two working implementations of KLIEP, one depending on Optim.jl and another depending on Convex.jl. I will try to find time to restructure the package to handle these optional dependencies with Requires.jl. We could have many methods implemented with different optimisation stacks.

Great!

juliohm commented 4 years ago

I think this issue is overall solved. It is great that we can have different implementations depending on different optimization packages 💯. I will open another issue to formalise the API a little further.