TuringLang / AdvancedHMC.jl

Robust, modular and efficient implementation of advanced Hamiltonian Monte Carlo algorithms
https://turinglang.org/AdvancedHMC.jl/
MIT License

Use LogDensityProblems.jl #301

Closed. torfjelde closed this pull request 1 year ago.

torfjelde commented 1 year ago

Turing.jl has somewhat "recently" started using LogDensityProblems.jl under the hood to simplify all the handling of AD. This PR does the same for AdvancedHMC.jl, which in turn means that we can just plug a Turing.LogDensityFunction into AHMC and sample, in theory making the HMC implementation, etc. in Turing.jl redundant. Plus we get additional AD backends for free, e.g. Enzyme.
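Roughly, the idea looks like this (a sketch; the exact `Turing.LogDensityFunction` constructor here is an assumption, the interface calls are the point):

```julia
using Turing, LogDensityProblems

# A toy model; `Turing.LogDensityFunction` wraps it as an object implementing
# the LogDensityProblems.jl interface.
@model demo() = x ~ Normal()

ℓ = Turing.LogDensityFunction(demo())
LogDensityProblems.dimension(ℓ)            # 1
LogDensityProblems.logdensity(ℓ, [0.25])   # log joint density at x = 0.25
```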

IMO we might even want to consider making a bridge package between AbstractMCMC.jl and LogDensityProblems.jl simply containing this LogDensityModel implementation (sketched below), given how useful it would be in other packages implementing samplers, e.g. AdvancedMH.jl. Thoughts on this @devmotion @yebai @xukai92?
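Roughly what such a bridge type could look like (a sketch; the actual definition in the PR may differ):

```julia
using AbstractMCMC

# A thin wrapper turning any object implementing the LogDensityProblems.jl
# interface into something AbstractMCMC samplers can consume.
struct LogDensityModel{L} <: AbstractMCMC.AbstractModel
    logdensity::L
end
```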

A couple of (fixable) caveats:

EDIT: This now depends on https://github.com/TuringLang/AbstractMCMC.jl/pull/110.

tpapp commented 1 year ago

@torfjelde: `similar` should work fine IMO. PRs to LogDensityProblems are most welcome and I will review them quickly.

yebai commented 1 year ago

I fully support this idea. Also happy to consider making AbstractMCMC / AbstractPPL dependent on LogDensityProblems.jl.

yebai commented 1 year ago

@tpapp would you keep LogDensityProblems's dependency lightweight in the long term?

torfjelde commented 1 year ago

> I fully support this idea. Also happy to consider making AbstractMCMC / AbstractPPL dependent on LogDensityProblems.jl.

The main argument against this is that LogDensityProblems.jl uses Requires.jl, and the idea behind AbstractMCMC is that it should be so lightweight that you should be able to just depend on it. This is why I'm in favour of a bridge package (though I'm also not a big fan of the growing number of packages, it seems like it'll be difficult to find an alternative here).

yebai commented 1 year ago

> The main argument against this is that LogDensityProblems.jl uses Requires.jl, and the idea behind AbstractMCMC is that it should be so lightweight that you should be able to just depend on it. This is why I'm in favour of a bridge package (though I'm also not a big fan of the growing number of packages, it seems like it'll be difficult to find an alternative here).

It might be possible to move some core types and APIs from LogDensityProblems into AbstractMCMC, then make LogDensityProblems dependent on AbstractMCMC. But that requires @tpapp's support of course.

devmotion commented 1 year ago

LogDensityProblems is much more lightweight than AbstractMCMC: the former depends on 4 packages (https://juliahub.com/ui/Packages/LogDensityProblems/wTQV3/1.0.2?page=1) whereas the latter has 7 direct and 35 indirect dependencies (https://juliahub.com/ui/Packages/AbstractMCMC/5x8OI/4.1.3?page=1). So I don't think it makes sense to make LogDensityProblems dependent on AbstractMCMC; the other way around seems more reasonable.

AFAICT most of the dependencies are pulled in by BangBang and Transducers.

devmotion commented 1 year ago

Already in https://github.com/TuringLang/Turing.jl/pull/1877 my plan was to move Turing.LogDensityFunction and the Turing-specific AD stuff to separate packages. I just haven't had time to push it, but I think it should not be part of Turing in the long run, for instance to avoid redefining the AD parts, density functions, etc. in other packages such as Bijectors, AdvancedVI (and AdvancedHMC, I assume).

tpapp commented 1 year ago

@yebai: Yes, LogDensityProblems will be kept very light. I am not planning to add any more dependencies. It currently uses Requires.jl for loading backend-specific code for AD, I am hoping to replace it with AbstractDifferentiation.jl when that matures.

yebai commented 1 year ago

@tpapp many thanks for confirming. Yes, I am slightly worried about depending on Requires.

devmotion commented 1 year ago

I also think that AbstractDifferentiation could resolve this, but to me it still seems not mature enough. From the outside (maybe someone knows it better) it seems that unfortunately nobody is pushing it right now and nobody is working on fixing the different issues it has. Possibly one reason for that state could be that it might require some design changes/decisions, and at least to me it's a bit unclear who's deciding what to do in the end.

torfjelde commented 1 year ago

I honestly think at this point I'd be happy to add the implementation of LogDensityModel from this PR to AbstractMCMC.jl and depend on LogDensityProblems.jl. We've had plans for ages to add some "simple"/default models, etc. to AbstractMCMC.jl which could be used by many of the sampler packages without them having to define their own models, but these plans/discussions usually grind to a halt because we're never quite satisfied with the design. I think LogDensityProblems.jl does a good enough job for it to be worth just going with that approach for now. Then packages such as AdvancedHMC.jl and AdvancedMH.jl can both just use the LogDensityProblems.jl interface to interact with the model, which in turn has some nice implications:

  1. Anything that implements LogDensityProblems.logdensity, LogDensityProblems.dimension, and LogDensityProblems.capabilities can be used (see the sketch after this list), i.e. the user can use the same model across the different packages.
  2. We can move the non-AD stuff for Turing.LogDensityFunction to DynamicPPL.jl.
  3. We can easily apply samplers not from Turing.jl to models from DynamicPPL.jl without any further changes (if the samplers start making use of the LogDensityProblems-interface).
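
A minimal example of point (1); the target type and names here are illustrative:

```julia
using LogDensityProblems

# Any type implementing these three methods can be used as a model by every
# sampler package that targets the interface:
struct MyTarget end

LogDensityProblems.logdensity(::MyTarget, θ) = -sum(abs2, θ) / 2  # std. normal
LogDensityProblems.dimension(::MyTarget) = 2
LogDensityProblems.capabilities(::Type{MyTarget}) =
    LogDensityProblems.LogDensityOrder{0}()  # provides log density, no gradient
```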

(3) was actually my original intention for this PR. Using AdvancedHMC.jl directly gives me much better control of the sampler than the implementation of HMC present in Turing.jl, and after this PR I can just make a Turing.LogDensityFunction from my model and then AdvancedHMC.jl just works.
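Concretely, the workflow could look roughly like this (continuing the sketch above with `MyTarget`; the sampler setup follows AdvancedHMC's README, and the exact post-PR entry points may differ):

```julia
using AdvancedHMC, LogDensityProblems, ForwardDiff

ℓπ = MyTarget()  # in practice e.g. a Turing.LogDensityFunction
D = LogDensityProblems.dimension(ℓπ)

metric = DiagEuclideanMetric(D)
hamiltonian = Hamiltonian(metric, ℓπ, ForwardDiff)  # gradient via AD
integrator = Leapfrog(find_good_stepsize(hamiltonian, randn(D)))
kernel = HMCKernel(Trajectory{MultinomialTS}(integrator, GeneralisedNoUTurn()))
adaptor = StanHMCAdaptor(MassMatrixAdaptor(metric), StepSizeAdaptor(0.8, integrator))

samples, stats = sample(hamiltonian, kernel, randn(D), 1_000, adaptor, 500)
```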

EDIT: The drawback is of course that we're now introducing a dependency on Requires.jl into AbstractMCMC.jl :confused:

yebai commented 1 year ago

@torfjelde that sounds like a sensible plan. I am happy to make AbstractMCMC depend on LogDensityProblems.

For the slightly longer term, I hope that LogDensityProblems can separate the AD stuff into a separate package so LogDensityProblems is truly lightweight.

tpapp commented 1 year ago

@yebai: Yes, that is the plan. I created an issue for further discussion/suggestions about moving AD backends out of the package: https://github.com/tpapp/LogDensityProblems.jl/issues/97

yebai commented 1 year ago

@tpapp my impression is that AbstractDifferentiation will likely take a very long time to mature, if that happens at all.

Would you be happy if @torfjelde helped to separate the AD code in LogDensityProblems into a new package now-ish? This would allow AbstractMCMC to depend on LogDensityProblems without introducing a dependency on Requires, which we have tried hard to avoid.

devmotion commented 1 year ago

AdvancedHMC, Turing, etc. all depend on Requires anyway, don't they? But maybe it would be a good idea anyway to move them into separate (sub-)packages.

I think one other point is that it would be nice to be able to define AD backend types as well (maybe this could even be done without hiding it behind Requires), similar to the backend types in AbstractDifferentiation and Turing: that would allow passing the AD type and its options around in a somewhat clean and standardized way (cf. https://github.com/TuringLang/AdvancedHMC.jl/pull/301#discussion_r1029169062).
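A sketch of the suggestion (illustrative names, not an existing API): lightweight singleton types identify the AD backend, so the choice of backend and its options can be passed around and dispatched on without the sampler loading the AD packages itself.

```julia
using LogDensityProblems, ForwardDiff

# Hypothetical backend types, loosely modelled on those in
# AbstractDifferentiation and Turing:
abstract type ADBackend end
struct ForwardDiffAD{Chunk} <: ADBackend end
struct ReverseDiffAD <: ADBackend end

chunksize(::ForwardDiffAD{C}) where {C} = C

# Downstream code can then construct the gradient wrapper by dispatch:
make_gradient(ℓ, ad::ForwardDiffAD) =
    LogDensityProblems.ADgradient(Val(:ForwardDiff), ℓ; chunk = chunksize(ad))
```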

tpapp commented 1 year ago

I separated the AD backend code into https://github.com/tpapp/LogDensityProblemsAD.jl. Once it is registered, I will deprecate LogDensityProblems.ADgradient and remove the code from there, after which LogDensityProblems will no longer depend on Requires.jl.

Further suggestions for AD backends are most welcome, but please open an issue there so they don't get lost.

tpapp commented 1 year ago

I have now factored out all the AD code from LogDensityProblems. A new release (2.0) will happen as soon as the registry automerges (thanks for your patience, this has been a busy time for me so I only got around to it now).

I plan to use weak dependencies instead of Requires.jl in LogDensityProblemsAD. Follow https://github.com/tpapp/LogDensityProblemsAD.jl/issues/2 if you want to keep an eye on this.
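For reference, the weak-dependency mechanism (available from Julia 1.9) looks roughly like this in a package's Project.toml; the entries below are illustrative:

```toml
[weakdeps]
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"

[extensions]
# The extension module in ext/ is only loaded once ForwardDiff is present in
# the environment, so there is no unconditional dependency on it.
LogDensityProblemsADForwardDiffExt = "ForwardDiff"
```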

torfjelde commented 1 year ago

This is causing issues: https://github.com/tpapp/LogDensityProblemsAD.jl/blob/79e245e1bb8e904087af9de284e2796e51eb55c4/src/AD_ForwardDiff.jl#L21-L23

LogDensityProblemsAD.ADgradient constructs a ForwardDiff.GradientConfig up front, so the type of the dual vector is tied to the density problem rather than to the input it is eventually called with.

EDIT: Just for the record, this also breaks CUDA compat, etc.; it's not limited to ComponentArray.

devmotion commented 1 year ago

Where/why is that problematic? There are no issues with that design in Turing (in fact, being able to specify the GradientConfig is very useful for working with custom tags: https://github.com/TuringLang/Turing.jl/blob/61b06f642872522ca14133bb5c86fc603841dbba/src/essential/ad.jl#L89-L100).
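A sketch of the pattern being referred to, supplying a custom GradientConfig when wrapping the problem (the keyword name follows the source linked above; the target type here is illustrative):

```julia
using LogDensityProblems, LogDensityProblemsAD, ForwardDiff

# Toy target implementing the LogDensityProblems interface:
struct Quadratic end
LogDensityProblems.logdensity(::Quadratic, θ) = -sum(abs2, θ) / 2
LogDensityProblems.dimension(::Quadratic) = 2
LogDensityProblems.capabilities(::Type{Quadratic}) =
    LogDensityProblems.LogDensityOrder{0}()

ℓ = Quadratic()
x = zeros(LogDensityProblems.dimension(ℓ))
# Custom config (e.g. with a non-default chunk size or tag):
cfg = ForwardDiff.GradientConfig(nothing, x, ForwardDiff.Chunk{2}())
∇ℓ = ADgradient(Val(:ForwardDiff), ℓ; gradientconfig = cfg)
LogDensityProblems.logdensity_and_gradient(∇ℓ, x)
```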

torfjelde commented 1 year ago

It's not necessarily problematic, but it does mean this change will be breaking. Currently AHMC doesn't require specifying the gradient config to work with something like a ComponentArray.

torfjelde commented 1 year ago

This PR should now be good to go as soon as tests pass :)