TuringLang / Bijectors.jl

Implementation of normalising flows and constrained random variable transformations
https://turinglang.org/Bijectors.jl/
MIT License
199 stars 32 forks source link

Implementation of `VecCorrBijector` #246

Closed torfjelde closed 1 year ago

torfjelde commented 1 year ago

This PR implements a VecCorrBijector, i.e. CorrBijector but where the unconstrained space is treated as a Vector of the correct size rather than working with a Matrix which is actually embedded in a subspace of matrices.

This is particularly when doing stuff like MCMC sampling, where the full matrix representation might give the sampler the false indication that the dimension is d × d instead of (d choose 2) (as is the case here).

harisorgn commented 1 year ago

I rebased main here to resolve conflicts around reshape and its tests. Hope it wasn't overkill in terms of tracking changes.

harisorgn commented 1 year ago

Test failure is the ForwardDiff issue I mentioned #253 in this comment.

devmotion commented 1 year ago

Just saw the comment: The ForwardDiff issue is definitely not https://github.com/JuliaDiff/ForwardDiff.jl/issues/606 since the release with https://github.com/JuliaDiff/ForwardDiff.jl/issues/481 was yanked and these changes only exist on the master branch of ForwardDiff (unreleased).

I haven't checked the issue in detail yet so maybe it's not relevant here but my general advice is to not call cholesky (or similar linalg functions) with generic matrices when you know it has a special structure (symmetric, hermitian, etc.) but always wrap the matrix in a Symmetric, Hermitian, etc. wrapper first. This will dispatch to optimized methods, avoid checks such as ishermitian, and also yield different paths and results in AD.

harisorgn commented 1 year ago

but always wrap the matrix in a Symmetric, Hermitian, etc. wrapper first. This will dispatch to optimized methods, avoid checks such as ishermitian, and also yield different paths and results in AD.

Thanks David! This seems to work for our issue. The interface tests for v1.6 now fail on the Hermitian constructor during ReverseDiff. Seems that an rrule for the constructor is accounted for in latest but not in v1.6 (though I couldn't find it explicitly)?

devmotion commented 1 year ago

My initial guess would be that this is caused by some changes in LinearAlgebra rather than AD (ReverseDiff didn't change Julia compat and doesn't use rrules in general).

torfjelde commented 1 year ago

I messed up and merged the formatting PR into master before this :facepalm: So the result was quite a lot of conflicts. I've resolved them now though :+1:

devmotion commented 1 year ago

Regarding cholesky/Hermitian: I assume it works on Julia >= 1.8 due to https://github.com/JuliaLang/julia/pull/44076.

harisorgn commented 1 year ago

Regarding cholesky/Hermitian: I assume it works on Julia >= 1.8 due to https://github.com/JuliaLang/julia/pull/44076.

Ah yes, thanks once again! Sorry, misread the stack trace on my phone and thought the issue was the Hermitian constructor 😅

How should we handle this then? The Hermitian wrapper before cholesky solves the ForwardDiff issue with the tiny numerical mismatch, but creates an instance of <: Hermitian{....TrackedArray....} that cholesky can't handle in v1.6 .

torfjelde commented 1 year ago

One straight-forward (but hacky) solution is of course:

cholesky_factor(X::ReverseDiff.TrackedArray) = cholesky_factor(cholesky(X))

Also, though I don't see another fix, I'm uncertain if Hermitian is the 100% correct solution since during sampling we can easily run into matrices which end up not being hermitian due to numerical issues. If this happens, it seems to me that we'd want to error rather than just continue, no?

harisorgn commented 1 year ago

One straight-forward (but hacky) solution is of course

Was just checking in case I wasn't seeing something less hacky, but let's go with this now 👍 . I've added this method in src/compat/reversediff.jl to avoid a ReverseDiff dependency outside that submodule, a bit annoying though as it is not grouped with the rest of the methods.

we can easily run into matrices which end up not being hermitian due to numerical issues. If this happens, it seems to me that we'd want to error rather than just continue, no?

Should we have something like an ishermitian check but with tolerances? The ForwardDiff issue was numerical but tiny as far as I tested it.

harisorgn commented 1 year ago

Do we want to keep CorrBijector in general? Will it have a purpose after this PR?

torfjelde commented 1 year ago

Do we want to keep CorrBijector in general? Will it have a purpose after this PR?

Good question. I'd say keep it for now + make an issue.

yebai commented 1 year ago

Great work -- thanks @harisorgn, @torfjelde and @devmotion!