TuringLang / AbstractMCMC.jl

Abstract types and interfaces for Markov chain Monte Carlo methods
https://turinglang.org/AbstractMCMC.jl
MIT License
78 stars 18 forks source link

Remove overly specialized bundle_samples #120

Closed torfjelde closed 1 year ago

torfjelde commented 1 year ago

Related: #118

torfjelde commented 1 year ago

Do you have a concrete example that is fixed by this removal? Removing this definitions seems very breaking. Without this method the return types will be incorrect, won't they? The main point is that the elements are converted to type T which won't happen anymore.

Nothing is fixed, it's just that

  1. I don't actually know of anywhere this is being used. Do we know of any place where this is the case?
  2. If you want to do something with Type{Vector{...}} then you need to override this default implementation. This sort of thing comes up very often once you have samplers which are working on a "multiple" models, e.g. if you implement tempering. In the case it seems very reasonable to allow Vector{MCMCChains.Chains} for samples::AbstractVector, which then requires explicit overriding of this default implementation.
  3. map + convert is very easy to do for the user themselves, to the point where it doesn't seem worth the hassle it causes :shrug:
codecov[bot] commented 1 year ago

Codecov Report

Patch coverage has no change and project coverage change: -0.03 :warning:

Comparison is base (7192263) 97.33% compared to head (216488f) 97.30%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #120 +/- ## ========================================== - Coverage 97.33% 97.30% -0.03% ========================================== Files 8 8 Lines 300 297 -3 ========================================== - Hits 292 289 -3 Misses 8 8 ``` | [Impacted Files](https://codecov.io/gh/TuringLang/AbstractMCMC.jl/pull/120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=TuringLang) | Coverage Δ | | |---|---|---| | [src/interface.jl](https://codecov.io/gh/TuringLang/AbstractMCMC.jl/pull/120?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=TuringLang#diff-c3JjL2ludGVyZmFjZS5qbA==) | `94.11% <ø> (-0.89%)` | :arrow_down: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=TuringLang). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=TuringLang)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

torfjelde commented 1 year ago

Without this method the return types will be incorrect, won't they? The main point is that the elements are converted to type T which won't happen anymore.

And this is not even a promise we are making, right? For example, bundle_samples will currently just be identity if you provide it with a type it has no implementation for (this is a different question: should we error here?)

torfjelde commented 1 year ago

An alternative is of course to introudce a

AbstractMCMC.bundle_samples_to_vector

or something, i.e. just an additional diversion so that overloading for samples::AbstractVector doesn't become such a hassle, which is then the default (and thus keeps the change non-breaking).

And let me try to clarify a bit the issue I have with the current approach.

If you want to overload AbstractMCMC.bundle_samples(::AbstractVector, ..., ::Type{T}) where T, and, say, just make some transformation to the samples before passing them on to the "actual" bundle_samples implementation (this is very common in the case of "meta-samplers", e.g. tempered samplers, composition of samplers), then you have to do the following

function AbstractMCMC.bundle_samples(samples, ..., ::Type{T}) where {T}
    return actual_bundle_samples(samples, ..., T)
end

function AbstractMCMC.bundle_samples(samples::Vector, ..., ::Type{Vector{T}}) where {T}
    return actual_bundle_samples(samples, ..., Vector{T})
end

which gets quite annoying.

devmotion commented 1 year ago

I've been annoyed by downstream issues/annoyances of the bundle_samples interface as well. The reason why I'm a bit skeptic about the PR in its current form is that it is breaking (clearly indicated by the broken tests) and hence should be released in a breaking release, regardless of how useful the change is. But when we go through the hassle of updating all downstream packages anyway, I wonder if we should spend a few days on thinking about whether we could/should improve the interface more generally. For instance, currently it's not intended by the interface to dispatch on the output-type in the sample call - is that an issue? Is the output type useful at all? Should we remove bundle_samples completely, what would the implications be? Could/should we generally reduce the number of arguments in the interface API?

torfjelde commented 1 year ago

Oooh AbstractMCMC is already on x.y.z versioning! I was thinking bumping minor version was a breaking release, and so was confused why you were talking about how this should be a breaking release thinking I'd already indicated so.

But when we go through the hassle of updating all downstream packages anyway, I wonder if we should spend a few days on thinking about whether we could/should improve the interface more generally. For instance, currently it's not intended by the interface to dispatch on the output-type in the sample call - is that an issue? Is the output type useful at all? Should we remove bundle_samples completely, what would the implications be? Could/should we generally reduce the number of arguments in the interface API?

And yeah, happy to spend a few days thinking aobut whether or not this is the way to go :+1:

Should we maybe make an issue or discussion?

torfjelde commented 1 year ago

currently it's not intended by the interface to dispatch on the output-type in the sample call - is that an issue?

Is the output type useful at all?

When you refer to the "output type in the sample call", are you talking about the samples argument of bundle_samples or the output-type specified to bundle_samples? I'm guessing the latter, but just asking to be sure.

And btw, a lot of these PRs are motivated by "annoyances" I've encountered when working on MCMCTempering.jl, where we have several "meta-samplers", i.e. samplers which call other sampler in some way.

In MCMCTempering.jl we have a MultiModel, which effectively defines a product of models but with an ordering (for implementation purposes only). In this case, it might be sensible for the user to ask for a Vector{MCMCChains.Chains}, with each chain corresponding to each model.

Another particularly "weird" case where bundle_samples is used (though not related to the dispatching on the output-type), though IMO it also seems like a decent way to handle this stuff this method: https://github.com/TuringLang/MCMCTempering.jl/blob/ed1ca9886d1c49aece23aacf2bc9fcaf77725fd5/src/MCMCTempering.jl#L93-L108

In this case, the samples correspond to samples from a product of models, and so it represents an ordered sequence of samples rather than a single one. But the sequence is ordered according to an "index process", which is not necessarily equal to the model-ordering (which is what you'd expect as a user). To "handle" this in a somewhat more convenient manner for the user, we've added a kwarg to bundle_samples called bundle_resolve_swaps which will automatically re-order the sequence in each sample to align with the provided models.

Should we remove bundle_samples completely, what would the implications be?

Do you mean completely removing it and just not doing anything with the samples, leaving that to the sampler-implementer? If so, I'm not a big fan, in particular not now that we have extensions and people can easily add impls for MCMCChains, etc.

Could/should we generally reduce the number of arguments in the interface API?

What we could always do, though it has to be done with care, is to add some simpler bundle_samples method which can be used if all the sampler-information isn't useful, i.e.

function bundle_samples(
    samples, ::AbstractModel, ::AbstractSampler, ::Any, output_type::Type; kwargs...
)
    return bundle_samples(samples, output_type; kwargs...)
end

and then just make the default implementation we complained about above

function bundle_samples(samples::Vector, ::Type{Vector{T}}; kwargs...) where {T}
    return map(samples) do sample
        convert(T, sample)
    end
end

I believe in most (but not all) scenarions, one would really only overload the latter implementation.

torfjelde commented 1 year ago

@devmotion Could we revisit this?:)

devmotion commented 1 year ago

I'm still unhappy. The least this should be moved to a breaking 5.0.0 release.

However, I wonder - could we just move the functionality to the body of the generic fallback, wrapped in a if samples isa Vector && T <: Vector (or similar) check to fix the method ambiguity issues without actually removing functionality?

devmotion commented 1 year ago

Since #123 was merged, I suggest closing this PR for now and revisit it (or some variant of it) if these ambiguity issues show up again.

torfjelde commented 1 year ago

Sounds good :+1: