TuringLang / DynamicPPL.jl

Implementation of a domain-specific language (DSL) for dynamic probabilistic programming
https://turinglang.org/DynamicPPL.jl/
MIT License

Possibly confusing `.~` meaning #435

Open · ParadaCarleton opened this issue 3 years ago

ParadaCarleton commented 3 years ago

From the previous issue on calculations for `pointwise_loglikelihoods`:

> What do you expect `pointwise_loglikelihoods` to compute here? The marginal likelihood for each dimension? If so, I don't think this is what you want for something like PSIS-LOO.
>
> EDIT: If `y` is a `Matrix`, it makes a bit more sense, and in that case you can use either a `for` loop or `.~` (for a `MultivariateDistribution`, `.~` assumes each column represents an observation).

I'm not sure this is the most intuitive behavior. The problem is that, in general, the rows of a table represent independent observations, while each column is a different attribute. My suggestion would be to use the following set of rules for `.~` broadcasting:

  1. If the object being broadcast over is a table from Tables.jl and is not an `Array`, Turing should recognize that each row is an independent observation for the purposes of `pointwise_loglikelihoods` and any similar functions. (Although internally it should be treated as a broadcast over columns; this should improve performance, since columns usually have only one type.)
  2. If the object is an `Array`, not from Tables.jl, and not square, broadcasting over columns is safe: `.~` will throw a `DimensionMismatch` if the user meant to broadcast over rows. Since Julia is column-major, we can assume the user meant to broadcast over columns without risking too much.
  3. If the object being broadcast over is square and not a table, or if it is both a table and an `Array`, an error should be thrown.

This is annoying given that Julia is column-major, but I think it's the best solution for avoiding errors in data analysis.
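
To make the ambiguity concrete, here is a minimal sketch of the current column-wise behavior (assuming Turing.jl's `@model` syntax; the model and data are illustrative):

```julia
using Turing, Distributions, LinearAlgebra

@model function demo(y)
    μ ~ Normal(0, 1)
    # Current behavior: for a Matrix `y`, `y .~ MvNormal(...)` treats each
    # *column* of `y` as one observation, equivalent to the commented loop.
    y .~ MvNormal(fill(μ, size(y, 1)), I)
    # for i in 1:size(y, 2)
    #     y[:, i] ~ MvNormal(fill(μ, size(y, 1)), I)
    # end
end

y = rand(3, 10)   # 3 dimensions, 10 observations stored column-wise
chain = sample(demo(y), NUTS(), 100)
```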

torfjelde commented 3 years ago

> I'm not sure this is the most intuitive behavior. The problem is that, in general, the rows of a table represent independent observations, while each column is a different attribute.

Not in Julia, though :confused: Julia is column-major, so extracting each observation with the access pattern `x[:, i]` is faster than `x[i, :]`. Therefore one should always represent each observation as a column.
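
A quick illustration of the access-pattern point (plain Julia, not DynamicPPL code; the sizes are made up):

```julia
X = rand(10, 100_000)    # 10 features, 100_000 observations, stored column-major

# Column slices walk contiguous memory; row slices have a stride of 10:
sum(sum, eachcol(X))     # fast
sum(sum, eachrow(X))     # slower for large X
```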

> My suggestion would be to use the following set of rules for `.~` broadcasting:
>
> 1. If the object being broadcast over is a table from Tables.jl and is not an `Array`, Turing should recognize that each row is an independent observation for the purposes of `pointwise_loglikelihoods` and any similar functions. (Although internally it should be treated as a broadcast over columns; this should improve performance, since columns usually have only one type.)
>
> 2. If the object is an `Array`, not from Tables.jl, and not square, broadcasting over columns is safe: `.~` will throw a `DimensionMismatch` if the user meant to broadcast over rows. Since Julia is column-major, we can assume the user meant to broadcast over columns without risking too much.
>
> 3. If the object being broadcast over is square and not a table, or if it is both a table and an `Array`, an error should be thrown.

I do agree that things are a bit confusing and ambiguous, and I appreciate your input on this :) But IMO this is a broader issue, not something we should begin addressing in `.~`. Eventually, though, addressing the issue should reach `.~`.

It is essentially just a question of how to represent "batches" of inputs, i.e. a collection of "independent" inputs, in the sense that we want the computation to be performed on each element of the collection independently, but we provide it as a `Batch` to allow performance improvements.

This is something we've already given a lot of thought (https://github.com/TuringLang/Bijectors.jl/discussions/178), and something I've started playing around with in a draft PR over in Bijectors.jl. We essentially want to extend `ColVecs` and `RowVecs` from KernelFunctions.jl and provide them as a separate lightweight package. Once that's done, we could start being much more explicit about all these things; e.g. we could easily add the `iid` method you suggested, which would really just wrap the `Distribution` in a way that tells it what sort of `Batch` to expect.
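
For reference, a minimal sketch of the `ColVecs`/`RowVecs` idea from KernelFunctions.jl, where the wrapper, rather than a convention, declares which axis indexes the observations:

```julia
using KernelFunctions: ColVecs, RowVecs

X = rand(3, 100)              # 3 features, 100 observations
cv = ColVecs(X)               # declare: columns are the observations
length(cv)                    # 100
cv[1]                         # a view of X[:, 1], the first observation

rv = RowVecs(rand(100, 3))    # the same data stored row-wise
length(rv)                    # also 100 observations
```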

And I don't think we want to specialize on Tables.jl in Turing.jl in any way (this is just my personal opinion, though!). This too should just be wrapped in something similar to `ColVecs` or `RowVecs` to specify what is what.

ParadaCarleton commented 3 years ago

> And I don't think we want to specialize on Tables.jl in Turing.jl in any way (this is just my personal opinion, though!). This too should just be wrapped in something similar to `ColVecs` or `RowVecs` to specify what is what.

Is there a reason why? I think this ability to specialize on a standardized API and have all my packages "just work" together the way I expect them to is probably my favorite thing about Julia.

> Not in Julia, though :confused: Julia is column-major, so extracting each observation with the access pattern `x[:, i]` is faster than `x[i, :]`. Therefore one should always represent each observation as a column.

I'm not sure about this, from the perspective of either intuitiveness/compatibility with Julia Base or speed:

  1. In Base, broadcasting over an array broadcasts over each element, rather than over slices of that array:

    julia> x = rand(3);  # a Vector{Float64}

    julia> sort!.(x)  # broadcasts sort! over the Float64 elements
    ERROR: MethodError: no method matching sort!(::Float64)

  2. With regard to speed, the problem is that if users have to transpose their dataframe to treat each column as a single IID observation, many of them will do so, which will often create columns of mixed type that can't be specialized on. (Each dimension should have one type, but observations are often of mixed type.) I'd expect this to matter more than locality of reference, no? See the sketch below.
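
A hypothetical illustration of the mixed-type concern (the data are made up):

```julia
# Column-wise (Tables.jl-style) storage keeps each attribute concretely typed:
tbl = (height = [1.80, 1.65], species = ["cat", "dog"])
eltype(tbl.height)             # Float64 -- the compiler can specialize

# Packing each mixed-type observation into one array cannot stay concrete:
X = [1.80 "cat"; 1.65 "dog"]   # rows as observations
eltype(X)                      # Any -- defeats specialization
```
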
torfjelde commented 3 years ago

Is there a reason why? I think this ability to specialize on a standardized API and have all my packages "just work" together the way I expect them to is probably my favorite thing about Julia.

Yeah, I'm 100% with this! What I'm trying to say is that we should strive to make things work with more general inputs (e.g. iterators, rather than demanding arrays) instead of explicitly implementing support for Tables.jl :) But if a special implementation were still required, we could add some glue code in Turing.jl (the rest of the changes need to happen in DynamicPPL.jl) :+1:

> In Base, broadcasting over an array broadcasts over each element, rather than over slices of that array

I agree this particular aspect is counter-intuitive (I was surprised to find that we handled this case, tbh, when I first saw it), but it's the sort of feature that no one will use accidentally, since it's only applicable if you write `x .~ ::MultivariateDistribution`, which doesn't make sense otherwise. Either you're aware of it and use it, or you're not aware of it and will never know of its existence. Though I agree there should be a better approach, hence the `Batch` idea I was talking about above. That would make the handling completely unambiguous, so no such surprises would occur :+1:

Btw, the reason we've done the above is to replicate the way Distributions.jl handles collections of samples from a `MultivariateDistribution`.
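
For reference, the Distributions.jl convention being mirrored, as a minimal sketch (the particular distribution is just for illustration):

```julia
using Distributions, LinearAlgebra

d = MvNormal(zeros(3), I)
X = rand(d, 10)      # 3×10 Matrix: one sample per *column*
logpdf(d, X)         # 10-element Vector: one logpdf per column/observation
```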

> With regard to speed, the problem is that if users have to transpose their dataframe to treat each column as a single IID observation, many of them will do so, which will often create columns of mixed type that can't be specialized on. (Each dimension should have one type, but observations are often of mixed type.) I'd expect this to matter more than locality of reference, no?

The thing is, though, we don't support observations of mixed types, since Distributions.jl doesn't :confused: So such an example would have to be handled specially anyway.

The fact that we don't support Tables.jl, for example, is not an issue with `.~`; it's a more general and separate issue that I 100% agree needs to be addressed:

  1. We need to allow more types than just arrays on the LHS of `~`, e.g. `NamedTuple`.
  2. Allow more general types than just `Distribution` or `Array{<:Distribution}` on the RHS of `~`, e.g. an iterator of `Distribution`s, something from MeasureTheory.jl, or just any struct that implements `rand` and `logpdf`.

Once we allow that, extensions to Tables.jl and others will be much easier :) You could even just use `Tables.namedtupleiterator` (or whatever it's called) to do this once we had support for (1); see the sketch below.
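
A hypothetical sketch of how point (1) plus `Tables.namedtupleiterator` could fit together. Note that a `NamedTuple` LHS for `~` does not exist yet, so the commented line is aspirational rather than working DynamicPPL code:

```julia
using Tables

tbl = (x = [1.0, 2.0], y = [0.1, 0.2])   # a NamedTuple of columns is a valid table

for row in Tables.namedtupleiterator(tbl)
    # `row` is a NamedTuple{(:x, :y)}; with (1) in place, one could imagine
    #     row ~ some_joint_distribution
    @show row
end
```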

I started working on these two points a bit yesterday btw (https://github.com/TuringLang/DynamicPPL.jl/pull/292).

torfjelde commented 3 years ago

@ParadaCarleton Something like this: https://github.com/TuringLang/AbstractPPL.jl/pull/26?

ParadaCarleton commented 3 years ago

> @ParadaCarleton Something like this: TuringLang/AbstractPPL.jl#26?

I have no idea what that's doing, but if you claim it helps, I'll believe you. :sweat_smile:

torfjelde commented 3 years ago

> I have no idea what that's doing, but if you claim it helps, I'll believe you. :sweat_smile:

Haha, I was referring to the example of sampling a DataFrame rather than the entirety of the PR :sweat_smile: