brian-j-smith / Mamba.jl

Markov chain Monte Carlo (MCMC) for Bayesian analysis in julia

question: can array of arrays be used in stochastic node (like a list of vectors in #rstats) #151

Closed statsccpr closed 5 years ago

statsccpr commented 5 years ago

Is it currently (or in the future) possible to use an array of arrays in a stochastic node?

say, an outer array of 2 inner arrays, where the inner arrays can vary in length (3 and 5 in this case)

[ [1, 2, 3], [1, 2, 3, 4, 5] ]

the example above would represent times of event A and times of event B

a single 'array of arrays' is one 'data point', to be used as a whole in one log pdf evaluation: my_user_defined_logpdf(one_array_of_arrays)

using rstats terminology for the data structures, i'm hoping to supply a list of unequal length numeric vectors into the likelihood
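In Julia terms, the R "list of unequal length numeric vectors" roughly corresponds to a `Vector{Vector{Float64}}`. A minimal sketch in plain Julia, independent of Mamba (the variable names `times_A`, `times_B` and `inner_lengths` are illustrative and not part of any package):

```julia
# Times of event A and event B as a ragged "array of arrays".
# Note the commas: `[1 2 3]` (spaces only) builds a 1x3 Matrix, not a Vector.
times_A = [1.0, 2.0, 3.0]
times_B = [1.0, 2.0, 3.0, 4.0, 5.0]

# The outer vector holds the two inner vectors; its type is Vector{Vector{Float64}}.
array_of_arrays = [times_A, times_B]

# The inner vectors keep their own lengths.
inner_lengths = length.(array_of_arrays)
```

This only shows the data structure itself; whether a stochastic node can hold it is the open question of this issue.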

i've seen past threads say you can use an array of multivariate distributions in the stochastic node

https://github.com/brian-j-smith/Mamba.jl/issues/44

  beta = Stochastic(2,
    @modelexpr(mu_beta, Sigma, N,
      MultivariateDistribution[
        MvNormal(mu_beta, Sigma)
        for i in 1:N
      ]
    ),
    false
  )

where it seems the above snippet is somewhat related to what i need for the array of arrays

Some pseudocode is below, highlighting the important portions. The custom log likelihood i plan to use would be called like this

my_user_defined_logpdf_internal_calculations(mu=d.mu, alpha=d.alpha, beta=d.beta, x=array_of_arrays)

where the three sets of parameters are

mu::Vector{Float64}
alpha::Matrix{Float64}
beta::Vector{Float64}

and the array of arrays is used as the x argument (the data) x=array_of_arrays

If the above functionality is supported, i guess an open question is what the appropriate type annotation would be for the data argument x=array_of_arrays, since the template methods restrict x to a flat vector of reals:

function insupport(d::NewMultivarDist, x::AbstractVector{T}) where {T<:Real}
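As far as plain Julia dispatch goes, a ragged data argument could be annotated as `AbstractVector{<:AbstractVector{<:Real}}`. The sketch below assumes nothing about Mamba: `ragged_logpdf` is a hypothetical stand-in for the user-defined density, and its body is a placeholder calculation. It does not test whether Mamba's stochastic nodes or samplers would accept such a value.

```julia
# Hypothetical stand-in for the user-defined log density (name and body are illustrative).
# The annotation accepts any vector whose elements are themselves vectors of reals.
function ragged_logpdf(x::AbstractVector{<:AbstractVector{<:Real}})
    # Placeholder calculation: -0.5 times the sum of squares over all inner vectors.
    -0.5 * sum(v -> sum(abs2, v), x)
end

value = ragged_logpdf([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0, 5.0]])

# A flat vector of reals does not match this signature.
accepts_flat = applicable(ragged_logpdf, [1.0, 2.0, 3.0])
```

With this annotation a flat `Vector{Float64}` is rejected at dispatch time, which is the opposite of the `AbstractVector{T}` with `T<:Real` signature in the template.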

@everywhere extensions = quote

## Load needed packages and import methods to be extended
using Distributions
import Distributions: length, insupport, _logpdf

## Type declaration
mutable struct NewMultivarDist <: ContinuousMultivariateDistribution

  # https://docs.julialang.org/en/v1/base/arrays/
  mu::Vector{Float64}
  alpha::Matrix{Float64}
  beta::Vector{Float64}

end

## The following method functions must be implemented

## Dimension of the distribution
length(d::NewMultivarDist) = length(d.mu)

## Logical indicating whether x is in the support
function insupport(d::NewMultivarDist, x::AbstractVector{T}) where {T<:Real}
  length(d) == length(x) && all(isfinite.(x))
end

## Normalized or unnormalized log-density value
function _logpdf(d::NewMultivarDist, x::AbstractVector{T}) where {T<:Real}
  ## x is the data; the open question is whether it can be an array of arrays
  my_user_defined_logpdf_internal_calculations(mu=d.mu, alpha=d.alpha, beta=d.beta, x=x)
end

end
statsccpr commented 5 years ago

this is probably feature bloat or too hard to implement.

the specific example i had in mind could probably be handled by padding the shorter inner array with NaN so that the two have equal length, combining them into one array, and then stripping the padding again in post processing
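
A rough sketch of that padding workaround in plain Julia. The helper names `pad_with_nan` and `strip_nan` are made up for illustration and are not Mamba or Distributions functions. It assumes the real data never contains NaN, and it does not check how Mamba treats NaN values in a node.

```julia
# Pack unequal-length vectors into one matrix, one column per inner vector,
# padding the unused rows of shorter columns with NaN.
function pad_with_nan(vs::AbstractVector{<:AbstractVector{<:Real}})
    n = maximum(length, vs)
    padded = fill(NaN, n, length(vs))
    for (j, v) in enumerate(vs)
        padded[1:length(v), j] = v
    end
    padded
end

# Recover the ragged vectors by dropping the NaN padding column by column.
strip_nan(padded::AbstractMatrix{<:Real}) =
    [filter(!isnan, padded[:, j]) for j in 1:size(padded, 2)]

padded = pad_with_nan([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0, 5.0]])
ragged = strip_nan(padded)
```

One thing to watch: an `insupport` check of the form `all(isfinite.(x))`, as in the template above, returns false for NaN-padded data, so it would have to be relaxed for this approach.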