StanJulia / StanSample.jl

WIP: Wrapper package for the sample method in Stan's cmdstan executable.
MIT License
18 stars 4 forks source link

Inconsistent dimensions for n-dimensional arrays. #54

Closed AndyPohlNZ closed 2 years ago

AndyPohlNZ commented 2 years ago

Hi Rob,

Thanks for putting work into getting stan and julia playing nice together and for the fix to the above issue. Sorry for raising this but the fix implemented does not seem to generalize to multidimensional arrays. I am using the master branch of StanSample: StanSample v6.9.2 https://github.com/StanJulia/StanSample.jl#master. See the modified example below:

using StanSample
ProjDir = @__DIR__
n1 = 50
n2 = 2
n3 = 27
x = fill(0, n1, n2, n3)

stan_data = Dict("x" => x, "n1" => n1, "n2" => n2, "n3" => n3)
println(size(stan_data["x"]))

mdl = "
data { 
    int n1;
    int<lower=1> n2;
    int<lower=1> n3;
    array[n1, n2, n3]real x;            
}

parameters {
   real mu;
} 

model {
    mu ~ normal(0, 1);
}
"

tmpdir = joinpath(ProjDir, "tmp")
isdir(tmpdir) && rm(tmpdir; recursive=true)
stan_model = SampleModel("temp_model", mdl, tmpdir)
stan_sample(
    stan_model;
    data=stan_data,
    seed=123,
    num_chains=4,
    num_samples=1000,
    num_warmups=1000,
    save_warmup=false
)

Returns the following error: Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=x; position=0; dims declared=(50,2,27); dims found=(27,2,50)

Originally posted by @AndyPohlNZ in https://github.com/StanJulia/StanSample.jl/issues/51#issuecomment-1197394509

goedman commented 2 years ago

Hi Andy, Thanks for raising this issue. I see it as well. Back to the drawing board...

goedman commented 2 years ago

Hi Andy ( @AndyPohlNZ ), the issue was caused by the update_json_files() function in StanBase.jl. I just pushed StanBase.jl v4.7.3 which I think should fix the problem.

I have not seen a model to check the results of the fix. Also, I'm a bit concerned this is another point solution, not something that deals with Array{T, N} in general.

AndyPohlNZ commented 2 years ago

Hi Rob,
Just updated to StanBase.jl v4.7.3 which does seem to work for the example above although as you suspected the fix doesn't hold in general....

See this slightly more complex example which I hope is a useful test case for testing dimensions of both input and output:

ProjDir = @__DIR__
using StanSample
using DataFrames
using Random, Distributions

# Dimensions
n1 = 1;
n2 = 2;
n3 = 3;
# Number of observations
N = 500;

# True values
σ = 0.01;
μ₁ = [1.0, 2.0, 3.0];
μ₂ = [10, 20, 30];

μ = Array{Float32}(undef, n1, n2, n3);
μ[1, 1, :] = μ₁;
μ[1, 2, :] = μ₂;

# Observations
y = Array{Float32}(undef, N, n1, n2, n3);
for i in 1:N
    for j in 1:n1
        for k in 1:n2
            for l in 1:n3
                y[i, j, k, l] = rand(Normal(μ[j, k, l], σ))
            end
        end
    end
end

mdl = "
data {
    int<lower=1> N;
    int<lower=1> n1;
    int<lower=1> n2;
    int<lower=1> n3;
    array[N, n1, n2] vector[n3] y;
}

parameters {
    array[n1, n2] vector[n3] mu;
    real<lower=0> sigma;
}
model {
    // Priors
    sigma ~ inv_gamma(0.01, 0.01);
    for (i in 1:n1) {
        for (j in 1:n2) {
            mu[i, j] ~ normal(rep_vector(0, n3), 1e6);
        }
    }

    // Model
    for (i in 1:N){
        for(j in 1:n1){
            for(k in 1:n2){
                y[i, j, k] ~ normal(mu[j, k], sigma);
            }
        }
    }
}
"

stan_data = Dict(
    "y" => y,
    "N" => N,
    "n1" => n1,
    "n2" => n2,
    "n3" => n3,
);

println(size(stan_data["y"]))

tmpdir = joinpath(ProjDir, "tmp")
isdir(tmpdir) && rm(tmpdir; recursive=true)
stan_model = SampleModel("multidimensional_inference", mdl, tmpdir)
stan_sample(
    stan_model;
    data=stan_data,
    seed=123,
    num_chains=4,
    num_samples=1000,
    num_warmups=1000,
    save_warmup=false
)

samps = DataFrame(read_samples(stan_model))
println(describe(samps))

Error suggests dimensions are still inconsistent: Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=y; position=0; dims declared=(500,1,2,3); dims found=(3,2,1,500)

goedman commented 2 years ago

Hi Andy, just in the process of pushing StanBase.jl v4.7.4. This afternoon, driving up to the mountains here in Colorado, I realized a better approach is to simply permutedims() the input matrix if we go from Julia to Stan. The new version checks if the input is a subtype of Array and then I do permute dims(input, length(size(input)):-1:1). Basically reverse the axis.

I wrote a simple test program that I think demonstrates it works for 2, 3 and 4 (and up) dimensions:

using StanSample, Statistics, Test

ProjDir = @__DIR__
n1 = 2
n2 = 3
n3 = 4
n4 = 4

stan0_2 = "
data { 
    int n1;
    int<lower=1> n2;
    array[n1, n2] real x;            
}

generated quantities {
    array[n1] real mu;
    for (i in 1:n1)
        mu[i] =  x[i, 1] + x[i, 2] +x[i, 3];
}
";

x = Array(reshape(1:n1*n2, n1, n2))
data = Dict("x" => x, "n1" => n1, "n2" => n2)

tmpdir = joinpath(ProjDir, "tmp")
m0_2s = SampleModel("m0_2s", stan0_2, tmpdir)
rc0_2s = stan_sample(m0_2s; data)

if success(rc0_2s)
    post0_2s = read_samples(m0_2s, :dataframe)
    sums_stan_2 = Int.(mean(Array(post0_2s); dims=1))[1, :]
    sums_julia_2 = [sum(x[i, :]) for i in 1:n1]
    @test sums_stan_2 == sums_julia_2
end

stan0_3 = "
data { 
    int n1;
    int<lower=1> n2;
    int<lower=1> n3;
    array[n1, n2, n3] real x;            
}

generated quantities {
    array[n1, n2] real mu;
    for (i in 1:n1)
        for (j in 1:n2)
            mu[i, j] =  x[i, j, 1] + x[i, j, 2] +x[i, j, 3] + x[i, j, 4];
}
";

x = Array(reshape(1:n1*n2*n3, n1, n2, n3))
data = Dict("x" => x, "n1" => n1, "n2" => n2, "n3" => n3)

tmpdir = joinpath(ProjDir, "tmp")
m0_3s = SampleModel("m0_3s", stan0_3, tmpdir)
rc0_3s = stan_sample(m0_3s; data)

if success(rc0_3s)
    post0_3s = read_samples(m0_3s, :dataframe)
    sums_stan_3 = Int.(mean(Array(post0_1s); dims=1))[1, :]
    sums_julia_3 = [sum(x[i, j, :]) for j in 1:n2 for i in 1:n1]
    @test sums_stan_3 == sums_julia_3
end

stan0_4 = "
data { 
    int n1;
    int<lower=1> n2;
    int<lower=1> n3;
    int<lower=1> n4;
    array[n1, n2, n3, n4] real x;            
}

generated quantities {
    array[n1, n2, n3] real mu;
    for (i in 1:n1)
        for (j in 1:n2)
            for (k in 1:n3)
                mu[i, j, k] =  x[i,j,k,1] + x[i,j,k,2] + x[i,j,k,3] + x[i,j,k,4];
}
";

x = Array(reshape(1:n1*n2*n3*n4, n1, n2, n3, n4))
data = Dict("x" => x, "n1" => n1, "n2" => n2, "n3" => n3, "n4" => n4)

m0_4s = SampleModel("m0_4s", stan0_4, tmpdir)
rc0_4s = stan_sample(m0_4s; data)

if success(rc0_4s)
    post0_4s = read_samples(m0_4s, :dataframe)
    sums_stan_4 = Int.(mean(Array(post0_4s); dims=1))[1, :]
    sums_julia_4 = [sum(x[i, j, k, :]) for k in 1:n3 for j in 1:n2 for i in 1:n1]
    @test sums_stan_4 == sums_julia_4
end

which I'm adding to the StanSample test set.

I'll try your test script above but I think to Stan array[N, n1, n2] vector[n3] y; is identical to array[N, n1, n2, n3] y;

goedman commented 2 years ago

I'm getting:

7×7 DataFrame
 Row │ variable  mean        min          median      max         nmissing  eltype   
     │ Symbol    Float64     Float64      Float64     Float64     Int64     DataType 
─────┼───────────────────────────────────────────────────────────────────────────────
   1 │ mu.1.1.1   1.00011     0.998632     1.00012     1.00163           0  Float64
   2 │ mu.1.2.1   9.99936     9.99774      9.99937    10.0009            0  Float64
   3 │ mu.1.1.2   1.9995      1.9977       1.99949     2.00175           0  Float64
   4 │ mu.1.2.2  19.9993     19.9975      19.9993     20.0008            0  Float64
   5 │ mu.1.1.3   3.00083     2.99928      3.00084     3.00253           0  Float64
   6 │ mu.1.2.3  29.9998     29.9984      29.9998     30.0016            0  Float64
   7 │ sigma      0.0100989   0.00968489   0.0100999   0.0105766         0  Float64

which I think is correct.

AndyPohlNZ commented 2 years ago

I can confirm that things seem to be working correctly with StanBase v.4.7.4. Thanks so much for your quick turnaround with this issue! Enjoy the mountains :).

goedman commented 2 years ago

Hi Andy, is it ok if I add your script as a test case? It's a great example! Thanks for you help, it has bothered me a long time that the JSON stuff was never put to the test.

AndyPohlNZ commented 2 years ago

Hi Rob. Absolutely feel free to add the example as a test case.