Closed AndyPohlNZ closed 2 years ago
Hi Andy, Thanks for raising this issue. I see it as well. Back to the drawing board...
Hi Andy ( @AndyPohlNZ ), the issue was caused by the update_json_files() function in StanBase.jl. I just pushed StanBase.jl v4.7.3 which I think should fix the problem.
I have not seen a model to check the results of the fix. Also, I'm a bit concerned this is another point solution, not something that deals with Array{T, N} in general.
Hi Rob,
Just updated to StanBase.jl v4.7.3 which does seem to work for the example above although as you suspected the fix doesn't hold in general....
See this slightly more complex example which I hope is a useful test case for testing dimensions of both input and output:
ProjDir = @__DIR__
using StanSample
using DataFrames
using Random, Distributions
# Dimensions
n1 = 1;
n2 = 2;
n3 = 3;
# Number of observations
N = 500;
# True values
σ = 0.01;
μ₁ = [1.0, 2.0, 3.0];
μ₂ = [10, 20, 30];
μ = Array{Float32}(undef, n1, n2, n3);
μ[1, 1, :] = μ₁;
μ[1, 2, :] = μ₂;
# Observations
y = Array{Float32}(undef, N, n1, n2, n3);
for i in 1:N
for j in 1:n1
for k in 1:n2
for l in 1:n3
y[i, j, k, l] = rand(Normal(μ[j, k, l], σ))
end
end
end
end
mdl = "
data {
int<lower=1> N;
int<lower=1> n1;
int<lower=1> n2;
int<lower=1> n3;
array[N, n1, n2] vector[n3] y;
}
parameters {
array[n1, n2] vector[n3] mu;
real<lower=0> sigma;
}
model {
// Priors
sigma ~ inv_gamma(0.01, 0.01);
for (i in 1:n1) {
for (j in 1:n2) {
mu[i, j] ~ normal(rep_vector(0, n3), 1e6);
}
}
// Model
for (i in 1:N){
for(j in 1:n1){
for(k in 1:n2){
y[i, j, k] ~ normal(mu[j, k], sigma);
}
}
}
}
"
stan_data = Dict(
"y" => y,
"N" => N,
"n1" => n1,
"n2" => n2,
"n3" => n3,
);
println(size(stan_data["y"]))
tmpdir = joinpath(ProjDir, "tmp")
isdir(tmpdir) && rm(tmpdir; recursive=true)
stan_model = SampleModel("multidimensional_inference", mdl, tmpdir)
stan_sample(
stan_model;
data=stan_data,
seed=123,
num_chains=4,
num_samples=1000,
num_warmups=1000,
save_warmup=false
)
samps = DataFrame(read_samples(stan_model))
println(describe(samps))
Error suggests dimensions are still inconsistent:
Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=y; position=0; dims declared=(500,1,2,3); dims found=(3,2,1,500)
Hi Andy, just in the process of pushing StanBase.jl v4.7.4. This afternoon, driving up to the mountains here in Colorado, I realized a better approach is to simply permutedims() the input matrix if we go from Julia to Stan. The new version checks if the input
is a subtype of Array and then I do permute dims(input, length(size(input)):-1:1)
. Basically reverse the axis.
I wrote a simple test program that I think demonstrates it works for 2, 3 and 4 (and up) dimensions:
using StanSample, Statistics, Test
ProjDir = @__DIR__
n1 = 2
n2 = 3
n3 = 4
n4 = 4
stan0_2 = "
data {
int n1;
int<lower=1> n2;
array[n1, n2] real x;
}
generated quantities {
array[n1] real mu;
for (i in 1:n1)
mu[i] = x[i, 1] + x[i, 2] +x[i, 3];
}
";
x = Array(reshape(1:n1*n2, n1, n2))
data = Dict("x" => x, "n1" => n1, "n2" => n2)
tmpdir = joinpath(ProjDir, "tmp")
m0_2s = SampleModel("m0_2s", stan0_2, tmpdir)
rc0_2s = stan_sample(m0_2s; data)
if success(rc0_2s)
post0_2s = read_samples(m0_2s, :dataframe)
sums_stan_2 = Int.(mean(Array(post0_2s); dims=1))[1, :]
sums_julia_2 = [sum(x[i, :]) for i in 1:n1]
@test sums_stan_2 == sums_julia_2
end
stan0_3 = "
data {
int n1;
int<lower=1> n2;
int<lower=1> n3;
array[n1, n2, n3] real x;
}
generated quantities {
array[n1, n2] real mu;
for (i in 1:n1)
for (j in 1:n2)
mu[i, j] = x[i, j, 1] + x[i, j, 2] +x[i, j, 3] + x[i, j, 4];
}
";
x = Array(reshape(1:n1*n2*n3, n1, n2, n3))
data = Dict("x" => x, "n1" => n1, "n2" => n2, "n3" => n3)
tmpdir = joinpath(ProjDir, "tmp")
m0_3s = SampleModel("m0_3s", stan0_3, tmpdir)
rc0_3s = stan_sample(m0_3s; data)
if success(rc0_3s)
post0_3s = read_samples(m0_3s, :dataframe)
sums_stan_3 = Int.(mean(Array(post0_1s); dims=1))[1, :]
sums_julia_3 = [sum(x[i, j, :]) for j in 1:n2 for i in 1:n1]
@test sums_stan_3 == sums_julia_3
end
stan0_4 = "
data {
int n1;
int<lower=1> n2;
int<lower=1> n3;
int<lower=1> n4;
array[n1, n2, n3, n4] real x;
}
generated quantities {
array[n1, n2, n3] real mu;
for (i in 1:n1)
for (j in 1:n2)
for (k in 1:n3)
mu[i, j, k] = x[i,j,k,1] + x[i,j,k,2] + x[i,j,k,3] + x[i,j,k,4];
}
";
x = Array(reshape(1:n1*n2*n3*n4, n1, n2, n3, n4))
data = Dict("x" => x, "n1" => n1, "n2" => n2, "n3" => n3, "n4" => n4)
m0_4s = SampleModel("m0_4s", stan0_4, tmpdir)
rc0_4s = stan_sample(m0_4s; data)
if success(rc0_4s)
post0_4s = read_samples(m0_4s, :dataframe)
sums_stan_4 = Int.(mean(Array(post0_4s); dims=1))[1, :]
sums_julia_4 = [sum(x[i, j, k, :]) for k in 1:n3 for j in 1:n2 for i in 1:n1]
@test sums_stan_4 == sums_julia_4
end
which I'm adding to the StanSample test set.
I'll try your test script above but I think to Stan array[N, n1, n2] vector[n3] y;
is identical to array[N, n1, n2, n3] y;
I'm getting:
7×7 DataFrame
Row │ variable mean min median max nmissing eltype
│ Symbol Float64 Float64 Float64 Float64 Int64 DataType
─────┼───────────────────────────────────────────────────────────────────────────────
1 │ mu.1.1.1 1.00011 0.998632 1.00012 1.00163 0 Float64
2 │ mu.1.2.1 9.99936 9.99774 9.99937 10.0009 0 Float64
3 │ mu.1.1.2 1.9995 1.9977 1.99949 2.00175 0 Float64
4 │ mu.1.2.2 19.9993 19.9975 19.9993 20.0008 0 Float64
5 │ mu.1.1.3 3.00083 2.99928 3.00084 3.00253 0 Float64
6 │ mu.1.2.3 29.9998 29.9984 29.9998 30.0016 0 Float64
7 │ sigma 0.0100989 0.00968489 0.0100999 0.0105766 0 Float64
which I think is correct.
I can confirm that things seem to be working correctly with StanBase v.4.7.4. Thanks so much for your quick turnaround with this issue! Enjoy the mountains :).
Hi Andy, is it ok if I add your script as a test case? It's a great example! Thanks for you help, it has bothered me a long time that the JSON stuff was never put to the test.
Hi Rob. Absolutely feel free to add the example as a test case.
Hi Rob,
Thanks for putting work into getting stan and julia playing nice together and for the fix to the above issue. Sorry for raising this but the fix implemented does not seem to generalize to multidimensional arrays. I am using the master branch of StanSample:
StanSample v6.9.2 https://github.com/StanJulia/StanSample.jl#master
. See the modified example below:Returns the following error:
Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=x; position=0; dims declared=(50,2,27); dims found=(27,2,50)
Originally posted by @AndyPohlNZ in https://github.com/StanJulia/StanSample.jl/issues/51#issuecomment-1197394509