Closed sethaxen closed 8 months ago
Hi Seth, thanks for your heads-up!
This is what I see when I run your program in Stan.jl. Will try a few more things to see if I can reproduce the issue. Below DataFrames have 4000 rows, it seems all 4 chains are ok.
Your example program:
using StanSample
stan_data = Dict("N" => 3, "nu" => 13, "L_Psi" => [1.0 0.0 0.0; 2.0 3.0 0.0; 4.0 5.0 6.0]);
model_code = "
data {
int<lower=1> N;
real<lower=N-1> nu;
cholesky_factor_cov[N] L_Psi;
}
parameters {
cholesky_factor_cov[N] L_X;
}
model {
L_X ~ inv_wishart_cholesky(nu, L_Psi);
}
";
sm = SampleModel("test", model_code);
stan_sample(sm; data=stan_data, num_samples=1_000);
df = read_samples(sm, :dataframe)
ndf = read_samples(sm, :nesteddataframe)
Running the program:
julia> include("/Users/rob/.julia/dev/Stan/test/Examples-Test-Cases/Cholesky_factor_cov/test_cholesky_factor_cov.jl");
[ Info: /var/folders/pf/2m__rnm54153mj3198b5xxn00000gn/T/jl_1tEr5R/test.stan updated.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: inv_wishart_cholesky_lpdf: Cholesky Random variable[3] is 0, but must be positive! (in '/var/folders/pf/2m__rnm54153mj3198b5xxn00000gn/T/jl_1tEr5R/test.stan', line 10, column 4 to column 42)Exception: inv_wishart_cholesky_lpdf: Cholesky Random variable[2] is 0, but must be positive! (in '/var/folders/pf/2m__rnm54153mj3198b5xxn00000gn/T/jl_1tEr5R/test.stan', line 10, column 4 to column 42)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Exception: inv_wishart_cholesky_lpdf: Cholesky Random variable[1] is 0, but must be positive! (in '/var/folders/pf/2m__rnm54153mj3198b5xxn00000gn/T/jl_1tEr5R/test.stan', line 10, column 4 to column 42)but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Output (abbreviated):
julia> df
4000×9 DataFrame
Row │ L_X.1.1 L_X.2.1 L_X.3.1 L_X.1.2 L_X.2.2 L_X.3.2 L_X.1.3 L_X.2.3 L_X.3.3
│ Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64
──────┼───────────────────────────────────────────────────────────────────────────────────────────
1 │ 0.244655 0.588676 0.240974 0.0 0.717211 1.65515 0.0 0.0 3.22123
2 │ 0.24772 0.197201 0.528767 0.0 0.787673 1.59326 0.0 0.0 2.91718
3 │ 0.37692 0.982172 1.86149 0.0 0.82456 1.69881 0.0 0.0 1.06862
4 │ 0.256319 0.354241 0.719426 0.0 0.786066 0.838907 0.0 0.0 2.3652
5 │ 0.295837 0.686061 1.04639 0.0 0.736212 1.35634 0.0 0.0 2.49514
6 │ 0.269881 0.525568 0.836186 0.0 0.772283 1.63265 0.0 0.0 1.79331
7 │ 0.257858 0.403668 0.858547 0.0 0.83223 1.05567 0.0 0.0 1.74217
8 │ 0.388772 0.809176 1.71386 0.0 0.994925 2.10525 0.0 0.0 1.54005
9 │ 0.300623 0.359805 0.156597 0.0 0.706201 1.0627 0.0 0.0 1.39504
10 │ 0.336903 0.165934 -0.0745253 0.0 0.766276 1.42984 0.0 0.0 1.48966
and:
julia> ndf
4000×1 DataFrame
Row │ L_X
│ Array…
──────┼───────────────────────────────────
1 │ [0.244655 0.0 0.0; 0.588676 0.71…
2 │ [0.24772 0.0 0.0; 0.197201 0.787…
3 │ [0.37692 0.0 0.0; 0.982172 0.824…
4 │ [0.256319 0.0 0.0; 0.354241 0.78…
5 │ [0.295837 0.0 0.0; 0.686061 0.73…
6 │ [0.269881 0.0 0.0; 0.525568 0.77…
7 │ [0.257858 0.0 0.0; 0.403668 0.83…
8 │ [0.388772 0.0 0.0; 0.809176 0.99…
9 │ [0.300623 0.0 0.0; 0.359805 0.70…
10 │ [0.336903 0.0 0.0; 0.165934 0.76…
My setup on MacOS:
julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 8 × Apple M2
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 11 on 4 virtual cores
Environment:
JULIA_NUM_THREADS = 8
JULIA_EDITOR = subl
JULIA_SPECIALFUNCTIONS_BUILD_SOURCE = true
JULIA_MPI_PATH = /usr/local/Cellar/open-mpi/4.0.2
JULIA_PKG3_PRECOMPILE = true
JULIA_PKG_SERVER = pkg.julialang.org
JULIA_SR_HOME = /Users/rob/.julia/dev/StatisticalRethinking/src
JULIA_ROS_HOME = /Users/rob/Projects/R/ROS-Examples
JULIA_MIXTAPE_HOME = /Users/rob/Projects/R/mixtape
JULIA_WOOLDRIDGE_HOME = /Users/rob/Projects/R/wooldridge
and:
(Stan) pkg> st
Project Stan v10.5.0
Status `~/.julia/dev/Stan/Project.toml`
[94b1ba4f] AxisKeys v0.2.13
⌃ [336ed68f] CSV v0.10.11
[5224ae11] CompatHelperLocal v0.1.25
[a93c6f00] DataFrames v1.6.1
⌃ [864edb3b] DataStructures v0.18.15
[0703355e] DimensionalData v0.25.8
⌃ [31c24e10] Distributions v0.25.104
[ffbed154] DocStringExtensions v0.9.3
[b5cf5a8d] InferenceObjects v0.3.15
[682c06a0] JSON v0.21.4
[0f8b85d8] JSON3 v1.14.0
[c7f686f2] MCMCChains v6.0.4
[0987c9cc] MonteCarloMeasurements v1.1.6
[86f7a689] NamedArrays v0.10.0
[d9ec5142] NamedTupleTools v0.14.3
[bac558e1] OrderedCollections v1.6.3
[1c4bc282] PosteriorDB v0.5.0
[d0ee94f6] StanBase v4.9.0 `~/.julia/dev/StanBase`
[fb13fc95] StanDiagnose v4.5.0 `~/.julia/dev/StanDiagnose`
[fbd8da12] StanOptimize v4.4.0 `~/.julia/dev/StanOptimize`
[e4723793] StanQuap v4.3.0 `~/.julia/dev/StanQuap`
[c1514b29] StanSample v7.6.0 `~/.julia/dev/StanSample`
[6ede68ce] StanVariational v4.4.0 `~/.julia/dev/StanVariational`
[2913bbd2] StatsBase v0.34.2
[4c63d2b9] StatsFuns v1.3.0
[bd369af6] Tables v1.11.1
[9a3f8284] Random
[10745b16] Statistics v1.10.0
[8dfed614] Test
Info Packages marked with ⌃ have new versions available and may be upgradable.
@goedman, yes, I see the same with your example. But if I add a 2nd stan_sample(sm; data=stan_data, num_samples=1_000);
call at the end, that's when I get the error. Trying a third time also succeeds. Then the 4th fails, and so on.
I forgot below quick check:
julia> ndf.L_X[1]
3×3 Matrix{Float64}:
0.339434 0.0 0.0
0.650364 1.20946 0.0
1.94848 2.07423 1.61524
But that looks ok.
Aah, let me try that!
Yip, same problem here! Will have a look, seems to trigger an old memory about this issue ...
@sethaxen Hmmm, very weird. Haven't made much progress except that below version seems to work:
using StanSample
stan_data = Dict("N" => 3, "nu" => 13, "L_Psi" => [1.0 0.0 0.0; 2.0 3.0 0.0; 4.0 5.0 6.0]);
model_code = "
data {
int<lower=1> N;
real<lower=N-1> nu;
cholesky_factor_cov[N] L_Psi;
}
parameters {
cholesky_factor_cov[N] L_X;
}
model {
L_X ~ inv_wishart_cholesky(nu, L_Psi);
}
";
tmpdir = joinpath(pwd(), "test", "test_cholesky_factor_cov", "tmp")
sm = SampleModel("test", model_code, tmpdir);
rc = stan_sample(sm; data=stan_data);
if success(rc)
df = read_samples(sm, :dataframe)
ndf = read_samples(sm, :nesteddataframe)
display(ndf.L_X[1])
println()
end
for j in 1:5
run(pipeline(StanSample.par(sm.cmds), stdout=sm.log_file[1]));
ndf = read_samples(sm, :nesteddataframe)
display(ndf.L_X[1])
println()
end
Which is 90% identical to what happens in stan_sample(). Will dig further.
If you need this, above construct might be a temporary work around.
Hi Seth ( @sethaxen ),
Turns out a better work around for now is to use:
stan_data = (N = 3, nu = 13, L_Psi = [1.0 0.0 0.0; 2.0 3.0 0.0; 4.0 5.0 6.0]);
instead of a Dict. Will figure out what's wrong in handling Dicts!
@sethaxen
Found the problem. StanSample.jl v7.7.0 (just merged) should fix this.
@goedman thanks for the fix! What was it? The only change I see in v7.7.0 is that Distributed is no longer loaded.
@sethaxen Hmm, a bit embarrassing. It took me quite a while to figure out what caused this bug. At some point I even started to doubt how StanSample uses multiple cores and for that reason I briefly switched to using Distributed.jl, but it's not really needed in StanSample.jl, so I removed it again.
The real issue was introduced when I switched to JSON input files. For cmdstan input files I need to permute dimensions of arrays and used a very poor construct:
function convert_matrices(d::Union{Dict, NamedTuple})
dct = typeof(d) == NamedTuple ? convert(Dict, d) : d
for key in keys(dct)
if typeof(dct[key]) <: Array
dct[key] = permutedims(dct[key], length(size(dct[key])):-1:1)
end
end
dct
end
Of course, formulated without an explicit deepcopy(d) (if d is a Dict), this will cause the behavior you noticed.
I should have paid attention to that old memory I mentioned above!
Thanks again for your help! On a side note, I found it very interesting to see PosterorDB show up in the testing of Stan's new Pathfinder method! StanPathfinder.jl should be merged later this week (I hope, just waiting the 3 days as it is a new package). By the way, StanIO.jl is also done. In StanSample.jl that approach is used in output_format :nesteddataframe
.
Thanks to Brian Ward!
Ah, okay, makes sense! Thanks again for the fix!
Thanks again for your help! On a side note, I found it very interesting to see PosterorDB show up in the testing of Stan's new Pathfinder method! StanPathfinder.jl should be merged later this week (I hope, just waiting the 3 days as it is a new package).
Ah, nice, thanks for adding this! There's also Pathfinder.jl, which supports Stan models via StanLogDensityProblems.jl. Having a StanPathfinder will allow benchmark comparisons between the two (for another time).
By the way, StanIO.jl is also done. In StanSample.jl that approach is used in output_format
:nesteddataframe
.
Thanks, I'll take a look when I next get back to thinking about conversions to InferenceData
.
Hi Seth, thanks for the link to Pathfinder.jl. Just FYI, I did add a StanPathfinder.jl a few weeks ago.
Here
L_Psi
is a valid lower Cholesky factor. If we annotate it as acholesky_factor_cov
and sample, everything works fine the 1st time and every odd time but errors every even time:The error seems to indicate that the upper triangle is being filled with the entries in the lower triangle. I would normally thing this was a cmdstan issue, but it runs fine in cmdstanr.
Environment