Closed WardBrian closed 9 months ago
Hi Brian,
I'm definitely open to that. From way back I have always felt DataFrames, NamedTuples and Tables are kind of "native" to StanSample. MCMCChains, KeyedArrays. MonteCarloMeasurement and (recently) InferenceObjects are examples of where I choose to go the route of Julia Extensions. The native ones I have always kept as lightweight as possible, maybe because there have always been rumblings about changing the output from Stan at some point in time.
I'm not sure I fully understand what you mean by cmdstan-specific dependence
but that might just be caused by the fact that I'm not a real programmer. A second (available time based) constraint from my side is that over the last 2 years I've focused mainly on causal inference and more recently on RAG/LLM stuff.
Are you thinking to combine both input (NamedTuples -> JSON) and output (CSV -> NamedTuples) in one package? Or maybe I need to dig into or see some examples of what you mean by "more useful objects". Certainly I have never been convinced of what is there now is great, e.g. converting var.1, var.2, etc. to a matrix or stuff like in NestedDataFrames.
Best, Rob
By avoiding CmdStan-specific dependencies I meant it would be nice if I could depend on whatever this new package is and not also pull in all of StanSamples' code dedicating to using CmdStan (which is what would happen if I depended on StanSample for extract
at present). StanSample.jl would of course continue to use CmdStan, just some of the code currently here would live in its own package
Are you thinking to combine both input (NamedTuples -> JSON) and output (CSV -> NamedTuples) in one package? Or maybe I need to dig into or see some examples of what you mean by "more useful objects". Certainly I have never been convinced of what is there now is great, e.g. converting var.1, var.2, etc. to a matrix or stuff like in NestedDataFrames.
Essentially, yes, that would be the idea. The main annoyance is the var.1,var.2
-> Julia array
stuff, though supporting tuples would also be nice at some point.
The dependencies argument makes a lot of sense and I would love to go that route!
And better support for extracting Julia arrays and maybe NamedTuples/tuples is almost a no-brainer.
I wrote most of the utils
code many years ago and I'm convinced having someone like you take a look at it when moving it to a new package would be a huge improvement. If the existing code is a good starting point for this effort, by all means, go ahead.
I certainly will take care of incorporating the new package in StanSample.jl (and in Stan.jl) and removing it from StanSample.jl and where appropriate from StanBase.jl.
Or was your real question if I were willing to separate this code into a separate package with extensions? I wouldn't mind, but I am a bit concerned when I can set aside a few days to do that (including the required updates to StanSample.jl and Stan.jl).
But I'm now convinced it is the right thing to do.
I think it would make the most sense if it lived under the StanJulia organization - I can try to contribute enhancements like tuples myself, though
Ok, I'll have a go at splitting it off in s new StanJulia package StanIO.jl.
Any feedback later on from your side is highly appreciated!
Hi Brian, can't believe this is already a week old!
I decided to first update the different I/O methods before splitting the methods off in a package StanIO.jl.
Quite a bit of time has gone into trying to do this kind of ok, to no avail. I'm currently merging a version of StanSample.jl with a modified namedtuples.jl in the utils directory. A test case is in test/test_tuples/test_tuples_02.jl.
The Stan Language program generates 3 tuples, bar, bar2 and bar3. The tuple bar3 is nested:
(value, (value, (value, value)))
To do this properly I need to return it as a tuple but I haven't been able to figure out how to do this.
Thus Stan produces these names:
"r"
"x.1.1"
"x.2.1"
"x.1.2"
"x.2.2"
"x.1.3"
"x.2.3"
"bar:1"
"bar:2"
"bar2:1"
"bar2:2:1"
"bar2:2:2"
"bar3:1"
"bar3:2:1"
"bar3:2:2:1"
"bar3:2:2:2"
in the .csv files.
I create a group_map:
Dict{Symbol, Array}(:bar => Any[(8, [1]), (9, [2])], :bar2 => Any[(10, [1]), (11, [2, 1]), (12, [2, 2])], :bar3 => Any[(13, [1]), (14, [2, 1]), (15, [2, 2, 1]), (16, [2, 2, 2])], :x => Any[(2, [1, 1]), (3, [2, 1]), (4, [1, 2]), (5, [2, 2]), (6, [1, 3]), (7, [2, 3])])
I got stuck in recreating a Julia tuple for e.g. bar2 and bar3.
Not sure if above info helps, but I wonder if you are aware if someone has solved this. Have you looked at it?
Neither am I sure if you sometimes work in/with Julia. I've mainly been experimenting in a Julia Pluto notebook.
I also need to go back to Seth to conform if InferenceObjects.jl can handle such a tuple.
Currently I simply return tuples as a flattened single row matrix.
Just for completeness. I just pushed a few new Pluto notebooks to StanExampleNotebooks/Testing on Github.
This notebook shows what I was able to do so far.
It kind of parses:
gm = Dict{Symbol, Array}(:bar => Any[(8, [1]), (9, [2])], :bar2 => Any[(10, [1]), (11, [2, 1]), (12, [2, 2]), (13, [3])], :bar3 => Any[(14, [1]), (15, [2, 1]), (16, [2, 2, 1]), (17, [2, 2, 2]), (18, [3])], :x => Any[(2, [1, 1]), (3, [2, 1]), (4, [1, 2]), (5, [2, 2]), (6, [1, 3]), (7, [2, 3])]);
into:
(14, (15, (16, 17)), 18
but uses eval(Meta.parse(str))
.
I solved it in Python by building up an auxiliary representation. With tuples in the most general case, it's hard to do this without recursion somewhere, especially if the elements of the tuples are things like matrices which need their own reshaping.
Even without tuple support this would still be useful to me, as handling something like array[2] complex_matrix[4,5]
in Stan is already some non-trivial wrangling
Hi Brian,
I've uploaded a very initial version of StanIO.jl. Not yet registered. In the test
directory are several generate...
scripts that can generate Stan .csv files if StanSample is added to the test package set in Project.toml. By the way, I noticed that your tuple example does not build on my system (Mac M2). It looks like stans
does its job, but the subsequent compilation fails. I need to take a closer look at generate_brian_tuples.jl
.
In this version I attempted to handle rectangular arrays of variables, including Complex variables, in Stan .csv files. I seems to work for your rectangular data (see the rectangles.jl notebook).
For now I've done this for 3 output_formats, :dataframes, :dataframe and :nesteddataframe. In the notebooks arrays.jl
and rectangles.jl
I show how to convert these to NamedTuples.
I'm starting to think a bit about the tuples stuff. Particularly mixing tuples and arrays in names.
Rob
Awesome, I appreciate you running with this idea!
For what I'm currently hoping to do, rectangular is good enough. It makes sense to have tuples support eventually, since Stan does, but it's also a lot of effort for a currently marginal feature
Hi Brian,
After experimenting a bit with ways to do this, I took a look at reshape.py and preferred your approach over what I had. And keeping a similar approach might also help with future maintenance.
I pushed a preliminary version of a Julia version of reshape.py (in reshape.jl) and it appears to work quite well. I clearly need to add more tests and documentation to StanIO.jl but wanted to checkin with you first if you would be ok with this solution.
Of course! Going off of reshape.py is what I would have done, but without a ton of Julia experience I wasn't sure if that would be a good idea or work as well. I'm glad you tried some alternatives first, since for me that means we can be much more confident in the choice, having explored a bit.
Thanks again!
Great, thanks a lot! I just merged a version 0.1.2 with slightly improved online docs and some tests to check the nested DataFrame and the created NamedTuple. I don't know Numpy or Panda very well, but I'm pretty happy with how DataFrames.jl displays tuples in the nested dataframes, both in the REPL and using Pluto.
StanIO.jl v0.1... differs somewhat from your approach in the (substantial) reliance on DataFrames and also that I convert Complex values earlier (when reading in .csv files). Both of these simplify the _extract_helper code (a bit at least).
Next on my list is to figure out why I can't compile/run all of your Stan Language constructs using tuples. I've seen issues both on my system and also occasionally on Github CI.
And finally I need to switch to StanIO.jl in StanSample.jl in the read_samples(model, :nesteddataframe)
method.
As always, thanks for your work on this!
Did you ever figure out what the issue was with some of my test programs?
If I simply try to run StanSample.jl with a shortened version of your tuple Stan Language model:
generated quantities {
real base = normal_rng(0, 1);
real basep1 = base + 1, basep2 = base + 2;
real basep3 = base + 3, basep4 = base + 4, basep5 = base + 5;
array[2,3] tuple(array[2] tuple(real, vector[2]), matrix[4,5]) ultimate =
{
{(
{(base, [base *2, base *3]'), (base *4, [base*5, base*6]')},
to_matrix(linspaced_vector(20, 7, 11), 4, 5) * base
),
(
{(basep1, [basep1 *2, basep1 *3]'), (basep1 *4, [basep1*5, basep1*6]')},
to_matrix(linspaced_vector(20, 7, 11), 4, 5) * basep1
),
(
{(basep2, [basep2 *2, basep2 *3]'), (basep2 *4, [basep2*5, basep2*6]')},
to_matrix(linspaced_vector(20, 7, 11), 4, 5) * basep2
)
},
{(
{(basep3, [basep3 *2, basep3 *3]'), (basep3 *4, [basep3*5, basep3*6]')},
to_matrix(linspaced_vector(20, 7, 11), 4, 5) * basep3
),
(
{(basep4, [basep4 *2, basep4 *3]'), (basep4 *4, [basep4*5, basep4*6]')},
to_matrix(linspaced_vector(20, 7, 11), 4, 5) * basep4
),
(
{(basep5, [basep5 *2, basep5 *3]'), (basep5 *4, [basep5*5, basep5*6]')},
to_matrix(linspaced_vector(20, 7, 11), 4, 5) * basep5
)
}};
}
It will fail during compilation of the generated C++ code after executing stanc
. On my machine it fails complaining that some variables will no longer exist when used/needed. If that helps, I can rerun your SL model end send you the reported errors. This is on MacOS using Clang++.
I have reformulated this part in e.g. generate_mixed_02.jl
to further work on handling mixed nested types (primarily what you handle in dtype()
and extract_header()
), but I'm not convinced I figured that out yet.
Yes, the reported errors (or just a full dump of the console) would be very helpful!
Her you go:
julia> include("/Users/rob/.julia/dev/StanIO/generate_test_data/generate_brian_data.jl");
[ Info: /Users/rob/.julia/dev/StanIO/data/brian_data/brian_data.stan updated.
ERROR: LoadError:
Error when compiling SampleModel brian_data:
In file included from /Users/rob/.julia/dev/StanIO/data/brian_data/brian_data.hpp:1:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/src/stan/model/model_header.hpp:4:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/stan/math.hpp:19:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/stan/math/rev.hpp:4:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/stan/math/prim/fun/Eigen.hpp:22:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/lib/eigen_3.4.0/Eigen/Dense:1:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/lib/eigen_3.4.0/Eigen/Core:50:
In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/complex:243:
In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/sstream:191:
In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/istream:165:
In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/ostream:170:
In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/bitset:131:
In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/string:560:
In file included from /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__memory_resource/polymorphic_allocator.h:20:
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/tuple:345:24: error: reference member '__value_' binds to a temporary object whose lifetime would be shorter than the lifetime of the constructed object
: __value_(_VSTD::forward<_Tp>(__t))
^~~~~~~~~~~~~~~~~~~~~~~~
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__config:715:17: note: expanded from macro '_VSTD'
# define _VSTD std
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/tuple:495:13: note: in instantiation of function template specialization 'std::__tuple_leaf<1, const Eigen::Matrix<double, -1, -1> &>::__tuple_leaf<Eigen::CwiseBinaryOp<Eigen::internal::scalar_product_op<double>, const Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_op<double>, const Eigen::Matrix<double, -1, -1>>, const Eigen::Matrix<double, -1, -1>>, void>' requested here
__tuple_leaf<_Uf, _Tf>(_VSTD::forward<_Up>(__u))...,
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/tuple:723:11: note: in instantiation of function template specialization 'std::__tuple_impl<std::__tuple_indices<0, 1>, std::vector<std::tuple<double, Eigen::Matrix<double, -1, 1>>>, const Eigen::Matrix<double, -1, -1> &>::__tuple_impl<0UL, 1UL, std::vector<std::tuple<double, Eigen::Matrix<double, -1, 1>>>, const Eigen::Matrix<double, -1, -1> &, std::vector<std::tuple<double, Eigen::Matrix<double, -1, 1>>>, Eigen::CwiseBinaryOp<Eigen::internal::scalar_product_op<double>, const Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_op<double>, const Eigen::Matrix<double, -1, -1>>, const Eigen::Matrix<double, -1, -1>>>' requested here
: __base_(typename __make_tuple_indices<sizeof...(_Up)>::type(),
^
/Users/rob/.julia/dev/StanIO/data/brian_data/brian_data.hpp:305:48: note: in instantiation of function template specialization 'std::tuple<std::vector<std::tuple<double, Eigen::Matrix<double, -1, 1>>>, const Eigen::Matrix<double, -1, -1> &>::tuple<std::vector<std::tuple<double, Eigen::Matrix<double, -1, 1>>>, Eigen::CwiseBinaryOp<Eigen::internal::scalar_product_op<double>, const Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_op<double>, const Eigen::Matrix<double, -1, -1>>, const Eigen::Matrix<double, -1, -1>>, 0>' requested here
std::tuple<
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/tuple:295:9: note: reference member declared here
_Hp __value_;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/tuple:346:10: error: static assertion failed due to requirement '__can_bind_reference<Eigen::CwiseBinaryOp<Eigen::internal::scalar_product_op<double, double>, const Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_op<double>, const Eigen::Matrix<double, -1, -1, 0, -1, -1>>, const Eigen::Matrix<double, -1, -1, 0, -1, -1>> &&>()': Attempted construction of reference element binds to a temporary whose lifetime has ended
{static_assert(__can_bind_reference<_Tp&&>(),
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /Users/rob/.julia/dev/StanIO/data/brian_data/brian_data.hpp:1:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/src/stan/model/model_header.hpp:4:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/stan/math.hpp:19:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/stan/math/rev.hpp:10:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/stan/math/rev/fun.hpp:200:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/stan/math/prim/functor.hpp:16:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/stan/math/prim/functor/integrate_ode_rk45.hpp:6:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/stan/math/prim/functor/ode_rk45.hpp:9:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/lib/boost_1.78.0/boost/numeric/odeint.hpp:76:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/lib/boost_1.78.0/boost/numeric/odeint/integrate/observer_collection.hpp:23:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/lib/boost_1.78.0/boost/function.hpp:30:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/lib/boost_1.78.0/boost/function/detail/prologue.hpp:17:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/lib/boost_1.78.0/boost/function/function_base.hpp:21:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/lib/boost_1.78.0/boost/type_index.hpp:29:
In file included from /Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/lib/boost_1.78.0/boost/type_index/stl_type_index.hpp:47:
/Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/lib/boost_1.78.0/boost/container_hash/hash.hpp:132:33: warning: 'unary_function<const std::error_category *, unsigned long>' is deprecated [-Wdeprecated-declarations]
struct hash_base : std::unary_function<T, std::size_t> {};
^
/Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/lib/boost_1.78.0/boost/container_hash/hash.hpp:692:18: note: in instantiation of template class 'boost::hash_detail::hash_base<const std::error_category *>' requested here
: public boost::hash_detail::hash_base<T*>
^
/Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/lib/boost_1.78.0/boost/container_hash/hash.hpp:420:24: note: in instantiation of template class 'boost::hash<const std::error_category *>' requested here
boost::hash<T> hasher;
^
/Users/rob/Projects/StanSupport/cmdstan/stan/lib/stan_math/lib/boost_1.78.0/boost/container_hash/hash.hpp:551:9: note: in instantiation of function template specialization 'boost::hash_combine<const std::error_category *>' requested here
hash_combine(seed, &v.category());
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__functional/unary_function.h:23:29: note: 'unary_function<const std::error_category *, unsigned long>' has been explicitly marked deprecated here
struct _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX11 unary_function
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__config:850:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX11'
# define _LIBCPP_DEPRECATED_IN_CXX11 _LIBCPP_DEPRECATED
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__config:835:49: note: expanded from macro '_LIBCPP_DEPRECATED'
# define _LIBCPP_DEPRECATED __attribute__((deprecated))
^
1 warning and 2 errors generated.
make: *** [/Users/rob/.julia/dev/StanIO/data/brian_data/brian_data.o] Error 1
Stacktrace:
[1] SampleModel(name::String, model::String, tmpdir::String)
@ StanSample ~/.julia/dev/StanSample/src/stanmodel/SampleModel.jl:125
[2] top-level scope
@ ~/.julia/dev/StanIO/generate_test_data/generate_brian_data.jl:55
[3] include(fname::String)
@ Base.MainInclude ./client.jl:489
[4] top-level scope
@ REPL[1]:1
in expression starting at /Users/rob/.julia/dev/StanIO/generate_test_data/generate_brian_data.jl:55
julia>
Thank you! Can you also share your xcode/clang versions?
It's Xcode Version 15.0.1 (15A507).
rob@Rob-M2 StanSample % clang++ --version Apple clang version 15.0.0 (clang-1500.0.40.1) Target: arm64-apple-darwin23.2.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
And cmdstan:
# stan_version_major = 2
# stan_version_minor = 33
# stan_version_patch = 0
# model = bernoulli_model
# start_datetime = 2023-11-03 13:45:43 UTC
# method = diagnose
# diagnose
# test = gradient (Default)
# gradient
# epsilon = 0.000001 (Default)
# error = 0.000001 (Default)
# id = 4
# data
# file = /Users/rob/Projects/StanSupport/cmdstan/bernoulli_data_4.json
# init = 2 (Default)
# random
# seed = 2627225882 (Default)
# output
# file = /Users/rob/Projects/StanSupport/cmdstan/bernoulli_chain_4.csv
# diagnostic_file = (Default)
# refresh = 100 (Default)
# sig_figs = -1 (Default)
# profile_file = profile.csv (Default)
# save_cmdstan_config = 0 (Default)
# num_threads = 1 (Default)
# stanc_version = stanc3 v2.33.0-rc1-74-g10b8ef0
# stancflags = --warn-pedantic
#
# Log probability=-9.36881
#
# param idx value model finite diff error
# 0 -1.93901 2.49093 2.49093 1.04896e-09
I don’t think cmdstan 2.33.1’s changes touched anything that would affect that model, but it’s worth a shot updating?
I will try to have someone volunteer me a ARM based Mac to do some testing on. That model compiles for me on my Linux box and on a Intel Mac in GitHub Actions, so it seems pretty specific to that platform
Do the .csv files report v 2.33.1 for you? Just updated and it still shows:
# stan_version_major = 2
# stan_version_minor = 33
# stan_version_patch = 0
# model = bernoulli_model
# start_datetime = 2023-12-07 22:31:54 UTC
and later on for stanc3:
output
# file = output.csv (Default)
# diagnostic_file = (Default)
# refresh = 100 (Default)
# sig_figs = -1 (Default)
# profile_file = profile.csv (Default)
# save_cmdstan_config = 0 (Default)
# num_threads = 6
# stanc_version = stanc3 v2.33.0-rc1-93-g992ff90
# stancflags = --warn-pedantic
Huh, the cmdstan stan_version_patch
is still 0, but my stanc version is different from yours:
# stanc_version = stanc3 v2.33.1
I was able to re-create the issue on a Mac so I'll look into it.
Hmm, I had noticed the rc1
in the stanc3
version. Not sure where I pick that version up. I simply use:
git clone https://github.com/stan-dev/cmdstan.git --recursive cmdstan
with the directory cmdstan
non-existent. Is to possible somehow you pick a local version up?
But it's great you were able to recreate the issue! An apple-silicon machine?
I also have a question about the code in extract_helper()
. If you have a variable like u.2.3:1.2:4.5
will it handle dummy_4.5
correct in python as the reshape takes place in extract_helper()
, not in _extract_helper()
.
But it's great you were able to recreate the issue! An apple-silicon machine?
I actually now have it re-created on Linux, it seems to be specific to something happening with clang++
I also have a question about the code in
extract_helper()
. If you have a variable likeu.2.3:1.2:4.5
will it handledummy_4.5
correct in python as the reshape takes place inextract_helper()
, not in_extract_helper()
.
Not sure I follow - could you elaborate?
It seems I'm settling on below version of extract_helper()
:
function extract_helper(v::Variable, df::DataFrame, offset=0; object=true)
out = _extract_helper(v, df)
if v.type == TUPLE
if v.type == TUPLE
atr = []
elts = [p -> p.dimensions == () ? (1,) : p.dimensions for p in v.contents]
for j in 1:length(out) # Typically 4000 in StanSample.jl
at = Tuple[]
for i in 1:length(elts):length(out[j])
append!(at, [(out[j][i], out[j][i+1],)])
end
append!(atr, [reshape(at, v.dimensions...)])
end
return atr
end
else
return out
end
end
and I was worried if this would ever be executed for 2nd level tuples (the part after the second ':' in above u. But it looks like it does as it seems to work in mixed_02_notebook.jl
for the Variable u.
It seems you are doing some bookkeeping with the argument top=[True | False]
that I don't seem to need. Makes me suspicious I'm missing something. I'll start adding a bunch of tests to check that.
The top=[True|False]
and object=[True|False]
arguments in my reshape.py are both to handle some specifics of Numpy, so I'm not surprised they're absent in Julia.
I think I found/fixed the issue in https://github.com/stan-dev/stanc3/pull/1382. In the mean time, did you try using the output csvs I had checked in to the stanio python repo? Those should tell you if you've caught the really nasty cases, even if you couldn't run the models with the current release
Not yet, on my list ... . It reads in something reasonable with the version I just merged (StanIO.jl v0.2.0).
It's pretty interesting to figure out how a tuple in the Stan Language with braces, square brackets and round brackets will look in Julia :-).
If you pull the latest nightly stanc3 that issue should be resolved
Thanks Brian, will have a go. As Apple/Xcode upgraded Clang on Dec 11 (to clang-1500.1.0.2.5), I wanted to check against that version first.
As expected clang-1500.1.0.2.5 doesn't solve the problem.
Is there a way to pull in the new stanc after cloning cmdstan?
If you're using cmdstan from git (e.g., not a release tarball), then nightly stancs should be downloaded if you make clean-all; make build
I must be using the wrong http address in git clone https://github.com/stan-dev/cmdstan --recursive cmdstan
, as even after make clean-all
, it shows:
bin/stanc --version
stanc3 v2.33.0-rc1-104-g6de5732 (Unix)
and also in the .csv files:
# stanc_version = stanc3 v2.33.0-rc1-104-g6de5732
# stancflags = --warn-pedantic
I've also tried 'make clean-librariesand
make stan-update`, they seem to run but show little output. The first command does clean sundial and TBB libraries.
104-g6de5732
looks like it is indeed the most recent. I'm not sure why the nightlies never picked up the 2.33.1 tag
Aaah, it actually works! Got stuck on the ...33.0-rc1-...
. Thanks for your help!
How feature-complete would you say StanIO.jl is at this point in time?
For reading Stan's .csv files I think it is completed. The tests cover all your tests. The same functionality is available in StanSample.jl when using :nesteddataframe in the call to read_samples().
No extra work has been done on the input side, but I don't think anything is missing there. It is up to date with the recent changes for array and vector etc. Over the years a lot of folks have pointed out small issues that were relatively simple to fix in the input side.
Using StanIO.jl in StanSample.jl is a bit trickier as the approach (workflow) taken in StanJulia is based around SampleModel, OptimizeModel, etc.
I'm currently redoing some of the ARM notebooks I did 10 years or so ago (with Julia at v0.3.0!) using more up to date usage of the Stan Language and significantly updated Julia packages like DataFrames with grouped dataframes and above mentioned :nesteddataframe format option. It hasn't shown any issues until now and is certainly an improvement (in my opinion at least). Slow process, Regression And Other Stories, Statistical Rethinking and the upcoming AARM will also need attention!
@WardBrian Hi Brian, is there something else you expected? Should we close this and continue in StanIO.jl if there is more work needed?
I haven’t had the chance to try it out myself unfortunately, but I agree any further discussion should live there. I’ll let you know if anything comes up.
Thanks again!
It would be useful for other Stan-based tools in the Julia ecosystem (e.g., bridgestan) to have a shared way of extracting/reshaping the flat parameter vectors provided by Stan into more useful objects.
Recently I started a project to do this for the Python tools: https://github.com/WardBrian/stanio. As part of this I also made the changes necessary to handle tuples as JSON inputs and CSV outputs.
I think the relevant code here is https://github.com/StanJulia/StanSample.jl/blob/master/src/utils/namedtuples.jl and the
ext/
folder. How would you feel about moving this code to a package without any cmdstan-specific dependence?