Open alashworth opened 5 years ago
Comment by betanalpha Wednesday Dec 10, 2014 at 18:04 GMT
Given the way the models are abstracted from the sampling code, I can't think of any way to get at this information when log_prob is called.
Comment by bob-carpenter Wednesday Dec 10, 2014 at 18:07 GMT
It would need to be a separate argument passed through.
I'm torn, because on the one hand, I agree that it'd be useful to have on the model, but on the other hand, I agree it breaks the standalone log density abstraction.
Given that it would also complicate the code, which is always a big negative, I haven't brought this up before (though I've thought about it many times for exactly the same reason that Sean O'Riordain brought up).
Comment by bob-carpenter Saturday Feb 07, 2015 at 17:30 GMT
If we did do this, we'd probably want to do it via functions:
int meta_iteration_num();
int meta_chain_id();
int meta_is_warming_up();
int meta_is_sampling();
I'd want to print a warning with any use of such a meta function at the very least. And probably call it something other than "meta", which tends to get overused.
Comment by bob-carpenter Monday Feb 09, 2015 at 22:08 GMT
Joshua N Pritikin points out on stan-dev:
Maybe an mcmc_ prefix would be more descriptive, since Stan also does BFGS-esque optimization.
Comment by bob-carpenter Tuesday Aug 04, 2015 at 05:36 GMT
If iteration number is really just for print control every N-th iteration, we could control that from the outside by passing in a NULL ostream.
And maybe we should always print the chain ID before any printed output?
Comment by peleschramm Saturday May 20, 2017 at 01:35 GMT
I would also like access to the iteration number, but for hacking purposes (some simulated annealing may be possible this way, for example).
For just printing values, the solution I use is to have Stan periodically update the output CSV file (including warmup), and to run a script that periodically reads the CSV and plots the samples I'm interested in. That way you can view the entire trajectory as Stan samples. If using MatlabStan, the function "mstan.read_stan_csv" is helpful for this.
Comment by bob-carpenter Saturday May 20, 2017 at 19:05 GMT
I just looked back through this issue. We've gone back and forth on whether we should allow this.
@betanalpha and I don't like it because it breaks the nice abstraction of an instantiated model as an immutable log density function. Making things immutable is powerful for reasoning about program behavior and writing correct code.
We could break this abstraction by either (1) storing iteration and chain number in special mutable variables, or (2) treating them like data and allowing data to be non-constant (mutable) in general. I'm inclined toward the latter because we may want to do this for data streaming or parallel algorithms like stochastic gradient descent for optimization (penalized MLE) or variational inference (VB) or data-parallel expectation propagation (EP) or gradient-based marginal optimization (penalized MML).
Comment by syclik Saturday May 20, 2017 at 23:26 GMT
Add one more vote for no iteration number within the language.
If you're really inclined, there are ways to use that information from an algorithm written in C++. You might also be able to wrap the generated model class with another class that has a mutable iteration number.
If you're able to demonstrate a real need for that in the language, with a model and an algorithm that utilize it, I'd reconsider. But the burden of proof is on a prototype and a real use case before we break immutability.
Comment by bob-carpenter Monday May 22, 2017 at 16:08 GMT
On May 20, 2017, at 7:26 PM, Daniel Lee notifications@github.com wrote:
Add one more vote for no iteration number within the language.
If you're really inclined, there are ways to use that information from an algorithm written in C++.
You can manipulate the entire log density at this point, but can't get inside it.
You might also be able to wrap the generated model class with another class that has a mutable iteration number.
Same problem---won't be able to access it in the language unless you hack a function into Stan which returns a static that the algorithm sets.
If you're able to demonstrate a real need for that in the language with a model and an algorithm that utilizes it, I'd reconsider. But the burden of proof is on a prototype and a real use case before we break immutability.
The first suggested usage in this issue is printing:
if (iteration() % 100 == 0) print(...);
The second suggestion was to do some kind of annealing; I could see that wanting to do different things to the likelihood and the prior in a Bayesian setting.
The problem with requesting a prototype is that this is a user-level request, but building a prototype is a developer-level problem. There's no way to build a prototype with the user-facing tools we have now.
Comment by syclik Monday May 22, 2017 at 16:22 GMT
Good points, especially that building a prototype is a developer-level problem.
If it's just about printing, this seems like a lot of effort. Probably worth it in the long run to figure out, but it's still a lot of effort. The second use case was simulated annealing, controlling a temperature parameter based on the iteration? Maybe a little more thought into how that would be written out in the language, and what sorts of things are propagated as data / parameters or double / vars, would help.
Comment by betanalpha Monday May 22, 2017 at 16:30 GMT
I am unconvinced on printing (what does striped printing buy you? You can always manipulate the output directly in any of the environments).
But I am doubly unconvinced on the algorithm side. The issue is that all of these ideas are trying to break the abstraction of the Stan Modeling Language as specifying a function proportional to the posterior density and nothing more. Because there is no fundamental prior/likelihood separation, for example, any algorithm that relies on that separation would be an awkward hack.
Stan was not built for algorithm development of this sort. If we wanted to support it, then we would have to allow users to specify the prior and likelihood separately and then expose various algorithmic manipulations like optimization and Markov transitions. I'm not saying that we shouldn't do that (okay, I would, but that's an orthogonal conversation), just that breaking our current abstraction to make an infinitesimal and fragile step towards that is a bad idea.
Issue by bob-carpenter Wednesday Dec 10, 2014 at 18:01 GMT Originally opened as https://github.com/stan-dev/stan/issues/1166
Sean O'Riordain suggested on stan-users that it would be useful to be able to access the iteration number in a Stan program so that you could write something like this to print theta every 100th iteration:
if (mod(iteration.count, 100) == 0) print(theta);
It would also be nice to get the chain_id in case things are running in parallel.
This brings up the issue of what the values should be when, for instance, we're running diagnostics, just evaluating the log probability function, running optimization, etc. Maybe just -1 values everywhere?
There is also the issue of which blocks this should work in. Should prints in the transformed data block get iteration=0 as the value? The chain_id would still work.