Open sreedta8 opened 3 years ago
Thank you for your interest in the project.
I think that you could directly use pycmdstan
to execute this command from Python (https://pycmdstan.readthedocs.io/en/latest/api.html#pycmdstan.model.stansummary_csv). Would it work for you?
Hi,
See #4 for accessing the pycmdstan original API. Then to extract the summary you can, e.g., use pandas. Looking at the shape of the pycmdstan summary, maybe something like:
import pandas as pd
import pycmdstan
cmdstan_model = pycmdstan.Model(code=str(constrained_coin.model))
run = cmdstan_model.sample(data=constrained_coin.data)
run_set = pycmdstan.model.RunSet(run)
pd.DataFrame({k:list(v[0]) for k,v in run_set.summary.items()}, index=run_set.summary['lp__'].dtype.fields).transpose()
Output:
Mean | MCSE | StdDev | 5% | 50% | 95% | N_Eff | N_Eff/s | R_hat | |
---|---|---|---|---|---|---|---|---|---|
lp__ | -7.26406 | 0.0398246 | 0.706619 | -8.8589 | -6.99614 | -6.75105 | 314.824 | 5247.06 | 1.00213 |
accept_stat__ | 0.920701 | 0.00386555 | 0.122791 | 0.669269 | 0.972133 | 1 | 1009.04 | 16817.4 | 1.00397 |
stepsize__ | 0.943827 | nan | 2.55479e-15 | 0.943827 | 0.943827 | 0.943827 | nan | nan | nan |
treedepth__ | 1.343 | 0.0178697 | 0.474949 | 1 | 1 | 2 | 706.417 | 11773.6 | 1.00135 |
n_leapfrog__ | 2.39 | 0.037044 | 0.99644 | 1 | 3 | 3 | 723.546 | 12059.1 | 1.00239 |
divergent__ | 0 | nan | 0 | 0 | 0 | 0 | nan | nan | nan |
energy__ | 7.74784 | 0.0576719 | 0.980427 | 6.79351 | 7.45342 | 9.80426 | 289.003 | 4816.72 | 1.0046 |
theta | 0.252494 | 0.00762163 | 0.120613 | 0.0840843 | 0.231581 | 0.476417 | 250.432 | 4173.86 | 1.01409 |
@mandel @gbdrt thank you both for your responses clarifying further and with examples. I'm coming from an applied statistical domain of modeling, segmentation, and quasi-experimentation with many years of experience. I'm an intermediary level programmer in Python, learning by example. I have been using rstan, pystan and pymc3 quite a bit and in one of those posts I read about yaps
which is really exciting as I do find Python to be a lot more approachable than C++ or Java.
Your examples are very helpful as I will look to develop some Bayesian modeling examples (Linear Mixed Effects, GLMs, GLMMs,and GAMs and share them back here with you so that others can use these as starters.
@gbdrt I have tried to run the model you shared and I get an error about lib.cmdstan_path("/path/to")
. Please see below (the last two lines)
Exception in thread Thread-6:
Traceback (most recent call last):
File "/home/sz/miniconda3/envs/yapyst/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/home/sz/miniconda3/envs/yapyst/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/sz/miniconda3/envs/yapyst/lib/python3.7/site-packages/pycmdstan/model.py", line 311, in _build_summary
self.niter, self._summary = stansummary_csv(run_csvs)
File "/home/sz/miniconda3/envs/yapyst/lib/python3.7/site-packages/pycmdstan/model.py", line 74, in stansummary_csv
exe = os.path.join(_find_cmdstan(), 'bin', 'stansummary')
File "/home/sz/miniconda3/envs/yapyst/lib/python3.7/site-packages/pycmdstan/model.py", line 26, in _find_cmdstan
'please provide CmdStan path, e.g. lib.cmdstan_path("/path/to/")')
pycmdstan.model.CmdStanNotFound: please provide CmdStan path, e.g. lib.cmdstan_path("/path/to/")
I have already defined a CMDSTAN variable set to cmdstan location CMDSTAN="~/cmdstan/" and can run the same model directly from cmdstan.
What is your OS? How do you launch the example?
Instead of doing simply CMDSTAN="~/cmdstan/"
, could you try to do
export CMDSTAN=$HOME/cmdstan
I'm working with Ubuntu 20.04LTS. I did it both ways.
At the terminal, I used export CMDSTAN=$HOME/cmdstan
and in the notebook I used CMDSTAN="~/cmdstan/"
Here is my entire code:
(yapyst) sz@sz-HP15-Nb:~$ export CMDSTAN=$HOME/cmdstan
(yapyst) sz@sz-HP15-Nb:~$ echo $CMDSTAN
/home/sz/cmdstan
import yaps
import pandas as pd
import pycmdstan
import numpy as np
from yaps.lib import int, real, uniform, bernoulli
from matplotlib import pyplot as plt
import logging
logging.getLogger('pycmdstan.model').setLevel(logging.CRITICAL)
@yaps.model
def coin(x: int(lower=0, upper=1)[10]):
theta: real(lower=0, upper=1) <~ uniform(0, 1)
for i in range(1,11):
x[i] <~ bernoulli(theta)
flips = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 1])
constrained_coin = coin(x=flips)
#constrained_coin.sample(data=constrained_coin.data)
cmdstan_model = pycmdstan.Model(code=str(constrained_coin.model))
run = cmdstan_model.sample(data=constrained_coin.data)
run_set = pycmdstan.model.RunSet(run)
Error Output is as follows:
INFO:filelock:Lock 139923306816656 acquired on /home/sreezach/.cache/pycmdstan/model-796d8029.lock
INFO:filelock:Lock 139923306816656 released on /home/sreezach/.cache/pycmdstan/model-796d8029.lock
Exception in thread Thread-4:
Traceback (most recent call last):
File "/home/sz/miniconda3/envs/yapyst/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/home/sz/miniconda3/envs/yapyst/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/sz/miniconda3/envs/yapyst/lib/python3.7/site-packages/pycmdstan/model.py", line 311, in _build_summary
self.niter, self._summary = stansummary_csv(run_csvs)
File "/home/sz/miniconda3/envs/yapyst/lib/python3.7/site-packages/pycmdstan/model.py", line 74, in stansummary_csv
exe = os.path.join(_find_cmdstan(), 'bin', 'stansummary')
File "/home/sz/miniconda3/envs/yapyst/lib/python3.7/site-packages/pycmdstan/model.py", line 26, in _find_cmdstan
'please provide CmdStan path, e.g. lib.cmdstan_path("/path/to/")')
pycmdstan.model.CmdStanNotFound: please provide CmdStan path, e.g. lib.cmdstan_path("/path/to/")
@sreedta8 Does the command $CMDSTAN/bin/stanc --version
works for you?
Maybe you could also try to add $CMDSTAN/bin
to your path:
export PATH=$CMDSTAN/bin:$PATH
@mandel @gbdrt
I solved the error with the following changes:
At the beginning (where the import statements are) I added
import os
then later I added the following statements before executing the model sampling
os.environ['CMDSTAN'] = "/home/sz/cmdstan"
path = os.environ.get('CMDSTAN', 'cmdstan')
flips = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 1])
constrained_coin = coin(x=flips)
#constrained_coin.sample(data=constrained_coin.data)
cmdstan_model = pycmdstan.Model(code=str(constrained_coin.model))
run = cmdstan_model.sample(data=constrained_coin.data)
run_set = pycmdstan.model.RunSet(run)
Output is as follows without errors:
INFO:filelock:Lock 140521467452752 acquired on /home/sreezach/.cache/pycmdstan/model-796d8029.lock
INFO:filelock:Lock 140521467452752 released on /home/sreezach/.cache/pycmdstan/model-796d8029.lock
Then I used pandas
to summarize (using code provided by @gbdrt):
pd.DataFrame({k:list(v[0]) for k,v in run_set.summary.items()}, index=run_set.summary['lp__'].dtype.fields).transpose()
with this tabular output
| Mean | MCSE | StdDev | 5% | 50% | 95% | N_Eff | N_Eff/s | R_hat
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
-7.346210 | 0.031585 | 0.795877 | -8.830190 | -7.061580 | -6.753910 | 634.943 | 28861.0 | 1.004300
0.883828 | 0.005207 | 0.171573 | 0.480840 | 0.960950 | 1.000000 | 1085.830 | 49356.1 | 0.999064
1.260740 | NaN | 0.000000 | 1.260740 | 1.260740 | 1.260740 | NaN | NaN | NaN
1.447000 | 0.016882 | 0.497432 | 1.000000 | 1.000000 | 2.000000 | 868.212 | 39464.2 | 0.999198
2.214000 | 0.034800 | 0.989537 | 1.000000 | 3.000000 | 3.000000 | 808.527 | 36751.2 | 0.999004
0.000000 | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN
7.839050 | 0.045891 | 1.036690 | 6.812870 | 7.571700 | 9.768000 | 510.325 | 23196.6 | 1.003260
0.259071 | 0.006663 | 0.129156 | 0.074004 | 0.243164 | 0.485656 | 375.747 | 17079.4 | 0.999056
Thank you @sreedta8 for finding a solution!
@mandel thank you for taking the time to help with your feedback.
@sreedta8 that is a bit strange.
Is it possible that you start the jupyter server before the export CMDSTAN=...
?
Jupyter should inherit local env variables (I cannot reproduce your error).
That being said using os.environ
in the notebook, is probably more robust.
@gbdrt you surmised correctly. I did do that. I got the code about using the os.environ from another Stan development called CmdStanPy (a python wrapper for running CmdStan https://github.com/stan-dev/cmdstanpy )
Thanks again for your help - Sree
Yes CmdStanPy is great! Unfortunately, this interface requires Stan files (you cannot pass the model as a string). But, as with pycmdstan, you can also use it with YAPS by manually generating the files.
from cmdstanpy import CmdStanModel
with open("coin.stan", "w") as model_file:
model_file.write(str(constrained_coin.model))
coin_model = CmdStanModel(stan_file="coin.stan")
coin_fit = coin_model.sample(data=constrained_coin.data)
print(coin_fit.summary())
Output:
name | Mean | MCSE | StdDev | 5% | 50% | 95% | N_Eff | N_Eff/s | R_hat |
---|---|---|---|---|---|---|---|---|---|
lp__ | -7.27994 | 0.0229256 | 0.781747 | -8.86246 | -6.9734 | -6.75011 | 1162.76 | 3125.7 | 1.00471 |
theta | 0.246195 | 0.00352571 | 0.117963 | 0.0750031 | 0.234836 | 0.464874 | 1119.43 | 3009.23 | 1.0033 |
If you find this useful, we could update the YAPS wrapper to use CmdStanPy. (One issue though, is that the installation process is not as easy as with pycmdstan).
Hi
I learned about yaps for the first time today and I could install
yaps
(with its dependencies) andcmdstan
and successfully run the example model of thecoin
toss. In addition to examining the parametertheta
, I want to execute thestansummary
command on theoutput.csv
created bycmdstan
which summarizes all the parameters. Please see the desired output I get fromcmdstan