Closed XiangyunHuang closed 5 years ago
下载软件压缩包
curl -fLo ./cmdstan-2.19.tar.gz https://github.com/stan-dev/cmdstan/releases/download/v2.19.0/cmdstan-2.19.0.tar.gz
解压
tar -xzf cmdstan-2.19.0.tar.gz
cd cmdstan-2.19.0
tree -L 2 .
.
├── bin
│ ├── cmdstan
│ ├── diagnose
│ ├── print
│ ├── stanc
│ └── stansummary
├── examples
│ └── bernoulli
├── Jenkinsfile
├── LICENSE
├── make
│ ├── command
│ ├── program
│ ├── stanc
│ └── tests
├── makefile
├── README.md
├── runCmdStanTests.py
├── src
│ ├── cmdstan
│ ├── docs
│ └── test
├── stan
│ ├── Jenkinsfile
│ ├── lib
│ ├── LICENSE.md
│ ├── licenses
│ ├── make
│ ├── makefile
│ ├── README.md
│ ├── RELEASE-NOTES.txt
│ ├── runTests.py
│ └── src
└── test-all.sh
14 directories, 20 files
编译
cd cmdstan-2.19.0
make build
最后显示
--- CmdStan v2.19.0 built ---
表示编译成功,中间没有任何警告和错误
伯努利分布模型: Y 服从 0-1 分布,观测数据共10个,现在需要估计其参数 theta
├── bernoulli 编译后生成的可执行文件
├── bernoulli.d
├── bernoulli.data.json 数据文件 JSON 格式
├── bernoulli.data.R 数据文件 R 代码
├── bernoulli.hpp 编译后生成的头文件
└── bernoulli.stan 模型文件
cat examples/bernoulli/bernoulli.data.json
{
"N" : 10,
"y" : [0,1,0,0,0,0,0,0,0,1]
}
cat bernoulli.data.R
N <- 10
y <- c(0,1,0,0,0,0,0,0,0,1)
cat examples/bernoulli/bernoulli.stan
# 定义数据
data {
int<lower=0> N;
int<lower=0,upper=1> y[N];
}
# 定义参数
parameters {
real<lower=0,upper=1> theta;
}
# 定义模型结构
model {
theta ~ beta(1,1);
for (n in 1:N)
y[n] ~ bernoulli(theta);
}
bernoulli.d 文件内容
head -n 10 bernoulli.d
examples/bernoulli/bernoulli.o examples/bernoulli/bernoulli: \
examples/bernoulli/bernoulli.hpp examples/bernoulli/bernoulli.hpp \
stan/src/stan/model/model_header.hpp stan/lib/stan_math/stan/math.hpp \
stan/lib/stan_math/stan/math/rev/mat.hpp \
stan/lib/stan_math/stan/math/rev/core.hpp \
stan/lib/stan_math/stan/math/rev/core/autodiffstackstorage.hpp \
stan/lib/stan_math/stan/math/memory/stack_alloc.hpp \
stan/lib/stan_math/stan/math/prim/scal/meta/likely.hpp \
stan/lib/stan_math/stan/math/rev/core/build_vari_array.hpp \
stan/lib/stan_math/stan/math/prim/mat/fun/Eigen.hpp \
...
# bernoulli.stan 文件翻译成 C++ 文件,然后编译成可执行文件 bernoulli
make examples/bernoulli/bernoulli
# 采样结果 output.csv 存放在当前目录下
examples/bernoulli/bernoulli sample data file=examples/bernoulli/bernoulli.data.R
method = sample (Default)
sample
num_samples = 1000 (Default) # 采样 1000 个
num_warmup = 1000 (Default) # 预处理 1000 个
save_warmup = 0 (Default)
thin = 1 (Default) # 采样间隔 1
adapt
engaged = 1 (Default)
gamma = 0.050000000000000003 (Default)
delta = 0.80000000000000004 (Default)
kappa = 0.75 (Default)
t0 = 10 (Default)
init_buffer = 75 (Default)
term_buffer = 50 (Default)
window = 25 (Default)
algorithm = hmc (Default) # HMC 算法参数
hmc
engine = nuts (Default)
nuts
max_depth = 10 (Default)
metric = diag_e (Default)
metric_file = (Default)
stepsize = 1 (Default)
stepsize_jitter = 0 (Default)
id = 0 (Default)
data
file = examples/bernoulli/bernoulli.data.R
init = 2 (Default)
random
seed = 53280552 # 随机数种子,用来下次重复结果
output
file = output.csv (Default)
diagnostic_file = (Default)
refresh = 100 (Default)
Gradient evaluation took 1.2e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
Adjust your expectations accordingly!
Iteration: 1 / 2000 [ 0%] (Warmup)
Iteration: 100 / 2000 [ 5%] (Warmup)
Iteration: 200 / 2000 [ 10%] (Warmup)
Iteration: 300 / 2000 [ 15%] (Warmup)
Iteration: 400 / 2000 [ 20%] (Warmup)
Iteration: 500 / 2000 [ 25%] (Warmup)
Iteration: 600 / 2000 [ 30%] (Warmup)
Iteration: 700 / 2000 [ 35%] (Warmup)
Iteration: 800 / 2000 [ 40%] (Warmup)
Iteration: 900 / 2000 [ 45%] (Warmup)
Iteration: 1000 / 2000 [ 50%] (Warmup)
Iteration: 1001 / 2000 [ 50%] (Sampling)
Iteration: 1100 / 2000 [ 55%] (Sampling)
Iteration: 1200 / 2000 [ 60%] (Sampling)
Iteration: 1300 / 2000 [ 65%] (Sampling)
Iteration: 1400 / 2000 [ 70%] (Sampling)
Iteration: 1500 / 2000 [ 75%] (Sampling)
Iteration: 1600 / 2000 [ 80%] (Sampling)
Iteration: 1700 / 2000 [ 85%] (Sampling)
Iteration: 1800 / 2000 [ 90%] (Sampling)
Iteration: 1900 / 2000 [ 95%] (Sampling)
Iteration: 2000 / 2000 [100%] (Sampling)
Elapsed Time: 0.009446 seconds (Warm-up)
0.015629 seconds (Sampling)
0.025075 seconds (Total)
# output.csv 是采样过程中收集的原始数据
bin/stansummary output.csv
Inference for Stan model: bernoulli_model
1 chains: each with iter=(1000); warmup=(0); thin=(1); 1000 iterations saved.
Warmup took (0.0094) seconds, 0.0094 seconds total
Sampling took (0.016) seconds, 0.016 seconds total
Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat
lp__ -7.4 7.1e-02 9.5e-01 -9.2 -7.0 -6.8 1.8e+02 1.1e+04 1.0e+00
accept_stat__ 0.93 3.4e-03 1.1e-01 0.69 0.98 1.0 1.0e+03 6.6e+04 1.0e+00
stepsize__ 1.0 4.7e-15 3.3e-15 1.0 1.0 1.0 5.0e-01 3.2e+01 1.0e+00
treedepth__ 1.4 1.5e-02 4.9e-01 1.0 1.0 2.0 1.0e+03 6.5e+04 1.0e+00
n_leapfrog__ 2.5 4.9e-02 1.4e+00 1.0 3.0 7.0 8.0e+02 5.1e+04 1.0e+00
divergent__ 0.00 -nan 0.0e+00 0.00 0.00 0.00 -nan -nan -nan
energy__ 7.9 8.4e-02 1.2e+00 6.8 7.5 10 2.0e+02 1.3e+04 1.0e+00
theta 0.25 7.0e-03 1.3e-01 0.063 0.23 0.48 3.3e+02 2.1e+04 1.0e+00
Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).
官网简介
Stan is a state-of-the-art platform for statistical modeling and high-performance statistical computation. Thousands of users rely on Stan for statistical modeling, data analysis, and prediction in the social, biological, and physical sciences, engineering, and business.
Users specify log density functions in Stan’s probabilistic programming language and get:
写作动机: Stan 发布 2.19.0 继 支持 MPI 后终于开始 支持 GPU 关注其发展一年多了,终于到了该下笔介绍的时候了
Stan 的接口有很多,如 rstan、 pystan、 cmdstan 等共计9种,由于对 cmdstan 功能支持最为全面,因此文章会介绍这个接口的使用,而且学完这个, rstan 就比较容易了
作为入门篇,首先介绍单机 CPU 版 ,此时写作环境如下
分以下部分介绍