Closed OmidGhasemi21 closed 3 years ago
Nice! Thanks for sharing your code. If you ever adapt your method for the work underlying Figure 7.6, I’d love to see it.
Hi Solomon,
Thanks for your feedback.
In the book, you asked to share our codes for reproducing Kruschke's ESS and MCSE. I edited my previous post and added the following lines which reproduce those measures:
# Calculate effectiveSize and MCSE: in the book ESS= 605.8 and MCSE = .00254
require(coda)
all_chains %>%
filter(iter %in% (500:10000)) %>%
summarise(eff_size = effectiveSize(accepted_traj),
mcse= sd(accepted_traj) / sqrt(eff_size))
Moreover, I tried to reproduce Fig. 7.6. of the book.
Two important notes:
Kruschke has used a bivariate normal proposal distribution. I used a method similar to this book: https://rpruim.github.io/Kruschke-Notes/markov-chain-monte-carlo-mcmc.html#two-coins
This example is really beyond my knowledge of R and Bayesian stats, so I just copy my codes here to get some feedback. It is possible that I misunderstood the 2D Metropolis algorithm. I tried to check the effective size and the proportion on accepted steps and they were similar to Kruschke's results, however, the figure looks a little bit odd.
here are my codes with proposal SD = .2:
# 2 parameters Bayes' Rule with Metropolis algorithm
# Our theta sequences and data
theta_sequence <- seq(from = 0, to = 1, by = .01)
theta_1_data <- rep(0:1, times = c(8 - 6, 6))
theta_2_data <- rep(0:1, times = c(7 - 2, 2))
likelihood <- function(theta1, data1, theta2, data2) {
z1 <- sum(data1)
n1 <- length(data1)
z2 <- sum(data2)
n2 <- length(data2)
p_data_given_theta <- (theta1^z1 * (1 - theta1)^(n1 - z1)) * (theta2^z2 * (1 - theta2)^(n2 - z2))
p_data_given_theta[theta1 > 1 | theta1 < 0] <- 0
p_data_given_theta[theta2 > 1 | theta2 < 0] <- 0
return(p_data_given_theta)
}
# define the prior density function.
prior_d <- function(theta1, theta2) {
p_theta <- dbeta(theta1, 1, 1) * dbeta(theta2, 1, 1)
p_theta[theta1 > 1 | theta1 < 0] = 0
p_theta[theta2 > 1 | theta2 < 0] = 0
return(p_theta)
}
# Define the relative probability of the target distribution. For our application,
# this target distribution is the unnormalized posterior distribution.
target_rel_prob <- function(theta1, data1, theta2, data2) {
target_rel_prob <- (likelihood(theta1, data1, theta2, data2) * prior_d(theta1, theta2))
return(target_rel_prob)
}
# specify the length of the trajectory, i.e., the number of jumps to try:
traj_length <- 50000
# initialize the vector that will store the results
trajectory1 <- rep(0, traj_length)
trajectory2 <- rep(0, traj_length)
# specify where to start the trajectory:
trajectory1[1] <- 0.5 # another arbitrary value
trajectory2[1] <- 0.5 # another arbitrary value
# specify the burn-in period
burn_in <- ceiling(0.0 * traj_length) # arbitrary number, less than `traj_length`
# initialize accepted, rejected counters, just to monitor performance:
n_accepted <- 0
n_rejected <- 0
proposal_sd <- .2
for (t in 1:(traj_length - 1)) {
current_position1 <- trajectory1[t]
current_position2 <- trajectory2[t]
# use the proposal distribution to generate a proposed jump
proposed_jump1 <- rnorm(1, mean = 0, sd = proposal_sd)
proposed_jump2 <- rnorm(1, mean = 0, sd = proposal_sd)
# compute the probability of accepting the proposed jump
prob_accept <- min(1,
target_rel_prob(current_position1 + proposed_jump1, theta_1_data,
current_position2 + proposed_jump2, theta_2_data)
/ target_rel_prob(current_position1, theta_1_data, current_position2, theta_2_data))
# generate a random uniform value from the interval [0, 1] to
# decide whether or not to accept the proposed jump
if (runif(1) < prob_accept) {
# accept the proposed jump
trajectory1[t + 1] <- current_position1 + proposed_jump1
trajectory2[t + 1] <- current_position2 + proposed_jump2
# increment the accepted counter, just to monitor performance
if (t > burn_in) {n_accepted <- n_accepted + 1}
} else {
# reject the proposed jump, stay at current position
trajectory1[t + 1] <- current_position1
trajectory2[t + 1] <- current_position2
# increment the rejected counter, just to monitor performance
if (t > burn_in) {n_rejected <- n_rejected + 1}
}
}
# extract the post-burn_in portion of the trajectory
accepted_traj1 <- trajectory1[(burn_in + 1) : length(trajectory1)]
accepted_traj2 <- trajectory2[(burn_in + 1) : length(trajectory2)]
metrop_2d_data <- tibble(accepted_traj1 = accepted_traj1,
accepted_traj2 = accepted_traj2,
n_accepted = n_accepted,
n_rejected = n_rejected,
iter = rep(1:traj_length))
# calculate the proportion of accepted proposal compared to all proposals
metrop_2d_data %>%
summarise(acc_diveded_total_pro= n_accepted/length(accepted_traj))
# Calculate effectiveSize
require(coda)
metrop_2d_data %>%
summarise(eff_size1 = effectiveSize(accepted_traj1),
eff_size2 = effectiveSize(accepted_traj2))
metrop_2d_traceplot <- metrop_2d_data %>%
filter(iter < 1000) %>%
ggplot(aes(x = accepted_traj1, y = accepted_traj2)) +
geom_path(size = 1/4, color = "skyblue") +
geom_point(size = 1, alpha = 1/2, color = "skyblue", shape = 1) +
coord_cartesian(xlim = c(0,1), ylim = c(0,1)) +
labs(title = "", x= expression(theta[1]), y= expression(theta[2])) +
theme_bw()
Hey @OmidGhasemi21, I'm finally getting back to this. Similar to how you mention, above, this is stretching the limits of my programing skills. But based on my understanding of the task, you work flow gets the job done and I'm going to add this, with attribution to you, to the next revision of the book in a slightly revised form.
My first step will be to define a few of your custom functions:
# 2 parameters Bayes' Rule with Metropolis algorithm
# Our theta sequences and data
theta_sequence <- seq(from = 0, to = 1, by = .01)
theta_1_data <- rep(0:1, times = c(8 - 6, 6))
theta_2_data <- rep(0:1, times = c(7 - 2, 2))
# define the bivariate Bernoulli likelihood
bivariate_bernoulli_likelihood <- function(theta1, data1, theta2, data2) {
z1 <- sum(data1)
n1 <- length(data1)
z2 <- sum(data2)
n2 <- length(data2)
p_data_given_theta <- (theta1^z1 * (1 - theta1)^(n1 - z1)) * (theta2^z2 * (1 - theta2)^(n2 - z2))
p_data_given_theta[theta1 > 1 | theta1 < 0] <- 0
p_data_given_theta[theta2 > 1 | theta2 < 0] <- 0
return(p_data_given_theta)
}
# define the prior density function
prior_d <- function(theta1, theta2) {
p_theta <- dbeta(theta1, 1, 1) * dbeta(theta2, 1, 1)
p_theta[theta1 > 1 | theta1 < 0] = 0
p_theta[theta2 > 1 | theta2 < 0] = 0
return(p_theta)
}
# Define the relative probability of the target distribution. For our application,
# this target distribution is the unnormalized posterior distribution.
target_rel_prob <- function(theta1, data1, theta2, data2) {
target_rel_prob <- (bivariate_bernoulli_likelihood(theta1, data1, theta2, data2) * prior_d(theta1, theta2))
return(target_rel_prob)
}
Second, I'll wrap the bulk of the remaining code into a single function called my_bivariate_metropolis()
.
my_bivariate_metropolis <- function(proposal_sd = .02,
# specify the length of the trajectory (i.e., the number of jumps to try)
traj_length = 50000) {
# initialize the vector that will store the results
trajectory1 <- rep(0, traj_length)
trajectory2 <- rep(0, traj_length)
# specify where to start the trajectory:
trajectory1[1] <- 0.5 # another arbitrary value
trajectory2[1] <- 0.5 # another arbitrary value
# specify the burn-in period
burn_in <- ceiling(0.0 * traj_length) # arbitrary number, less than `traj_length`
# initialize accepted, rejected counters, just to monitor performance:
n_accepted <- 0
n_rejected <- 0
for (t in 1:(traj_length - 1)) {
current_position1 <- trajectory1[t]
current_position2 <- trajectory2[t]
# use the proposal distribution to generate a proposed jump
proposed_jump1 <- rnorm(1, mean = 0, sd = proposal_sd)
proposed_jump2 <- rnorm(1, mean = 0, sd = proposal_sd)
# compute the probability of accepting the proposed jump
prob_accept <- min(1,
target_rel_prob(current_position1 + proposed_jump1, theta_1_data,
current_position2 + proposed_jump2, theta_2_data)
/ target_rel_prob(current_position1, theta_1_data, current_position2, theta_2_data))
# generate a random uniform value from the interval [0, 1] to
# decide whether or not to accept the proposed jump
if (runif(1) < prob_accept) {
# accept the proposed jump
trajectory1[t + 1] <- current_position1 + proposed_jump1
trajectory2[t + 1] <- current_position2 + proposed_jump2
# increment the accepted counter, just to monitor performance
if (t > burn_in) {n_accepted <- n_accepted + 1}
} else {
# reject the proposed jump, stay at current position
trajectory1[t + 1] <- current_position1
trajectory2[t + 1] <- current_position2
# increment the rejected counter, just to monitor performance
if (t > burn_in) {n_rejected <- n_rejected + 1}
}
}
# extract the post-burn_in portion of the trajectory
accepted_traj1 <- trajectory1[(burn_in + 1) : length(trajectory1)]
accepted_traj2 <- trajectory2[(burn_in + 1) : length(trajectory2)]
# collect the results
metrop_2d_data <-
tibble(iter = rep(1:traj_length),
accepted_traj1 = accepted_traj1,
accepted_traj2 = accepted_traj2,
n_accepted = n_accepted,
n_rejected = n_rejected)
return(metrop_2d_data)
}
Third, I'll run the simulation within a nested tibble.
mh <-
tibble(proposal_sd = c(0.02, 0.2)) %>%
mutate(mh = map(proposal_sd, my_bivariate_metropolis)) %>%
unnest(mh)
Fourth, I'll plot like this:
mh %>%
filter(iter < 1000) %>%
ggplot(aes(x = accepted_traj1, y = accepted_traj2)) +
geom_path(size = 1/10, alpha = 1/2, color = "steelblue") +
geom_point(alpha = 1/4, color = "steelblue") +
scale_x_continuous(expression(theta[1]), breaks = 0:5 / 5, expand = c(0, 0), limits = c(0, 1)) +
scale_y_continuous(expression(theta[2]), breaks = 0:5 / 5, expand = c(0, 0), limits = c(0, 1)) +
coord_equal() +
theme_cowplot() +
panel_border() +
theme(panel.spacing = unit(0.75, "cm")) +
facet_wrap(~ proposal_sd, labeller = label_both)
Fifth, I'll summarize the results like this:
# calculate the proportion of accepted proposal compared to all proposals
mh %>%
group_by(proposal_sd) %>%
slice(1) %>%
summarise(acceptance_rate = n_accepted / (n_accepted + n_rejected))
# calculate ess
mh %>%
group_by(proposal_sd) %>%
summarise(ess_theta_1 = effectiveSize(accepted_traj1),
ess_theta_2 = effectiveSize(accepted_traj2))
proposal_sd acceptance_rate
<dbl> <dbl>
1 0.02 0.930
2 0.2 0.425
proposal_sd ess_theta_1 ess_theta_2
<dbl> <dbl> <dbl>
1 0.02 243. 212.
2 0.2 6551. 5809.
When I thank you in the book, should I refer to you as Omid Ghasemi?
Hi @ASKurz, thank you so much for checking the code and improving it significantly. The plot looks really nice now. Yes, "Omid Ghasemi" is fine.
Finally, thank you for this amazing job. Learning Kruschke's book is now much easier and more fun with the help of your fantastic book. Well done!
Hi,
I slightly changed your codes for the Metropolis single-chain algorithm and reproduced several plots of figures 7.10 and 7.11 (page 179-180). In the bookdown, you have used
brms
to build the models and check for representativeness, so I am not sure if the Metropolis algorithm is useful to be added. Anyway, here it is:To draw a Gelman plot to check shrinkage factors, we can use the coda package. However, the graph is not as beautiful as the previous ones.