Closed rpruim closed 2 years ago
Side note: Later I'm seeing this:
mutate(mid = factor(married + 1, labels = c("single", "married")))
Unless I'm misunderstanding something, that +1
doesn't do anything. If you omit it, you will get the same factor.
I'm also confused by the name mid
. That sounds like married ID, which I would expect to be the 0/1 variable, not the nicely named factor.
It might be a day or two before I can study this in detail, but thanks!
Just for completeness, here is the original code from {rethinking} for comparison.
sim_happiness <-
function (seed = 1977, N_years = 1000, max_age = 65, N_births = 20,
aom = 18)
{
set.seed(seed)
H <- M <- A <- c()
for (t in 1:N_years) {
A <- A + 1
A <- c(A, rep(1, N_births))
H <- c(H, seq(from = -2, to = 2, length.out = N_births))
M <- c(M, rep(0, N_births))
for (i in 1:length(A)) {
if (A[i] >= aom & M[i] == 0) {
M[i] <- rbern(1, inv_logit(H[i] - 4))
}
}
deaths <- which(A > max_age)
if (length(deaths) > 0) {
A <- A[-deaths]
H <- H[-deaths]
M <- M[-deaths]
}
}
d <- data.frame(age = A, married = M, happiness = H)
return(d)
}
@rpruim, this works like a dream.
Ah, I see now. McElreath's mid
variable is an index variable. The reason he added 1 (married + 1
) is because he wanted the first category to be 1
and the second to be 2
. That would be more obvious if I hadn't made it a factor. It's fine to keep the mid
variable as numeric, the way McElreath did in the text. I made it a factor because it helps make the brms output more interpretable.
As to the simulation, you can find the draft of my implementation here.
Looks good. BTW, I was tempted to use 21 newborns rather than 20 as that would make the happiness values nicer to look at since the gaps would be 1/20 of the distance from -2 to 2 instead of 1/19. In the end, I left things to match what was done in the text. But generally, when I make grids, I use some_round_number + 1
rather than some_round_number
.
One nit: the update_population()
function takes as input a population, which may or may not consist only of newborns. The only time it is applied to a population of newborns is in the second year. It's important that the function can process an arbitrary population.
But overall, the explanation is very nice.
I looked to see if I could find a function iterator, and the only one I found quickly was in a now archived package. A function iterator would avoid the need for a visible for loop and look something like
interate(update_population, 999, newborns(20))
perhaps with some additional arguments. That makes it even clearer what the big idea is here.
One could also use recursion, but the nesting gets pretty deep, so it isn't ideal here.
My guess is there is some tricky way to replace the for loop with something from the purrr::map()
family. I just haven't figured it out.
This could be done with purrr::reduce()
- one might need to include a dummy parameter (year
) to iterate over in the updating function though.
eg:
progress_year <- function(data, year, max_age = 65, n_births = 20, aom = 18){
new_cohort <- tibble(age = rep(1, n_births),
married = as.integer(rep(0, n_births)),
happiness = seq(from = -2, to = 2, length.out = n_births))
data %>%
mutate(age = age + 1) %>%
bind_rows(., new_cohort) %>%
mutate(married = if_else(age >= aom & married == 0,
rbern(n(), inv_logit(happiness - 4)),
married )) %>%
filter(age <= max_age)
}
sim_tidy <- function(seed = 1977, n_years = 1000, max_age = 65, n_births = 20, aom = 18){
set.seed(seed)
empty_tibble <- tibble(age = double(),
married = integer(),
happiness = double())
1:n_years %>%
reduce(.f = progress_year,
.init = empty_tibble,
max_age = max_age, n_births = n_births, aom = aom)
}
sim_tidy(seed = 1977, n_years = 66)
btw. I have no clue why one would want to iterate this for n > (max_age + 1)
, since all the agents leave the simulation after they reach max_age
anyways... :thinking:
It'll be a week or few before I have the bandwidth to look at this in detail. In the meantime, thanks for the heads up!
I still don't fully understand your simulation code, yet, but wow is it blazing fast. Thanks for the tip on purrr::reduce()
. It seems like a handy tool to have in the bag. Looks like I have some homework to do...
@k-hench, the thing I'm struggling with in your code is what the year
argument is doing within the progress_year()
. When I take it out, it causes problems, but I can't see where it's receiving input.
Regarding
btw. I have no clue why one would want to iterate this for n > (max_age + 1) , since all the agents leave the simulation after they reach max_age anyways... 🤔
While individual leave the population at age max_age
, the population structure when they leave may be different from the structure when they were born, so simulating longer does make sense, I think.
Regarding
@k-hench, the thing I'm struggling with in your code is what the year argument is doing within the progress_year(). When I take it out, it causes problems, but I can't see where it's receiving input.
The function provided to reduce()
must take two arguments:
For reduce(), and accumulate(), a 2-argument function. The function will be passed the accumulated value as the first argument and the "next" value as the second argument.
In this case, we don't do anything with the "next" value, but it will be passed in, so the function must accept it.
BTW: You can see both of my last two comments in effect if you insert the following line right after the creation of new_cohort
in progress_year()
:
print(data %>% summarise(year = year, n = n(), n_married = sum(married)) %>% as.data.frame())
After 65 years, the population stabilizes at 20 people per age, but the number who are married keeps changing because of the randomness involved. Things would change even more if happiness were assigned randomly instead of evenly spaced across the range or people were removed randomly rather than automatically at age 65.
It would also be possible for the simulation to depend on the year in some way. It just happens that this simulation does not.
regarding:
The function provided to reduce() must take two arguments
Yes, exactly. year
is really just a dummy to conform to the function structure that reduce()
expects.
regarding
yet, but wow is it blazing fast
unfortunately, it is not :unamused: : I might have tricked you here by just simulating 65 generations instead of 1000.
Directly comparing (separate gist) my approach with @rpruim 's , I have to admit that reduce()
seems somewhat slower than the for()
loop (~ 0.55 sec vs. ~ 0.38 sec).
Considering how close these are, I guess that the difference is likely rather in the implementation of the simulations than in reduce()
vs. for()
iteration :man_shrugging:
Regarding
btw. I have no clue why one would want to iterate this for n > (max_age + 1) , since all the agents leave the simulation after they reach max_age anyways... thinking
While individual leave the population at age
max_age
, the population structure when they leave may be different from the structure when they were born, so simulating longer does make sense, I think.
hmm - I'm not convinced :thinking: I might just be blanking here, but I don't seed how the population structure effects the individual chances for marriage:
As I understand the simulation, for each individual there are simply 65 (-18) successive chances to get married (with a chance depending on their happiness
).
While the other people in population do effect the the randomness in the sense that they progress the random number generator, they do not influence the individual marriage chances functionally.
So, I believe that the 1000-65 extra generations are just an elaborate (and expensive) way to update the seed of the random number generation for the final 65 generations...
(but that this is not different from just using a different seed for set.seed()
in any biologically relevant way :man_shrugging: )
ere, but I don't seed how the population structure effects the individual chances for marriage:
Ah, you're right. I was tricked. Since there' no speed advantage and given I understand @rpruim's approach much better, I think I'll go with his in the next update. Your reduce()
method still seems too confusing.
@rpruim's solution is now integrated into the main file.
You said:
Challenge accepted. There is a fundamental looping over time, but we can separate out the looping from the annual updates pretty easily. Here's one way to do it.