ASKurz / Statistical_Rethinking_with_brms_ggplot2_and_the_tidyverse_2_ed

The bookdown version lives here:
https://bookdown.org/content/4857/
Creative Commons Zero v1.0 Universal
125 stars 37 forks source link

replacement for sim_happieness #26

Closed rpruim closed 2 years ago

rpruim commented 3 years ago

You said:

Quite frankly, I can't make sense of it. So we'll just have to move forward and use the convenience function rather than practicing a tidyverse alternative. If you have a handle on what's going on and have a tidyverse alternative, please share your code.

Challenge accepted. There is a fundamental looping over time, but we can separate out the looping from the annual updates pretty easily. Here's one way to do it.

# Return a data frame with n 1-year-olds of varying happiness
new_borns <- function(n = 20) {
  tibble(
    A = 1,   # 1 year old
    M = 0,   # not married
    H = seq(-2, 2, length.out = n)   # range of happiness scores
  )
}

# Everyone ages;
# Some unmarried adults get married (happier people are more likely to)
# Old people die;
# New people are born.

update_population <- function(
  pop, N_births = 20, aom = 18, max_age = 65) {

  pop %>%
    mutate(
      A = A + 1, # everyone gets one year older
      # some people get married
      M = ifelse(M >= 1, 1, (A >= aom) * rbinom(nrow(pop), 1, inv_logit(H - 4)))
    ) %>%
    filter(A <= max_age) %>%    # old people die
    bind_rows(
      new_borns(N_births)            # new people are born
    )
}

# Run the population simulation for 1000 years.
set.seed(1997)
Pop <- new_borns(20)          # year 1
for(i in 2:1000) {            # years 2 through 1000
  Pop <- update_population(Pop, aom = 18, max_age = 65, N_births = 20)
} 
rpruim commented 3 years ago

Side note: Later I'm seeing this:

mutate(mid = factor(married + 1, labels = c("single", "married")))

Unless I'm misunderstanding something, that +1 doesn't do anything. If you omit it, you will get the same factor.

I'm also confused by the name mid. That sounds like married ID, which I would expect to be the 0/1 variable, not the nicely named factor.

ASKurz commented 3 years ago

It might be a day or two before I can study this in detail, but thanks!

rpruim commented 3 years ago

Just for completeness, here is the original code from {rethinking} for comparison.

sim_happiness <-
  function (seed = 1977, N_years = 1000, max_age = 65, N_births = 20, 
                  aom = 18) 
{
    set.seed(seed)
    H <- M <- A <- c()
    for (t in 1:N_years) {
        A <- A + 1
        A <- c(A, rep(1, N_births))
        H <- c(H, seq(from = -2, to = 2, length.out = N_births))
        M <- c(M, rep(0, N_births))
        for (i in 1:length(A)) {
            if (A[i] >= aom & M[i] == 0) {
                M[i] <- rbern(1, inv_logit(H[i] - 4))
            }
        }
        deaths <- which(A > max_age)
        if (length(deaths) > 0) {
            A <- A[-deaths]
            H <- H[-deaths]
            M <- M[-deaths]
        }
    }
    d <- data.frame(age = A, married = M, happiness = H)
    return(d)
}
ASKurz commented 3 years ago

@rpruim, this works like a dream.

ASKurz commented 3 years ago

Ah, I see now. McElreath's mid variable is an index variable. The reason he added 1 (married + 1) is because he wanted the first category to be 1 and the second to be 2. That would be more obvious if I hadn't made it a factor. It's fine to keep the mid variable as numeric, the way McElreath did in the text. I made it a factor because it helps make the brms output more interpretable.

ASKurz commented 3 years ago

As to the simulation, you can find the draft of my implementation here.

rpruim commented 3 years ago

Looks good. BTW, I was tempted to use 21 newborns rather than 20 as that would make the happiness values nicer to look at since the gaps would be 1/20 of the distance from -2 to 2 instead of 1/19. In the end, I left things to match what was done in the text. But generally, when I make grids, I use some_round_number + 1 rather than some_round_number.

One nit: the update_population() function takes as input a population, which may or may not consist only of newborns. The only time it is applied to a population of newborns is in the second year. It's important that the function can process an arbitrary population.

But overall, the explanation is very nice.

I looked to see if I could find a function iterator, and the only one I found quickly was in a now archived package. A function iterator would avoid the need for a visible for loop and look something like

interate(update_population, 999, newborns(20))

perhaps with some additional arguments. That makes it even clearer what the big idea is here.

One could also use recursion, but the nesting gets pretty deep, so it isn't ideal here.

ASKurz commented 3 years ago

My guess is there is some tricky way to replace the for loop with something from the purrr::map() family. I just haven't figured it out.

k-hench commented 2 years ago

This could be done with purrr::reduce() - one might need to include a dummy parameter (year) to iterate over in the updating function though.

eg:

progress_year <- function(data, year, max_age = 65, n_births = 20, aom = 18){
  new_cohort <- tibble(age = rep(1, n_births),
       married = as.integer(rep(0, n_births)),
       happiness = seq(from = -2, to = 2, length.out = n_births))

  data %>% 
    mutate(age = age + 1) %>% 
    bind_rows(., new_cohort) %>% 
    mutate(married = if_else(age >= aom & married == 0,
                             rbern(n(), inv_logit(happiness - 4)),
                             married )) %>% 
    filter(age <= max_age)
}

sim_tidy <- function(seed = 1977, n_years = 1000, max_age = 65, n_births = 20, aom = 18){
  set.seed(seed)

  empty_tibble <- tibble(age = double(),
       married = integer(),
       happiness = double())

  1:n_years %>% 
    reduce(.f = progress_year,
           .init = empty_tibble,
           max_age = max_age, n_births = n_births, aom = aom)
}

sim_tidy(seed = 1977, n_years = 66)

btw. I have no clue why one would want to iterate this for n > (max_age + 1), since all the agents leave the simulation after they reach max_age anyways... :thinking:

ASKurz commented 2 years ago

It'll be a week or few before I have the bandwidth to look at this in detail. In the meantime, thanks for the heads up!

ASKurz commented 2 years ago

I still don't fully understand your simulation code, yet, but wow is it blazing fast. Thanks for the tip on purrr::reduce(). It seems like a handy tool to have in the bag. Looks like I have some homework to do...

ASKurz commented 2 years ago

@k-hench, the thing I'm struggling with in your code is what the year argument is doing within the progress_year(). When I take it out, it causes problems, but I can't see where it's receiving input.

rpruim commented 2 years ago

Regarding

btw. I have no clue why one would want to iterate this for n > (max_age + 1) , since all the agents leave the simulation after they reach max_age anyways... 🤔

While individual leave the population at age max_age, the population structure when they leave may be different from the structure when they were born, so simulating longer does make sense, I think.

rpruim commented 2 years ago

Regarding

@k-hench, the thing I'm struggling with in your code is what the year argument is doing within the progress_year(). When I take it out, it causes problems, but I can't see where it's receiving input.

The function provided to reduce() must take two arguments:

For reduce(), and accumulate(), a 2-argument function. The function will be passed the accumulated value as the first argument and the "next" value as the second argument.

In this case, we don't do anything with the "next" value, but it will be passed in, so the function must accept it.

rpruim commented 2 years ago

BTW: You can see both of my last two comments in effect if you insert the following line right after the creation of new_cohort in progress_year():

  print(data %>% summarise(year = year, n = n(), n_married = sum(married)) %>% as.data.frame()) 

After 65 years, the population stabilizes at 20 people per age, but the number who are married keeps changing because of the randomness involved. Things would change even more if happiness were assigned randomly instead of evenly spaced across the range or people were removed randomly rather than automatically at age 65.

It would also be possible for the simulation to depend on the year in some way. It just happens that this simulation does not.

k-hench commented 2 years ago

regarding:

The function provided to reduce() must take two arguments

Yes, exactly. year is really just a dummy to conform to the function structure that reduce() expects.

k-hench commented 2 years ago

regarding

yet, but wow is it blazing fast

unfortunately, it is not :unamused: : I might have tricked you here by just simulating 65 generations instead of 1000.

Directly comparing (separate gist) my approach with @rpruim 's , I have to admit that reduce() seems somewhat slower than the for() loop (~ 0.55 sec vs. ~ 0.38 sec). Considering how close these are, I guess that the difference is likely rather in the implementation of the simulations than in reduce() vs. for() iteration :man_shrugging:

k-hench commented 2 years ago

Regarding

btw. I have no clue why one would want to iterate this for n > (max_age + 1) , since all the agents leave the simulation after they reach max_age anyways... thinking

While individual leave the population at age max_age, the population structure when they leave may be different from the structure when they were born, so simulating longer does make sense, I think.

hmm - I'm not convinced :thinking: I might just be blanking here, but I don't seed how the population structure effects the individual chances for marriage:

As I understand the simulation, for each individual there are simply 65 (-18) successive chances to get married (with a chance depending on their happiness). While the other people in population do effect the the randomness in the sense that they progress the random number generator, they do not influence the individual marriage chances functionally.

So, I believe that the 1000-65 extra generations are just an elaborate (and expensive) way to update the seed of the random number generation for the final 65 generations... (but that this is not different from just using a different seed for set.seed() in any biologically relevant way :man_shrugging: )

ASKurz commented 2 years ago

ere, but I don't seed how the population structure effects the individual chances for marriage:

Ah, you're right. I was tricked. Since there' no speed advantage and given I understand @rpruim's approach much better, I think I'll go with his in the next update. Your reduce() method still seems too confusing.

ASKurz commented 2 years ago

@rpruim's solution is now integrated into the main file.