EpiModel / ARTnet

Network and Epidemic Parameterization with the ARTnet Study Data

performance in `predict` #31

Closed AdrienLeGuillou closed 4 years ago

AdrienLeGuillou commented 4 years ago

When running profvis on netsim for my SFO project, I realized that a non-negligible amount of time was spent in the predict calls for mod.cond and mod.acts (the condoms and acts models, respectively). In particular, the model.frame call, which transforms the variables (factors to dummy variables, I(), etc.), takes 3.7 seconds out of the 25 total.
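
For reference, a minimal sketch of how that profile was produced (the est / param / init / control objects stand in for my SFO project setup and are not defined here):

library(EpiModel)
library(profvis)

# Profile a short netsim run and inspect where the time is spent
# (est, param, init, control are assumed to be set up beforehand).
profvis({
  sim <- netsim(est, param, init, control)
})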

I mentioned this to @smjenness recently and would like to open a discussion about these findings.

Below are toy benchmarks using the mtcars dataset to compare predict performance when the variable transformations are done beforehand versus left to predict:

data(mtcars)

mtcars$y <- mtcars$wt < 2.4
# cyl as logical
mtcars$cyl6 <- mtcars$cyl == 6
mtcars$cyl8 <- mtcars$cyl == 8
# cyl as number 0 or 1
mtcars$cyl6_n <- as.numeric(mtcars$cyl == 6)
mtcars$cyl8_n <- as.numeric(mtcars$cyl == 8)
# cyl as character
mtcars$cyl <- as.character(mtcars$cyl)
# gear as logical
mtcars$gear4 <- mtcars$gear == 4
mtcars$gear5 <- mtcars$gear == 5
# gear as number 0 or 1
mtcars$gear4_n <- as.numeric(mtcars$gear == 4)
mtcars$gear5_n <- as.numeric(mtcars$gear == 5)
# gear as character
mtcars$gear <- as.character(mtcars$gear)

mod1 <- glm(y ~ cyl + gear + mpg, data = mtcars)
mod2 <- glm(y ~ cyl6 + cyl8 + gear4 + gear5 + mpg, data = mtcars)
mod3 <- glm(y ~ cyl6_n + cyl8_n +  gear4_n + gear5_n + mpg, data = mtcars)

x1 <- mtcars[, c("cyl", "gear", "mpg")]
x2 <- mtcars[, c("cyl6", "cyl8", "gear4", "gear5", "mpg")]
x3 <- mtcars[, c("cyl6_n", "cyl8_n",  "gear4_n", "gear5_n", "mpg")]

microbenchmark::microbenchmark(
  predict(mod1, newdata = x1, type = "response"),
  predict(mod2, newdata = x2, type = "response"),
  predict(mod3, newdata = x3, type = "response"),
  predict(mod1, newdata = mtcars, type = "response"),
  predict(mod2, newdata = mtcars, type = "response"),
  predict(mod3, newdata = mtcars, type = "response")
)

# edited output
> Unit: microseconds
> expr                                                mean  median 
> predict(mod1, newdata = x1, type = "response")      889    877    
> predict(mod2, newdata = x2, type = "response")      709    685    
> predict(mod3, newdata = x3, type = "response")      369    356    
> predict(mod1, newdata = mtcars, type = "response")  884    861    
> predict(mod2, newdata = mtcars, type = "response")  716    693    
> predict(mod3, newdata = mtcars, type = "response")  374    364    

It seems that by storing and producing the data in a "friendlier" format (i.e. only numbers) we can roughly double the speed of predict. It is hard to assess what gain this would produce in the full simulation, but I would guess a 10% to 30% reduction in runtime.
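
As a rough illustration of why the "numbers only" format helps (using the mod3 and x3 objects from the benchmark above): once the design matrix is purely numeric, predict can even be bypassed entirely with a plain matrix product, since these toy gaussian models use the identity link.

# Numeric design matrix: intercept column plus the predictors,
# in the same order as coef(mod3).
X3 <- cbind(1, as.matrix(x3))

# Gaussian / identity link: the response-scale prediction is X %*% beta.
p_manual <- drop(X3 %*% coef(mod3))

# Should match predict() up to numerical noise.
all.equal(unname(p_manual),
          unname(predict(mod3, newdata = x3, type = "response")))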

However, changing ARTnet and EpiModelHIV for this purpose would take a long time, as the models use a combination of factors and powers (with I(x^2)). We would have to assess how to store and produce these values without spending more time doing so than we would gain.

smjenness commented 4 years ago

Can you show the profvis profile for a single time step? I'm not sure what number of time steps the 3.7 seconds and 25 seconds correspond to. Because the differences in the microbenchmark are all less than 1 ms (which is negligible for EpiModel run times), I'd be interested in seeing the benchmark on the actual ARTnet data prediction. Thanks!

AdrienLeGuillou commented 4 years ago

I was running the model for 2 years (104 steps). Below is the profvis profile for 1 time step (the results are less stable, due for example to when the garbage collector is called).

I'll try to do the same with the ARTnet data.

[profvis profile for a single time step]

AdrienLeGuillou commented 4 years ago

I ran the acts model from build_epistat with the following settings:

geog.lvl = "city"         
geog.cat = "San Francisco"
race = TRUE               

Here is the modification I tried:

acts.mod <- glm(floor(acts*52) ~ duration + I(duration^2) + as.factor(race.combo) +
                  as.factor(ptype) + duration*as.factor(ptype) + comb.age + I(comb.age^2) +
                  hiv.concord.pos,
                family = poisson(), data = la)

la2 <- la
la2$duration2 <- la$duration^2
la2$race.combo2 <- as.numeric(la$race.combo == 2)
la2$race.combo3 <- as.numeric(la$race.combo == 3)
la2$race.combo4 <- as.numeric(la$race.combo == 4)
la2$race.combo5 <- as.numeric(la$race.combo == 5)
la2$race.combo6 <- as.numeric(la$race.combo == 6)
la2$ptype2 <- as.numeric(la$ptype == 2)
la2$comb.age2 <- la$comb.age^2

acts.mod2 <- glm(
  floor(acts*52) ~ duration + duration2 +
    race.combo2 + race.combo3 + race.combo4 + race.combo5 + race.combo6 +
    ptype2 + duration * ptype2 +
    comb.age + comb.age2 + hiv.concord.pos,
  family = poisson(), data = la2)

microbenchmark::microbenchmark(
  original = predict.glm(acts.mod, newdata = la, type = "response"),
  other = predict.glm(acts.mod2, newdata = la2, type = "response")
)

> Unit: microseconds
>     expr   min    lq  mean median    uq   max neval
> original  8261  8808  9290   9064  9379 13654   100
>    other   666   751   882    790   879  5006   100

So for this model we get a tenfold speed-up (though we are still talking microseconds).
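
As a quick sanity check (with the la and la2 objects from above), the two parameterizations should produce numerically identical predictions:

all.equal(
  unname(predict(acts.mod, newdata = la, type = "response")),
  unname(predict(acts.mod2, newdata = la2, type = "response"))
)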

Because the changes to the data.frame must be done anyway, I benchmarked the different ways of creating these variables:

library(data.table)

dt3 <- as.data.table(l_race_cat)
dt4 <- as.data.table(l_race_cat)

microbenchmark::microbenchmark(
  orig = {
    l2 <- l_race_cat
    l2$race.combo <- rep(NA, nrow(l2))
    l2$race.combo[l2$race.cat3 == 1 & l2$p_race.cat3 == 1] <- 1
    l2$race.combo[l2$race.cat3 == 1 & l2$p_race.cat3 %in% 2:3] <- 2
    l2$race.combo[l2$race.cat3 == 2 & l2$p_race.cat3 %in% c(1, 3)] <- 3
    l2$race.combo[l2$race.cat3 == 2 & l2$p_race.cat3 == 2] <- 4
    l2$race.combo[l2$race.cat3 == 3 & l2$p_race.cat3 %in% 1:2] <- 5
    l2$race.combo[l2$race.cat3 == 3 & l2$p_race.cat3 == 3] <- 6
  },
  other = {
    l3 <- l_race_cat
    l3$race.combo2 <- ifelse(l3$race.cat3 == 1 & l3$p_race.cat3 %in% 2:3, 1, 0)
    l3$race.combo3 <- ifelse(l3$race.cat3 == 2 & l3$p_race.cat3 %in% c(1, 3), 1, 0)
    l3$race.combo4 <- ifelse(l3$race.cat3 == 2 & l3$p_race.cat3 == 2, 1, 0)
    l3$race.combo5 <- ifelse(l3$race.cat3 == 3 & l3$p_race.cat3 %in% 1:2, 1, 0)
    l3$race.combo6 <- ifelse(l3$race.cat3 == 3 & l3$p_race.cat3 == 3, 1, 0)
  },
  dt_orig = dt3[, race.combo := fifelse(
              race.cat3 == 1 & p_race.cat3 == 1, 1,
              fifelse(race.cat3 == 1 & p_race.cat3 %in% 2:3, 2,
              fifelse(race.cat3 == 2 & p_race.cat3 %in% c(1, 3), 3,
              fifelse(race.cat3 == 2 & p_race.cat3 == 2, 4,
                      fifelse(race.cat3 == 3 & p_race.cat3 %in% 1:2, 5, 6)))))],
  dt_other = dt4[, c("race.combo2", "race.combo3", "race.combo4",
        "race.combo5", "race.combo6") := .(
                     fifelse(race.cat3 == 1 & p_race.cat3 %in% 2:3, 1, 0),
                     fifelse(race.cat3 == 2 & p_race.cat3 %in% c(1, 3), 1, 0),
                     fifelse(race.cat3 == 2 & p_race.cat3 == 2, 1, 0),
                     fifelse(race.cat3 == 3 & p_race.cat3 %in% 1:2, 1, 0),
                     fifelse(race.cat3 == 3 & p_race.cat3 == 3, 1, 0)
                   )]
)

> Unit: milliseconds
>     expr     min     lq     mean   median    uq     max   neval
>     orig    1.34    1.36    1.54    1.38    1.41    7.41   100
>    other    2.15    2.18    2.37    2.21    2.25    6.72   100
>  dt_orig    1.25    1.33    1.51    1.38    1.46    5.92   100
> dt_other    1.38    1.44    1.66    1.49    1.60    6.00   100

We can see that the time required to put the variables into the right format counterbalances the time gained in the prediction. However, by using data.table, the "other" (fast-for-predict) variables are created almost as fast as the original one. The creation of the data.table itself is outside of the benchmark, but it is usually faster than the creation of a data.frame from the same data.

Overall, there is a potential gain in switching to a "preprocessed" data.table, but this implies a lot of changes to ARTnet and EpiModelHIV in order to have the data produced efficiently.
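
As a purely illustrative sketch of what that "preprocessed" route could look like for the acts model (reusing acts.mod2 and la2 from above): keep the coefficients and a numeric design matrix around, and apply the inverse link by hand, so that model.frame never runs at prediction time.

# Build the numeric design matrix once; in the simulation this would be
# assembled directly from the precomputed columns rather than from the fit.
X <- model.matrix(acts.mod2)
beta <- coef(acts.mod2)

# Poisson / log link: the response-scale prediction is exp(X %*% beta).
acts_rate <- exp(drop(X %*% beta))

# Should match the fitted values from the glm.
all.equal(unname(acts_rate), unname(fitted(acts.mod2)))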

smjenness commented 4 years ago

Thanks for this benchmarking. 9 milliseconds per time step is not worth our concern at this point. I think we can close this issue.

AdrienLeGuillou commented 4 years ago

Agreed (I do not have the permissions to close issues).