arteagac / xlogit

A Python package for GPU-accelerated estimation of mixed logit models.
https://xlogit.readthedocs.io
GNU General Public License v3.0

Adding logitr to benchmarking #5

Open jhelvy opened 2 years ago

jhelvy commented 2 years ago

@arteagac, xlogit looks amazing, thanks for your work on this!

As @crforsythe noted in issue #3 on implementing WTP space models, I'm the author of the logitr R package. It seems both packages have very similar functionality (except for the WTP space implementation, which shouldn't be too hard to add if you're interested). logitr is considerably faster than most of the other R packages you benchmark against, but I don't think its speed will scale with very large numbers of draws the way xlogit's does. It'd be great to see it added to your benchmarking. I was actually in the process of attempting to benchmark it myself, but since you've already got your setup in place it should be pretty easy to add. I believe xlogit and logitr use the same data structure too - the UI looks quite similar.

Really nice work you've done here!

jhelvy commented 2 years ago

Btw, I just ran the same example as the one in the README file using logitr (code below), and the run time (on my tiny MacBook) with 600 draws was 118 seconds. The same example with xlogit takes 33 seconds without GPU processing. I'm having trouble installing CUDA on my Mac, but even without GPU processing that's a ~4x speedup over logitr. Amazing!

library(logitr)
library(readr)

df <- read_csv("https://raw.githubusercontent.com/arteagac/xlogit/master/examples/data/electricity_long.csv")

model <- logitr(
    data = df,
    outcome = 'choice',
    obsID = 'chid',
    pars = c('pf', 'cl', 'loc', 'wk', 'tod', 'seas'),
    panelID = 'id',
    randPars = c(
        'pf' = 'n', 'cl' = 'n', 'loc' = 'n',
        'wk' = 'n', 'tod' = 'n', 'seas' = 'n'),
    numDraws = 600
)

summary(model)
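
For reference, the xlogit version I ran followed the README example for the same electricity data; it was roughly like this (I may be misremembering the exact column names, e.g. `alt`):

import pandas as pd
from xlogit import MixedLogit

df = pd.read_csv("https://raw.githubusercontent.com/arteagac/xlogit/master/examples/data/electricity_long.csv")

varnames = ['pf', 'cl', 'loc', 'wk', 'tod', 'seas']
model = MixedLogit()
model.fit(X=df[varnames], y=df['choice'], varnames=varnames,
          ids=df['chid'], panels=df['id'], alts=df['alt'],
          randvars={'pf': 'n', 'cl': 'n', 'loc': 'n',
                    'wk': 'n', 'tod': 'n', 'seas': 'n'},
          n_draws=600)
model.summary()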
arteagac commented 2 years ago

Hi @jhelvy, thanks a lot for your kind comments. Yes, we are planning to implement the WTP space specification, and logitr will make it so much easier, as it has great documentation for WTP. Regarding the benchmark, I will definitely include logitr in it. I still have access to the 80-core Linux server, so I will be able to test it with parallel processing as well. Sorry I missed including logitr in the initial benchmark; I wrote most of the benchmark code in October 2020, and I think logitr was still under development at that point.

arteagac commented 2 years ago

By the way, xlogit uses analytical gradients, which may give it an extra boost in speed. The code you just provided for the electricity dataset is super useful, as I can plug it directly into the benchmark. Quick question: does the parallel processing in logitr work only for the multistart, or is there a way to enable it for a single estimation?

jhelvy commented 2 years ago

No worries. logitr didn't really get fast until I made some changes last summer (2021). logitr also uses analytic gradients, so it's pretty quick too, but I'm sure it won't compare to xlogit. The parallelization only works on the multistart; I haven't gotten it into a single estimation yet.

jhelvy commented 2 years ago

Btw, I can share an Overleaf doc I wrote with all the analytic gradients for the WTP space models if you'd like. The biggest speed-up in logitr comes from doing some math tricks to pre-compute a lot of things, so the actual calculations done inside the optimization iterations aren't as expensive. I haven't looked yet at how you've computed things in xlogit, but it's incredible how much faster things can get depending on how the math is done.

crforsythe commented 2 years ago

I'll note that logitr uses numerically estimated Hessians, while xlogit uses the Hessian calculation implemented by mlogit. I believe the numerical estimation of the Hessian adds significant computation time for mixed logit models (I am planning to add a toggle to xlogit soon to show this). So this could be driving some of the additional compute time on logitr's part, but xlogit's GPU processing capability would likely still lead to reductions in processing time.

arteagac commented 2 years ago

@jhelvy, yes please, that would be super helpful! Actually, just yesterday I was planning to discuss with @crforsythe that we might have to switch to finite differences for the WTP implementation, so I am glad you figured out the analytical gradients for non-linear WTP, which will make a big difference in estimation speed. By the way, xlogit also has a nice feature called batch processing, which makes it possible to scale estimations to thousands of individuals and, at the same time, millions of random draws on any GPU. This works by splitting the data into batches, doing partial computations for each batch, and combining the results across batches at the end. This does not affect the final result and avoids overflowing the GPU memory for big datasets.
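
In pseudocode, the idea is roughly this (a simplified sketch, not the actual xlogit code; compute_draw_probs is a stand-in for the per-draw probability computation):

import numpy as np

def simulated_probs(compute_draw_probs, draws, batch_size=1000):
    # Process the random draws in batches, accumulate the partial sums of the
    # per-person probabilities, and average at the end. The result matches
    # using all draws at once, but memory use stays bounded.
    n_draws = len(draws)
    total = 0.0
    for start in range(0, n_draws, batch_size):
        batch = draws[start:start + batch_size]                # chunk of draws
        total = total + compute_draw_probs(batch).sum(axis=0)  # sum over draws in this batch
    return total / n_draws                                     # simulated choice probabilities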

jhelvy commented 2 years ago

The Hessian isn't used during estimation in logitr; only the gradients are used. The Hessian is computed (numerically) only after convergence. So that could be one source of speed improvement.

How I compute the probabilities is probably the biggest speed improvement in logitr (see the attached screenshot). By using only the probability of the chosen alternative, there are fewer calculations to be made. And when you look at the gradients, many of the elements in them don't change (e.g., the X_nj - X_n* differences in equations 4 & 5), so those (large) pieces can be pre-computed, speeding everything up considerably. The screenshot below is just for MNL, but the same strategy is used for MXL.

Is this by chance how you are computing probabilities in xlogit? If not, then there's probably even more speed you can get out by reformulating them.

[Screenshot: mnl_logit_grad - MNL probability of the chosen alternative and its gradient]
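
In NumPy-style pseudocode, the precomputation idea looks roughly like this (a simplified sketch of the formulation in the screenshot, not the actual logitr code, which is R):

import numpy as np

def precompute_diffs(X, chosen):
    # X: (N, J, K) attributes; chosen: (N,) index of each chosen alternative.
    # dX[n, j] = X[n, j] - X[n, chosen[n]] depends only on the data, so it is
    # computed once, before the optimizer starts iterating.
    X_star = X[np.arange(X.shape[0]), chosen]
    return X - X_star[:, None, :]

def negloglik_and_grad(beta, dX):
    v = dX @ beta                          # utility differences vs. the chosen alt
    expv = np.exp(v)
    denom = expv.sum(axis=1)               # chosen alt contributes exp(0) = 1
    p = expv / denom[:, None]              # choice probabilities
    nll = np.log(denom).sum()              # -sum(log P_chosen)
    grad = np.einsum('nj,njk->k', p, dX)   # gradient of the negative log-likelihood
    return nll, grad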

jhelvy commented 2 years ago

This is from the Overleaf doc I mentioned - send me an email and I'll invite you to it: jph gwu edu

crforsythe commented 2 years ago

@jhelvy Yes, exactly re: Hessians. xlogit just calculates the individual-level gradients once (essentially requiring one pass of the simulated log-likelihood (SLL) for the whole dataset). logitr uses finite differences, which likely requires 2 SLL calculations for each diagonal and off-diagonal element of the covariance matrix. I find that most packages leverage the latter strategy.
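
For reference, the numerical approach looks roughly like this (an illustrative sketch of the general finite-difference idea, not logitr's implementation):

import numpy as np

def numerical_hessian(sll_fn, theta, eps=1e-4):
    # Finite differences on the simulated log-likelihood (SLL): every element
    # of the Hessian needs additional SLL evaluations at perturbed parameter
    # values, which is what makes this step costly for mixed logit.
    k = theta.size
    H = np.zeros((k, k))
    f0 = sll_fn(theta)
    f_i = np.array([sll_fn(theta + eps * np.eye(k)[i]) for i in range(k)])
    for i in range(k):
        for j in range(i, k):
            f_ij = sll_fn(theta + eps * (np.eye(k)[i] + np.eye(k)[j]))
            H[i, j] = H[j, i] = (f_ij - f_i[i] - f_i[j] + f0) / eps**2
    return H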

jhelvy commented 2 years ago

@crforsythe Yes that makes sense. I never sat down long enough to derive the analytic hessian for WTP space models.

arteagac commented 2 years ago

@crforsythe, exactly. The finite differences add some overhead, but it shouldn't be too bad, as this computation is only done at the end of the estimation.

@jhelvy, thank you so much for the explanation of how you compute the probabilities. Currently, xlogit uses a slower approach, but it would be very easy to adapt it to your faster approach. Thank you for providing your email. I will email you soon.

jhelvy commented 2 years ago

Right, and even though the Hessian may be more difficult to compute with this formulation of the probabilities, it's probably faster overall because the gradients become substantially cheaper. Anyway, xlogit is already blazingly fast, so the fact that you're using a slower formula for the probabilities is mind-blowing. You could probably boost speed considerably further with this reformulation. Happy to help with this, as xlogit is much further along in terms of performance than logitr and has a very similar UI.

arteagac commented 2 years ago

Thanks a lot for offering your help @jhelvy. We will probably have many questions while integrating your WTP implementation into xlogit, so your help will be very valuable. I will also look further into your formulation for faster computation of probabilities.