OHDSI / Cyclops

Cyclops (Cyclic coordinate descent for logistic, Poisson and survival analysis) is an R package for performing large-scale regularized regressions.
http://ohdsi.github.io/Cyclops/

Notes and performance tests on Cyclops from MacBook Pro #1

Closed: daniellameeker closed this issue 9 years ago

daniellameeker commented 10 years ago
  1. Working with a new install of R on a new MacBook. FYI, I had to change my workspace to "/" to get the install to complete properly.
  2. Some interesting performance observations with R on Macs. Interested to hear thoughts. Unlike Windows and Linux, R on Mac uses a multicore BLAS (ATLAS, an optimized BLAS) by default. So I think these tests should reflect the difference between single-core Cyclops and multicore R (a quick way to check the linked BLAS is sketched after this list).
  3. Based on the results below, it would indeed be helpful to separate the data frame from the model being estimated. For example, my probit and logit models might perform slightly differently on the data, but it would not be unheard of to estimate both on the same data set.
  4. I hypothesize that Cyclops could be improved even further if it were deployed with Revolution R using its optimized linear algebra kernels. When I get back to a Windows OS I will try vanilla R and Revolution R, with and without Cyclops.

http://mran.revolutionanalytics.com
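A quick way to check which BLAS/LAPACK your R is linked against (the sessionInfo() report of library paths assumes R >= 3.4; on older builds you can inspect the libRblas symlink directly):

sessionInfo()   # look for the "BLAS:" and "LAPACK:" lines
La_version()    # reports the LAPACK version in use
# On OS X, R's BLAS is a symlink that may point at Apple's vecLib or ATLAS:
# system("ls -l /Library/Frameworks/R.framework/Resources/lib/libRblas*.dylib")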

So here is the benchmarking; I haven't looped to collect statistics.

# Generate a 20000000 x 10 covariate matrix and a length-20000000 response
> library(mlbench)
> mlgen <- mlbench.friedman1(20000000, sd = .1)
> mlgen$bin <- mlgen$y < mean(mlgen$y)
> 
> # OLS
> system.time(cyclopsData <- createCyclopsDataFrame(mlgen$y ~ mlgen$x, modelType="ls"), gcFirst=TRUE)
   user  system elapsed 
 60.604   5.544  71.226 
> system.time(cyclopsFit <- fitCyclopsModel(cyclopsData), gcFirst=TRUE)
   user  system elapsed 
111.649   0.766 111.861 
> 
> #IWLS
> system.time(glmFit<-glm.fit(mlgen$x,mlgen$y), gcFirst=TRUE)
   user  system elapsed 
 33.594  10.497  50.574 
> 
> rm(cyclopsData)
> rm(cyclopsFit)
> 
> #LR
> system.time(cyclopsData <- createCyclopsDataFrame(mlgen$bin ~ mlgen$x, modelType="lr"), gcFirst=TRUE)
   user  system elapsed 
 25.297   5.661  35.693 
> system.time(cyclopsFit <- fitCyclopsModel(cyclopsData), gcFirst=TRUE)
   user  system elapsed 
451.624   1.447 452.686 
> 
> #GLMwLogit
> system.time(glmFit<-glm.fit(mlgen$x,mlgen$bin,family=binomial(link = "logit")), gcFirst=TRUE)
   user  system elapsed 
 48.457  19.237  94.046 
> 
> 
> Sys.info()
                                                                                            sysname 
                                                                                           "Darwin" 
                                                                                            release 
                                                                                           "13.3.0" 
                                                                                            version 
"Darwin Kernel Version 13.3.0: Tue Jun  3 21:27:35 PDT 2014; root:xnu-2422.110.17~1/RELEASE_X86_64" 
                                                                                           nodename 
                                                                             "DM-MacBook-Pro.local" 
                                                                                            machine 
                                                                                           "x86_64" 
                                                                                              login 
                                                                                   "daniellameeker" 
                                                                                               user 
                                                                                   "daniellameeker" 
                                                                                     effective_user 
                                                                                   "daniellameeker" 
> 
msuchard commented 10 years ago

There are few reasons to use cyclic coordinate descent, or any of its variations, for unconditional (unstratified) models with low numbers of covariates. The examples above, if I am reading things correctly, use 1 and 10 covariates and are unconditioned likelihoods. In such situations, almost any fitting algorithm based on least squares (or IRLS) will outperform coordinate-based methods by orders of magnitude. Cyclops is geared towards high-dimensional (1K, 1M, 10M covariate) problems as often found in observational healthcare data.
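For readers unfamiliar with the method, here is a minimal, illustrative sketch of cyclic coordinate descent for lasso-penalized least squares. This toy version only shows the one-coordinate-at-a-time update; Cyclops's actual implementation is in C++ and covers regularized logistic, Poisson, and survival likelihoods over sparse data:

# Soft-thresholding operator for the L1 penalty
soft_threshold <- function(z, gamma) sign(z) * max(abs(z) - gamma, 0)

ccd_lasso <- function(X, y, lambda, n_iter = 100) {
  n <- nrow(X)
  beta <- numeric(ncol(X))
  r <- as.numeric(y - X %*% beta)          # current residual
  for (iter in seq_len(n_iter)) {
    for (j in seq_along(beta)) {
      xj <- X[, j]
      # least-squares update for coordinate j with all others held fixed
      rho <- sum(xj * (r + xj * beta[j])) / n
      beta_new <- soft_threshold(rho, lambda) / (sum(xj^2) / n)
      r <- r + xj * (beta[j] - beta_new)   # incremental residual update
      beta[j] <- beta_new
    }
  }
  beta
}

The fitCyclopsModel() calls above run the same kind of coordinate sweep, but over a regularized GLM likelihood and, in realistic use, millions of sparse covariates.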

The current Cyclops implementation does not use BLAS-level or LAPACK operations, so your specific BLAS implementation (ATLAS, MKL, Revolution's kernels, etc.) should have little influence. Currently, Cyclops also does not implement vectorized or multicore scheduling; this would be straightforward to accomplish. However, speed-ups here, even on outrageously expensive 40-core machines, top out at about a 5- to 10-fold speedup, due to memory bandwidth limitations. There is a GPU version of the SCCS implementation; see Suchard et al. 2013; it kicks ass.

daniellameeker commented 10 years ago

Thanks - if you don't think these are a good showcase, is there a test routine somewhere in this repo that can be sized up and down, which I could use to demo performance to our team on my laptop? I want to give our CDRN team a preview comparing the different flavors of R accelerators that have been developed for glm and cox. OHDSI, Oracle, the VA, and Revolution all have solutions that are slightly different.

Also, the examples I'm finding in the current unit tests look like traditional multivariate GLM regressions with indicator variables (e.g. the Dobson example), but nothing I would normally think of as inherently stratified (like a mixed or random-effects model with stratified intercepts). This is a weakness of most of the other R accelerators, so I would like to showcase Cyclops if it has these capabilities.

msuchard commented 10 years ago

Martijn is working on a simulation framework that closely mirrors the regressions that commonly occur in large-scale cohort and self-controlled designs, i.e. 1,000s of strata (via matching, etc.), 10,000s of subjects, 1,000,000s of covariates. The framework is still half-baked, so you'll (currently) have to wade through the Simulation.R file for pointers.

As an alternative, our industry friends may be able to provide us with an anonymized, permuted, etc., regression example that arises in a standard cohort study modeled after one of the larger claims databases.

Avoiding random effects at scale is the primary motivation for fitting a conditioned model. For example, a conditional Poisson regression arises from a Poisson regression in which all subjects within a stratum share a random intercept. By conditioning on the total number of outcome events, the (very large number of) random effects fall out of the likelihood, so one does not need to estimate them directly (a sketch of such a conditioned fit follows). This certainly generates large computational savings. However, I am still very unclear what the statistical efficiency cost is. This is a great research question and I would be very happy to co-supervise a project exploring the impact. There are arguments that conditioning is more efficient and there are arguments that it is less.
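A hedged sketch of what such a conditioned fit might look like, using the function names from the transcript above; the strata() term and the "cpr" (conditional Poisson) model type are assumptions drawn from the package documentation, and argument details may differ:

library(Cyclops)
library(survival)  # provides strata() for the formula interface

# Hypothetical toy data: 1000 matched strata, two rows per stratum
set.seed(42)
K <- 1000
d <- data.frame(stratumId = rep(seq_len(K), each = 2),
                exposed   = rep(c(0, 1), times = K))
d$y <- rpois(2 * K, lambda = exp(0.2 + 0.5 * d$exposed))

# Conditioning on per-stratum event totals removes all K intercepts
cyclopsData <- createCyclopsDataFrame(y ~ exposed + strata(stratumId),
                                      data = d, modelType = "cpr")
cyclopsFit <- fitCyclopsModel(cyclopsData)
coef(cyclopsFit)  # only the exposure coefficient is estimated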

For more complicated random effects, one can parameterize these directly in the regression design matrix X. If X is sparse, then there is (almost) no memory overhead in adding, for example, an additional covariate per subject (for subject-level random effects) and estimating these coefficients along with all the main effects, assuming one is using some sort of prior (regularization). This idea works for any regression fitter (glm(), coxph(), etc.). Of course, an implementation that employs a sparse or indicator representation should work best here, as sketched below.
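A minimal sketch of that parameterization in Cyclops terms; the indicatorFormula argument and the createPrior() helper are assumptions taken from the package's documentation, so treat this as illustrative rather than a verified recipe:

library(Cyclops)

# Hypothetical data: four binary observations for each of 500 subjects
set.seed(1)
d <- data.frame(subjectId = factor(rep(seq_len(500), each = 4)),
                x = rnorm(2000))
d$y <- rbinom(2000, 1, plogis(0.5 * d$x))

# Dense main effect; per-subject intercepts enter as sparse indicator columns
cyclopsData <- createCyclopsDataFrame(y ~ x,
                                      indicatorFormula = ~ subjectId,
                                      data = d, modelType = "lr")
# Regularize the subject-level effects; leave the main effect unpenalized
cyclopsFit <- fitCyclopsModel(cyclopsData,
                              prior = createPrior("laplace", variance = 1,
                                                  exclude = "x"))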