blind-contours / CVtreeMLE

:deciduous_tree: :dart: Cross Validated Decision Trees with Targeted Maximum Likelihood Estimation
MIT License

Runtime performance guidelines #13

lightning-auriga closed this issue 2 years ago

lightning-auriga commented 2 years ago

ML techniques are understandably pretty heavy to run, and even these test examples put some strain on the underpowered computer I used for package evaluation. I think it would be really helpful for the user to have a general understanding of the expected runtime and resource requirements for this tool with realistic dataset sizes. Does the user need this installed on an academic HPC cluster or a cloud instance (EC2, etc.) to get it to run? How large a dataset, or how complex a model space, can it handle? These are the kinds of practical questions that I think could really help a potential user.

blind-contours commented 2 years ago

I've added runtime performance guidelines to the vignette. I've also added default learners for each parameter that are nonparametric but still run fairly quickly. Installing the package and running all the tests will still take a while, but for a standard dataset this should run on a modern local machine within a couple of hours, depending on the dimensions of the data and the number of folds chosen. Processing time (proc time) is reported in each example, so users can see how long it takes to run on both simulated data and realistic synthetic data.

lightning-auriga commented 2 years ago

Thank you for including this, I think it helps a lot.