blind-contours / CVtreeMLE

:deciduous_tree: :dart: Cross Validated Decision Trees with Targeted Maximum Likelihood Estimation
MIT License
5 stars 1 forks source link

Simplify unit tests #27

Open wleoncio opened 1 year ago

wleoncio commented 1 year ago

Running devtools::test() required the installation of mlbench package. Also, I had to manually tweak the tests to use more CPUs to bring computing time to an acceptable state.

This is something that would only bother developers, possibly scaring away contributors, so not a big deal, but it would be nice for unit tests to be faster (perhaps by using smaller datasets, fewer iterations, more CPUs) and more complex (most of them just check output class or triggered errors).

blind-contours commented 1 year ago

Hi @wleoncio - I've added MLbench to the suggests area. I like having this particular test in the tests because it's a real example of finding some cellular characteristic combinations that lead to cancer. I'm not sure which version of CVtreeMLE you tested on but recently I added a fit_marginals parameter which is default to FALSE. This then only looks for interactions and doesn't fit trees on each individual exposure. This substantially reduces computational time in all the tests. For example, on that breast cancer dataset with 700 obs, 2 folds and 2 cpus, it takes 1.5 minutes. If I use more than 2 cores in the tests then I get a check flag failure, so I think that's the max I can use. Let me know if 1.5 minutes is still too long. I added a couple additional tests to make sure all the outputs connect correctly. I've removed a test on the marginal tree function which I didn't think was useful and took a long time and replaced it with a test on small sample size of simulated data and compared estimates to ground-truth. I think this is more informative. Let me know if you'd like anything else.

wleoncio commented 1 year ago

Tests are still taking an inordinate amount of time on my machine (8th gen i5 CPU), see screenshot below:

20min

How much time does the whole devtools::test() procedure take on your side? Since the breast cancer test takes half as long as I got, I'm guessing the whole thing takes around 10 minutes.

blind-contours commented 1 year ago

Hi @wleoncio - attached is my run:

Screen Shot 2023-01-30 at 9 44 04 AM

blind-contours commented 1 year ago

So it's about 12 minutes - is that reasonable? I can exchange some things if that feels better.

wleoncio commented 1 year ago

Thanks for the extra info, @blind-contours! It would be nice those those mixture tests closer to the minute mark, but in the end it's up to you how much time you want to spend optimizing this part of the code that doesn't directly affect users.