Thanks for looking at both packages. My personal testing matches your observations. I have a few thoughts:
1) `lightgbm` isn't a random forest library by default. Yes, it has random forest capabilities, but it suffers from one fatal flaw: it does not use resampling (bagging). This results in worse performance from `lightgbm` random forests than from `ranger` random forests.
2) `lightgbm` isn't necessarily faster. It is extremely fast, but because of the way it is set up, it parallelizes the column search for optimal splits instead of parallelizing the tree building. `ranger` parallelizes the tree building, which results in less CPU idle time.
3) `sklearn` random forests take up a TON of memory. This package originally used `sklearn` random forests, but I eventually had to switch because they were too slow and memory intensive.
Overall, miceRanger gives better performance, and I think it's faster. If you want to speed things up, try passing different parameters to `ranger` through the dots (`...`), such as the subsample fraction and `mtry`. You can also decrease the maximum tree depth (`max.depth` in `ranger`) to make it much faster.
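As a rough sketch of what passing parameters through the dots could look like: the example below uses the small `iris` data with artificially removed values rather than the proteomic data from the question, and every parameter value is an illustrative assumption, not a tuned recommendation. `num.trees`, `mtry`, `sample.fraction`, `max.depth` and `num.threads` are `ranger` arguments; how much each one helps will depend on the dataset.

```r
library(miceRanger)

# Demo data only: iris with 25% of values set to NA.
# amputeData() ships with miceRanger.
set.seed(1)
ampIris <- amputeData(iris, perc = 0.25)

# Arguments that miceRanger() does not use itself are forwarded to
# ranger::ranger() through `...`. All values below are assumptions
# chosen for illustration.
miceObj <- miceRanger(
  ampIris,
  m = 1,                  # number of imputed datasets
  maxiter = 3,            # iterations per dataset
  num.trees = 50,         # fewer trees per forest
  mtry = 2,               # variables tried at each split
  sample.fraction = 0.5,  # fraction of rows sampled for each tree
  max.depth = 8,          # cap on tree depth
  verbose = FALSE
)

# Extract the imputed dataset(s) as a list.
imputed <- completeData(miceObj)
```

Smaller `num.trees`, `sample.fraction` and `max.depth` all trade some imputation accuracy for speed, so it is worth checking the imputations on a subset of columns before applying the settings to the full 25000x6000 data.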
Hi, first of all, great packages, both miceRanger and miceForest, congratulations! I am imputing a large dataset (25000x6000) of proteomic data. miceRanger gives the best performance and is also much faster than many other algorithms, but I want to speed up the process, as this will be a pipeline with more datasets to come. Right now I am using miceRanger in R, but I was wondering whether miceForest is faster (for whatever reason). Have you ever compared the two?