binders1 opened 2 years ago
I think that's slightly different from the h2o implementation I'm attempting for the hyperparameter grid search, but I think parallelizing the random forest model itself will definitely be necessary. I know the doParallel package lets you register a "cluster" with a given number of workers, across which your model then runs in parallel. I haven't really explored it yet, so I don't know how to split the parallel computation by county.
Nolte (2020) parallelizes the RF analysis at the county level. I'm not sure whether this is separate from (or in addition to) the parallelization you envision with the h2o package. Nolte's approach essentially estimates a separate model for each county, whereas we're estimating a separate model for each USDA farm resource region.
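To make the "one model per group" idea concrete, here is a minimal Python sketch (not our R code, and not Nolte's implementation) of group-wise parallel estimation: split the data by region, then hand each region to a worker pool as an independent fitting task. Assumptions, all hypothetical: records are `(region, x, y)` tuples, `fit_group_model` is a dependency-free least-squares stand-in for the actual random forest fit, and the toy data values are made up (only the region names "Heartland" and "Basin and Range" are real farm resource regions). In R, the analogous pattern would be registering a backend with `doParallel::registerDoParallel()` and looping over regions with `foreach(...) %dopar%`.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def fit_group_model(item):
    """Hypothetical stand-in for a per-region random forest fit.

    Fits y = slope * x + intercept by ordinary least squares so the sketch
    stays dependency-free; a real pipeline would train an RF here instead.
    """
    region, rows = item
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx if sxx else 0.0
    return region, (slope, mean_y - slope * mean_x)


def split_by_group(records):
    """Group (region, x, y) records into {region: [(x, y), ...]}."""
    groups = defaultdict(list)
    for region, x, y in records:
        groups[region].append((x, y))
    return dict(groups)


def fit_per_group_parallel(records, n_workers=4, executor_cls=ThreadPoolExecutor):
    """Fit one independent model per group, dispatching groups to a worker pool.

    Threads are the default so the sketch runs anywhere; for CPU-bound
    pure-Python fits, pass concurrent.futures.ProcessPoolExecutor instead
    (and call this from under an `if __name__ == "__main__":` guard).
    """
    groups = split_by_group(records)
    with executor_cls(max_workers=n_workers) as pool:
        # Each region is a separate task; results come back as
        # (region, params) pairs in the same (sorted) order as submitted.
        return dict(pool.map(fit_group_model, sorted(groups.items())))


if __name__ == "__main__":
    # Made-up toy data: (region, predictor, outcome).
    toy = [
        ("Heartland", 1.0, 2.0), ("Heartland", 2.0, 4.0), ("Heartland", 3.0, 6.0),
        ("Basin and Range", 1.0, 1.0), ("Basin and Range", 2.0, 3.0),
    ]
    print(fit_per_group_parallel(toy, n_workers=2))
    # {'Basin and Range': (2.0, -1.0), 'Heartland': (2.0, 0.0)}
```

Design note: this parallelizes across regions. A hyperparameter grid search run inside each region would be a second, nested layer of parallelism, so running both at full width at once can oversubscribe the available cores.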