AMGold99 / ricardian

Ricardian land value paper (Gold, Binder, Nolte). For soil processing pipeline, see AMGold99/ssurgo-soil repo

parallelization #6

Open binders1 opened 2 years ago

binders1 commented 2 years ago

Nolte (2020) parallelizes the random forest (RF) analysis at the county level. I'm not sure whether this is separate from, or in addition to, the parallelization you envision using with the h2o package. Nolte's approach essentially estimates a separate model for each county, whereas we're estimating a separate model for each USDA farm resource region.

AMGold99 commented 2 years ago

I think that's slightly different from the h2o implementation I'm attempting for the hyperparameter grid search, but I think parallelizing the random forest model itself will definitely be necessary. I know the doParallel package lets you register a cluster with a given number of worker processes, which then run your models in parallel. I haven't really explored it yet, so I don't know how to split the parallel computation by county.
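
For reference, here is a minimal sketch of what a county-level split with doParallel and foreach might look like. It is not Nolte's code and not our pipeline: the data are simulated, the names `parcels`, `county`, `x`, and `y` are hypothetical, and `lm()` is only a placeholder for the actual random forest call. The idea is to split the data by county and let each foreach iteration fit one model on a worker process.

```r
library(foreach)
library(doParallel)

# Simulated stand-in data; `parcels`, `county`, `x`, and `y` are hypothetical names.
set.seed(1)
parcels <- data.frame(
  county = rep(c("A", "B", "C"), each = 50),
  x = rnorm(150)
)
parcels$y <- 2 * parcels$x + rnorm(150)

# Start two worker processes and register them as the foreach backend.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# One data frame per county; each foreach iteration receives one of them.
by_county <- split(parcels, parcels$county)

fits <- foreach(d = by_county) %dopar% {
  # Placeholder model: swap in the real random forest fit here.
  lm(y ~ x, data = d)
}
names(fits) <- names(by_county)

# Shut the workers down when finished.
parallel::stopCluster(cl)
```

Splitting on a region column instead of `county` would give one model per farm resource region with the same loop. If the model comes from a package that is not attached on the workers by default, foreach's `.packages` argument can load it on each worker.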