Closed sigmafelix closed 8 months ago
@sigmafelix Coincidentally, I came across this LinkedIn post yesterday about tuning XGBoost models.
It's great that you are starting some base learner models. However, beginning next week, I think we should prioritize (1) renaming/refactoring function names, as we have done for the download.R functions, and (2) setting up the targets package pipeline. targets should handle your seed number and config file concerns.
@Spatiotemporal-Exposures-and-Toxicology The post will be very helpful for streamlining the base learner fitting process. Thank you for sharing it. For renaming/refactoring, I think we already follow an implicit naming convention where the functions in an R file (i.e., in ./R) have names starting with the R file name. Perhaps we need to make it explicit to everyone at next week's meeting.
My comment in #191 includes an example of _targets.R where a seed number is set for the entire pipeline. We should check whether the seed setting also carries through to the multithreaded calculations (i.e., covariate calculation).
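For reference, a pipeline-wide seed in targets can be sketched roughly as below. This is not the example from #191; the seed value, the target names (train_data, fit_rf), and the use of iris as stand-in data are all placeholders. It assumes a version of targets whose tar_option_set() accepts a seed argument, from which target-specific seeds are derived.

```r
# _targets.R (illustrative sketch, not the file from #191)
library(targets)

# Assumption: a single pipeline-level seed; targets derives a
# deterministic seed for each target from this value and the target name.
tar_option_set(seed = 2024L)

list(
  # Placeholder training data; the real pipeline would load covariates here.
  tar_target(train_data, iris),
  # ranger draws its internal seed from R's RNG when seed is not given,
  # so the target-specific seed set by targets applies to this fit.
  tar_target(
    fit_rf,
    ranger::ranger(Sepal.Length ~ ., data = train_data, num.trees = 50)
  )
)
```

Whether the same seed reaches worker processes in parallel or multithreaded steps is a separate question and would still need to be verified.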
I am working on base learners, and fitting random forests (with ranger) and XGBoost (with xgboost) is nearly complete. Since these algorithms rely on randomization and we will reuse a fitted model object to predict at 8+M points, I think we need to set a seed number in a config file or a pipeline setting.
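As a minimal sketch of the seed concern, assuming the seed is read from some shared setting (seed_value below is a hypothetical stand-in for a config entry) and using iris in place of the real training data, a fixed seed passed to ranger should make repeated fits match:

```r
library(ranger)

# Hypothetical value; in practice this would come from a config file
# or a pipeline-level setting rather than being hard-coded.
seed_value <- 2024L

# Fit the same random forest twice with the same seed.
fit_a <- ranger(Sepal.Length ~ ., data = iris, num.trees = 50, seed = seed_value)
fit_b <- ranger(Sepal.Length ~ ., data = iris, num.trees = 50, seed = seed_value)

# Compare out-of-bag predictions from the two fits.
same_fit <- identical(fit_a$predictions, fit_b$predictions)
print(same_fit)
```

Passing the seed explicitly to the fitting call keeps the fit reproducible even if the surrounding R session has consumed random numbers in a different order.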