gmcdonald-sfg closed this 2 years ago
No, I think that adds too much complexity to the lesson. The loop just runs over a few values, so I don't think parallelization is worth it. The xgb.cv function (on my machine) takes advantage of multiple cores already, so it appears that the part that benefits most from parallelization is already parallelized.
In the “Repeat Cross Validation in a Loop” section, I would suggest using lapply or purrr::map instead of for loops. Many in the R community these days are moving away from for loops toward list-based iteration functions like lapply or purrr::map. Here’s a good source for reasons why: https://www.earthdatascience.org/courses/earth-analytics/automate-science-workflows/use-apply-functions-for-efficient-code-r/ . I think this is particularly important for cross-validation in ML because lapply and purrr::map have drop-in parallel counterparts (e.g. parallel::mclapply and furrr::future_map), while a for loop has to be restructured before it can run in parallel. This is exactly what the furrr package is great for. CV is a good example of an “embarrassingly parallel” problem that should be run in parallel when possible. You could simply rewrite your code as follows:
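The snippet that followed is not preserved in this thread, so what comes next is only a minimal sketch of what such a rewrite could look like, not the lesson's actual code. `run_cv` is a hypothetical stand-in for the lesson's `xgb.cv` call (it runs a small k-fold CV of `lm` on `mtcars` so the example needs base R only), and the `seeds` vector is an assumed set of repeats.

```r
# Hypothetical stand-in for the lesson's xgb.cv call:
# k-fold cross-validation of a linear model on mtcars, returning the mean RMSE.
run_cv <- function(seed, k = 5) {
  set.seed(seed)
  # Randomly assign each row to one of k folds
  folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))
  rmse <- sapply(seq_len(k), function(f) {
    fit  <- lm(mpg ~ wt + hp, data = mtcars[folds != f, ])
    pred <- predict(fit, newdata = mtcars[folds == f, ])
    sqrt(mean((mtcars$mpg[folds == f] - pred)^2))
  })
  data.frame(seed = seed, rmse = mean(rmse))
}

seeds <- c(1, 2, 3)  # assumed: one CV repeat per seed

# for-loop version: preallocate a list and fill it by index
results_loop <- vector("list", length(seeds))
for (i in seq_along(seeds)) {
  results_loop[[i]] <- run_cv(seeds[i])
}

# lapply version: same result, no index bookkeeping
results_apply <- lapply(seeds, run_cv)

# Combine the per-repeat results into one data frame
cv_summary <- do.call(rbind, results_apply)
print(cv_summary)
```

If purrr is installed, `purrr::map(seeds, run_cv)` is a drop-in replacement for the `lapply` call. With furrr, calling `future::plan(future::multisession)` and then `furrr::future_map(seeds, run_cv, .options = furrr::furrr_options(seed = TRUE))` should run the repeats in parallel.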
Or, if you want progress bars:
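The progress-bar snippet is also missing from this thread. As a rough sketch under the same assumptions (`run_cv` is a hypothetical stand-in for the `xgb.cv` call, `seeds` is an assumed set of repeats), base R's `txtProgressBar` can be combined with `lapply` like this:

```r
# Hypothetical stand-in for the lesson's xgb.cv call (k-fold CV of lm on mtcars)
run_cv <- function(seed, k = 5) {
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))
  rmse <- sapply(seq_len(k), function(f) {
    fit  <- lm(mpg ~ wt + hp, data = mtcars[folds != f, ])
    pred <- predict(fit, newdata = mtcars[folds == f, ])
    sqrt(mean((mtcars$mpg[folds == f] - pred)^2))
  })
  data.frame(seed = seed, rmse = mean(rmse))
}

seeds <- c(1, 2, 3)  # assumed: one CV repeat per seed

# Text progress bar that advances once per completed repeat
pb <- txtProgressBar(min = 0, max = length(seeds), style = 3)
results <- lapply(seq_along(seeds), function(i) {
  out <- run_cv(seeds[i])
  setTxtProgressBar(pb, i)
  out
})
close(pb)

cv_summary <- do.call(rbind, results)
```

With purrr 1.0.0 or later, `purrr::map(seeds, run_cv, .progress = TRUE)` gives a progress bar without the manual bookkeeping, and `furrr::future_map` accepts a `.progress = TRUE` argument as well.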