Branching is implemented by separating `learn_rate` from the other hyperparameters in `fit_base_brulee` and `fit_base_xgb`. Each model object should be smaller as a result, which will ease GPU memory pressure.
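A minimal sketch of what the `learn_rate` branching could look like in `_targets.R`, assuming `fit_base_brulee()` and `fit_base_xgb()` accept a single `learn_rate` value and an upstream data target (the names `df_train` and `read_training_data()` are placeholders, not the actual pipeline objects):

```r
# _targets.R (sketch): branch over learn_rate only; the remaining
# hyperparameters stay fixed inside fit_base_brulee()/fit_base_xgb().
library(targets)

list(
  # hypothetical upstream data target
  tar_target(df_train, read_training_data()),
  # learning rates to branch over
  tar_target(num_learn_rate, c(0.001, 0.01, 0.1)),
  # one branch (and thus one smaller model object) per learning rate
  tar_target(
    fit_brulee,
    fit_base_brulee(data = df_train, learn_rate = num_learn_rate),
    pattern = map(num_learn_rate)
  ),
  tar_target(
    fit_xgb,
    fit_base_xgb(data = df_train, learn_rate = num_learn_rate),
    pattern = map(num_learn_rate)
  )
)
```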
Since we have multiple GPUs per HPC node, we need a proper strategy for training the base models across multiple GPUs. This is neither automatic nor cost-free, and we may need to work in both R and Python to get there.
XGBoost and MLP models can leverage GPUs, so my quick idea for using all GPUs in a `targets` way (albeit less elegant than a torch-native approach) is to branch over CUDA device names and fit each branch on its assigned device, for example `tar_target(char_cuda_device, c("cuda:0", "cuda:1", "cuda:2", "cuda:3"))` when a node has four CUDA devices. The `fit_base_*` functions would then be parametrized by device name so the branches are distributed across devices; see the sketch below.
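A minimal sketch of the device-branching idea, assuming `fit_base_xgb()` exposes a `device` argument that is forwarded to the backend (the argument name and the `df_train`/`read_training_data()` data target are placeholders):

```r
# _targets.R (sketch): pair each branch with a CUDA device name so that
# concurrent workers land on different GPUs.
library(targets)

list(
  tar_target(df_train, read_training_data()),  # hypothetical upstream data target
  tar_target(char_cuda_device, c("cuda:0", "cuda:1", "cuda:2", "cuda:3")),
  tar_target(num_learn_rate, c(0.001, 0.01, 0.1, 0.3)),
  # map() zips the two vectors: branch i uses learning rate i on device i
  tar_target(
    fit_xgb,
    fit_base_xgb(
      data       = df_train,
      learn_rate = num_learn_rate,
      device     = char_cuda_device  # assumed argument forwarded to xgboost's `device`
    ),
    pattern = map(num_learn_rate, char_cuda_device)
  )
)
```

Note that fitting the branches concurrently on different GPUs still requires parallel workers (e.g., `crew` or `clustermq`); a plain sequential `tar_make()` would run the branches one at a time.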