eth-easl / modyn

Modyn is a research platform for training ML models on growing datasets.
MIT License

CTR Preprocessing Script Config - How many cores/executors to use? #156

Open ambarish-prakash opened 1 year ago

ambarish-prakash commented 1 year ago

In the CTR preprocessing, the GPU we used (a V100) differs from the one the provided NVIDIA config assumes, so the config had to be updated.

The file DeepLearningExamples/PyTorch/Recommendation/DLRM/preproc/DGX-A100_config.sh has been updated in a patch to support running on a V100 GPU.

However, the config is not optimized: it uses only 1 CPU core and 1 Spark executor, which works but is far from well configured.

We need to update the config and add a better explanation of how to choose these values for the chosen VM.
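As a starting point, instead of hardcoding 1 core, the config could derive parallelism from the VM's actual core count. This is only a sketch; the variable names are placeholders and not necessarily those used in DGX-A100_config.sh:

```shell
#!/usr/bin/env bash
# Hypothetical sketch -- variable names are assumptions, not the actual
# names in DGX-A100_config.sh. Idea: size Spark to the VM instead of
# hardcoding a single core.

# Number of cores available on this VM (Linux).
TOTAL_CORES=$(nproc)

# Leave two cores for the OS and the Spark driver on larger VMs.
SPARK_CORES=$(( TOTAL_CORES > 4 ? TOTAL_CORES - 2 : TOTAL_CORES ))

export TOTAL_CORES SPARK_CORES
echo "Using ${SPARK_CORES} of ${TOTAL_CORES} cores for Spark"
```

Whether the preprocessing script actually tolerates more than one core is exactly the open question below, so any such change needs to be validated on a real run.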

MaxiBoether commented 1 year ago

Note to self: Ambarish said that he encountered issues when using more than 1 core/executor. I need to investigate this to ensure the script itself is not broken.

MaxiBoether commented 1 year ago

Note to self: Ambarish's patch sets spark.task.cpus to 1. However, I am not quite sure of the differences between --parallel_jobs, spark.cores.max, and spark.task.cpus, need to figure that out
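For reference, my current understanding (worth verifying against the Spark configuration docs): `spark.cores.max` caps the total cores the application may claim across all executors, while `spark.task.cpus` is the number of cores reserved per task, so the number of concurrently running tasks is roughly `spark.cores.max / spark.task.cpus`. `--parallel_jobs` is presumably a flag of the preprocessing script itself rather than a Spark property. A tiny sketch of the arithmetic:

```shell
#!/usr/bin/env bash
# Sketch of how the two Spark knobs interact (my understanding, to verify):
#   spark.cores.max  -> total cores the app may use across all executors
#   spark.task.cpus  -> cores reserved per task
#   concurrent tasks ~= cores.max / task.cpus
CORES_MAX=16   # example value, not from the actual config
TASK_CPUS=1    # matches Ambarish's patch
echo "concurrent task slots: $(( CORES_MAX / TASK_CPUS ))"
# -> concurrent task slots: 16
```

Under that reading, setting `spark.task.cpus=1` does not by itself limit the job to one core; the single-core behavior would have to come from `spark.cores.max` (or the script's own `--parallel_jobs`) being set to 1.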