steventartakovsky-vungle opened 3 years ago
Nicolas, thanks for the Surprise lib.
I have a question that is likely related to the above. I have 21-51 nodes with 24 cores each.
I'm trying to use the Dask lib to run in parallel across multiple nodes. Here is what I want to do:
Note: I am submitting this as a Slurm job.
```python
import dask.distributed
import joblib
from dask_jobqueue import SLURMCluster
from surprise.model_selection import GridSearchCV

N_CORES = 24


def grid_search_instance(instance, params, dataset, measures, folds, label, n_jobs=N_CORES):
    """
    Grid search cross-validation to find the best params for the recommender algorithm
    :param label: Recommender string name
    :param instance: Recommender algorithm class
    :param params: Recommender algorithm param grid
    :param dataset: A dataset loaded via the Surprise Reader class
    :param measures: A list of measure names, e.g. ['rmse', 'mae']
    :param folds: Number of folds for cross-validation
    :param n_jobs: Number of CPUs to be used
    :return: A fitted GridSearchCV instance
    """
    # Spin up a Dask cluster whose workers are submitted as Slurm jobs
    cluster = SLURMCluster(cores=24,
                           processes=2,
                           memory='64GB',
                           queue="nvidia_dev",
                           project="NMF",
                           name=label,
                           log_directory='logs/slurm',
                           walltime='00:15:00')
    # cluster.scale(2)
    cluster.adapt(minimum=1, maximum=360)  # scale between 1 and 360 worker jobs
    client = dask.distributed.Client(cluster)
    print(client)
    print(cluster.job_script())
    gs = GridSearchCV(instance, params, measures=measures, cv=folds,
                      n_jobs=n_jobs, joblib_verbose=100)
    # Route joblib's parallelism (used internally by GridSearchCV) to the Dask cluster
    with joblib.parallel_backend("dask"):
        print(client)
        gs.fit(dataset)
    return gs
```
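A minimal sketch of how this function might be invoked; the parameter grid and the built-in ml-100k dataset below are illustrative placeholders, not the actual configuration:

```python
# Illustrative call to grid_search_instance; the param grid and dataset
# are placeholders, not the poster's real setup.
from surprise import NMF, Dataset

param_grid = {'n_factors': [15, 30], 'n_epochs': [25, 50]}
data = Dataset.load_builtin('ml-100k')  # any Surprise-loaded dataset works here
gs = grid_search_instance(NMF, param_grid, data,
                          measures=['rmse', 'mae'], folds=5, label='NMF')
print(gs.best_score['rmse'], gs.best_params['rmse'])
```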
And again, thanks for Surprise and for spending your time reading this question.
Hi all, sorry for the late reply. Surprise doesn't support multi-node training.
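What Surprise does support is single-machine parallelism: GridSearchCV can fan candidates out over local CPUs through joblib's n_jobs argument. A minimal single-node sketch, using the built-in ml-100k dataset purely as an illustration:

```python
# Single-node parallel grid search, which Surprise supports via joblib.
from surprise import SVD, Dataset
from surprise.model_selection import GridSearchCV

data = Dataset.load_builtin('ml-100k')  # illustrative dataset
param_grid = {'n_epochs': [10, 20], 'lr_all': [0.002, 0.005]}
gs = GridSearchCV(SVD, param_grid, measures=['rmse'], cv=3, n_jobs=-1)  # all local cores
gs.fit(data)
print(gs.best_score['rmse'], gs.best_params['rmse'])
```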
Where is the documentation on the dataset size limitation and how to scale Surprise to multiple machines?
Thanks - Steven