NicolasHug / Surprise

A Python scikit for building and analyzing recommender systems
http://surpriselib.com
BSD 3-Clause "New" or "Revised" License

GridSearchCV always recommends the first parameter combination as best #464


n-srinidhi commented 1 year ago

Hi! I am trying to use GridSearchCV to estimate the best combination of parameter values. I am using a plain SVD model with a single hyperparameter, n_factors, like this:

from surprise import SVD
from surprise.model_selection import GridSearchCV

param_grid = {'n_factors': [4, 6, 9, 11, 14, 18, 29]}
gs = GridSearchCV(SVD, param_grid, measures=['rmse'], cv=5)
gs.fit(_data)

# best RMSE score
print(gs.best_score['rmse'])

# combination of parameters that gave the best RMSE score
print(gs.best_params['rmse'])

No matter which hyperparameter values I use, it always returns the first combination as the best choice.
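A minimal, self-contained sketch of the same run, assuming Surprise's built-in ml-100k dataset in place of the actual data, which can be used to check whether GridSearchCV distinguishes between combinations at all:

from surprise import SVD, Dataset
from surprise.model_selection import GridSearchCV

# Built-in MovieLens 100k dataset stands in for the original data here
data = Dataset.load_builtin('ml-100k')

param_grid = {'n_factors': [4, 6, 9, 11, 14, 18, 29]}
gs = GridSearchCV(SVD, param_grid, measures=['rmse'], cv=5)
gs.fit(data)

print(gs.best_score['rmse'])   # best mean RMSE across the 5 folds
print(gs.best_params['rmse'])  # parameter combination that achieved it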

NicolasHug commented 1 year ago

What data are you using? Can you show the scores? It's possible that they're all NaNs or all equal.
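One way to show them (a sketch, assuming the gs object from the snippet above; Surprise's GridSearchCV exposes a cv_results dict that can be loaded into a pandas DataFrame):

import pandas as pd

# Per-combination cross-validation results, including mean/std test RMSE
results = pd.DataFrame(gs.cv_results)
print(results[['params', 'mean_test_rmse', 'std_test_rmse', 'rank_test_rmse']])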

n-srinidhi commented 1 year ago

This is a subset of the sample data used: [image attached]

These are the RMSE values I got when I ran cross-validation earlier:

from surprise import SVD
from surprise.model_selection import cross_validate

rmse_svd = []

for k in [4, 6, 9, 11, 14, 18, 29]:
    _svd = SVD(n_factors=k)
    # Using cross_validate to compute the error for each fold;
    # ["test_rmse"] is a numpy array with the RMSE on each test fold
    loss_svd = cross_validate(_svd, _data, measures=['rmse'], cv=5, verbose=False)["test_rmse"].mean()
    rmse_svd.append(loss_svd)

RMSE Values:

rmse_svd
[39902018016.785095,
 36790327930.47013,
 39599051199.904175,
 38282395437.54082,
 38365874488.68493,
 39962080407.07541,
 37076960431.81497]

Running the GridSearchCV code above, I get:

39937276086.8539
{'n_factors': 4}
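Since SVD is initialized randomly, the spread between these mean RMSEs may be within run-to-run noise. A sketch for checking that, assuming the same _data object: fix random_state so runs are comparable, then inspect every combination instead of only the best one.

from surprise import SVD
from surprise.model_selection import GridSearchCV
import pandas as pd

# Fixing random_state removes run-to-run variation from SVD's random initialization
param_grid = {'n_factors': [4, 6, 9, 11, 14, 18, 29], 'random_state': [0]}
gs = GridSearchCV(SVD, param_grid, measures=['rmse'], cv=5)
gs.fit(_data)

# Mean RMSE per combination, not just the winner
print(pd.DataFrame(gs.cv_results)[['params', 'mean_test_rmse', 'rank_test_rmse']])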