NicolasHug / Surprise

A Python scikit for building and analyzing recommender systems
http://surpriselib.com
BSD 3-Clause "New" or "Revised" License

How to perform a GridSearch on BaselineOnly model? #77

Closed hengji-liu closed 6 years ago

hengji-liu commented 7 years ago

The BaselineOnly class takes a bsl_options dictionary as a parameter instead of separate reg or learning_rate arguments. How can I perform a grid search on the BaselineOnly model?

NicolasHug commented 7 years ago

Hi,

Grid search with BaselineOnly can be done in the exact same way as other algorithms, e.g.:

param_grid = {'bsl_options':[{'method': 'als'}, {'method': 'sgd'}]}

Nicolas

hengji-liu commented 7 years ago

Hi, sorry for the late reply. Suppose I'm using SGD and I want to cross-validate over reg, learning_rate and n_epochs. It looks like I have to enumerate every combination of these 3 parameters as separate bsl_options dictionaries and put them all into param_grid. To illustrate:

param_grid = {'bsl_options': [{'reg': 0.1, 'learning_rate': 0.1, 'n_epochs': 100},
                              {'reg': 0.1, 'learning_rate': 0.1, 'n_epochs': 200},
                              {'reg': 0.1, 'learning_rate': 0.2, 'n_epochs': 100},
                              {'reg': 0.1, 'learning_rate': 0.2, 'n_epochs': 200},
                              # the list goes on, enumerating the params manually
                              ]}

I feel the ideal way is

param_grid = {'reg': [0.1, 0.2], 'learning_rate': [0.1, 0.2], 'n_epochs': [100, 200]}

But apparently this won't work with the current design. It gets more confusing when the predictor takes other parameters as well: some live inside the options dictionary, while others are plain parameters of the predictor class. The same applies to the KNN methods, where sim_options is used. I just wanted to check with you whether I'm using the library incorrectly, because this approach seems troublesome and unintuitive to me. Anyway, my current workaround is to generate the bsl_options first using extra code.
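For readers following along, here is a minimal sketch of what such "extra code" could look like for the SGD case, using only the standard library. The names sgd_grid and bsl_options_list are made up for illustration, and adding 'method': 'sgd' to each dictionary is an assumption based on the 'method' key shown earlier in the thread.

```python
from itertools import product

# Candidate values for each SGD baseline option (taken from the post above).
sgd_grid = {
    'reg': [0.1, 0.2],
    'learning_rate': [0.1, 0.2],
    'n_epochs': [100, 200],
}

# Build one bsl_options dict per combination of values.
# Assumption: 'method' is fixed to 'sgd' so the SGD-specific keys apply.
bsl_options_list = [
    {'method': 'sgd', **dict(zip(sgd_grid, values))}
    for values in product(*sgd_grid.values())
]

# The enumerated dictionaries become the candidate values for 'bsl_options'.
param_grid = {'bsl_options': bsl_options_list}

print(len(bsl_options_list))  # 2 * 2 * 2 = 8 combinations
```

The resulting param_grid has the same shape as the hand-written one above, just without the manual enumeration.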

NicolasHug commented 7 years ago

Oh indeed, I didn't think it through.

So basically with dictionary parameters with multiple keys, we currently have to enumerate all the combinations by hand. Looking at the current implementation of GridSearch, I can't think of an easy or clean way to overcome this. Would you have any suggestion? Also, could you please show me your current workaround?

Sorry for closing the issue and thanks for pointing that out! Nicolas

hengji-liu commented 7 years ago

My current workaround (taking KNN as an example) is:

    names = ('msd', 'cosine', 'pearson')
    user_baseds = (True,)
    min_supports = (1, 2, 3, 4, 5, 10, 15, 20, 25)
    options = list()
    # fill options with dictionaries
    for name in names:
        for user_based in user_baseds:
            for min_support in min_supports:
                d = dict()
                d['name'] = name
                d['user_based'] = user_based
                d['min_support'] = min_support
                options.append(d)
    # make options a value of 'sim_options'
    param_grid = {
        'k': [4, 6, 8, 10, 12],
        'min_k': [1, 2, 3],
        'sim_options': options
    }

NicolasHug commented 6 years ago

Hey, I just pushed a fix for this. You can now use GridSearch in a more natural way as follows:

param_grid = {'k': [10, 20],
              'sim_options': {'name': ['msd', 'cosine'],
                              'min_support': [1, 5],
                              'user_based': [False]}
              }

I added this to the (latest) docs as a note.

Thanks for raising the issue! Nicolas
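To make the semantics of the nested grid concrete, here is a rough sketch of how such a param_grid could be expanded into individual parameter sets. The helper expand_grid is hypothetical and written only for illustration; it is not claimed to be the library's actual implementation.

```python
from itertools import product


def expand_grid(param_grid):
    """Hypothetical helper: yield one concrete parameter dict per combination."""
    keys, value_lists = [], []
    for key, value in param_grid.items():
        if isinstance(value, dict):
            # Nested grid (e.g. sim_options): expand it into a list of
            # concrete option dicts, one per combination of its own values.
            value = [dict(zip(value, combo)) for combo in product(*value.values())]
        keys.append(key)
        value_lists.append(value)
    # Cartesian product over the top-level parameters.
    for combo in product(*value_lists):
        yield dict(zip(keys, combo))


param_grid = {'k': [10, 20],
              'sim_options': {'name': ['msd', 'cosine'],
                              'min_support': [1, 5],
                              'user_based': [False]}
              }

combos = list(expand_grid(param_grid))
print(len(combos))  # 2 values of k * (2 * 2 * 1) sim_options = 8
print(combos[0])
```

Under this reading, each value list inside sim_options is crossed with the others and with the top-level parameters, so the grid above describes 8 candidate configurations.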

yustiks commented 3 years ago

Thank you, @NicolasHug