ranking demo example - GitHub issues

jrabary commented 10 years ago

Hi all,

Is it possible to have a ranking demo example? It's not clear how to create the training set in this case. For example, if I have a set of documents and a query document, how should I construct the training set? Also, how should the svm-map input format [label] qid:[qid] [feature_id]:[feature_value] [feature_id]:[feature_value] ... be translated into the mlr input format?

Best regards

bmcfee commented 10 years ago

This is covered in the README (https://github.com/bmcfee/mlr/blob/master/README) under TRAINING.

You need to stack all documents (queries and responses) as columns of the matrix X, and supply relevant (and optionally irrelevant) indices in the cell array Y.

For example, if the query document is the ith column, then Y{i} would be an array of column ids for relevant results. You can leave Y{j} empty if j is a response document.

By default, all other documents will be considered irrelevant. If you want to specify a subset as irrelevant, you can make Y an n-by-2 cell array, where Y{i, 1} is the set of relevant columns for document i, and Y{i, 2} is the set of irrelevant documents. This is useful when you only have labels for a small subset, and don't want to assume that all unlabeled documents are irrelevant.
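To make the layout concrete, here is a rough sketch of both forms of Y for a toy collection. mlr itself is MATLAB, so this Python only illustrates the data layout and is not something mlr can consume directly. The feature values, the relevance judgments, and the helper name to_one_based are all made up for illustration. Python indexes are 0-based, while MATLAB column ids start at 1, hence the conversion at the end.

```python
# Hypothetical toy data: 2 query documents followed by 3 response documents,
# each a 3-dimensional dense feature vector.
documents = [
    [1.0, 0.0, 0.0],  # column 0: query
    [0.0, 1.0, 0.0],  # column 1: query
    [0.9, 0.1, 0.0],  # column 2: response
    [0.1, 0.8, 0.1],  # column 3: response
    [0.0, 0.2, 0.9],  # column 4: response
]

# X is d-by-n: one row per feature, one COLUMN per document.
X = [list(row) for row in zip(*documents)]
d, n = len(X), len(X[0])

# Assumed relevance judgments, keyed by query column (0-based).
relevant = {0: [2], 1: [3]}
irrelevant = {0: [4], 1: [4]}

# Single-column form: Y[i] lists the relevant columns for document i.
# Response documents get an empty list.
Y_single = [relevant.get(i, []) for i in range(n)]

# Two-column form: (relevant, irrelevant) per document, for the case where
# only a subset of the collection is labeled.
Y_pairs = [(relevant.get(i, []), irrelevant.get(i, [])) for i in range(n)]


def to_one_based(ids):
    """Shift 0-based Python indexes to 1-based MATLAB column ids."""
    return [k + 1 for k in ids]


Y_single_matlab = [to_one_based(ids) for ids in Y_single]
```

In the two-column form, column 3 is neither relevant nor irrelevant for query 0, so it is simply left unlabeled for that query.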

As for svm-map format, since mlr learns a d-by-d (dense) matrix, it does not support sparse document vectors. You will need to convert them to dense arrays before running mlr. If you need sparse data, maybe check out @khdlim's newer method from ICML 2014.
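A minimal sketch of the dense conversion, assuming the usual line layout quoted in the question ([label] qid:[qid] [feature_id]:[feature_value] ...) with 1-based feature ids and optional trailing # comments. The function names and the sample lines are hypothetical. The sketch only densifies the feature vectors into columns; how labels and qids map onto Y depends on your data and is left out.

```python
def parse_line(line):
    """Split one svm-map style line into (label, qid, {feature_id: value})."""
    tokens = line.split("#", 1)[0].split()  # drop any trailing comment
    label = float(tokens[0])
    qid = tokens[1].split(":", 1)[1]
    features = {}
    for token in tokens[2:]:
        feature_id, value = token.split(":", 1)
        features[int(feature_id)] = float(value)
    return label, qid, features


def to_dense_columns(lines):
    """Return (X, labels, qids), where X is d-by-n with one column per line."""
    parsed = [parse_line(l) for l in lines if l.split("#", 1)[0].strip()]
    # d is the largest feature id seen; feature ids are assumed to start at 1.
    d = max((max(f) for _, _, f in parsed if f), default=0)
    X = [[0.0] * len(parsed) for _ in range(d)]
    for col, (_, _, features) in enumerate(parsed):
        for feature_id, value in features.items():
            X[feature_id - 1][col] = value  # missing features stay 0.0
    labels = [p[0] for p in parsed]
    qids = [p[1] for p in parsed]
    return X, labels, qids


# Made-up sample input.
sample = [
    "1 qid:1 1:0.5 3:2.0",
    "0 qid:1 2:1.0  # a comment",
]
X, labels, qids = to_dense_columns(sample)
```

The resulting X could then be written out (for example as CSV) and loaded into MATLAB as the d-by-n input matrix.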

jrabary commented 10 years ago

Thank you for the clarification.

bmcfee / mlr

ranking demo example #4