Closed jrabary closed 10 years ago
This is covered in the README (https://github.com/bmcfee/mlr/blob/master/README) under TRAINING.
You need to stack all documents (queries and responses) as columns of the matrix `X`, and supply relevant (and optionally irrelevant) indices in the cell array `Y`.
For example, if the query document is the `i`th column, then `Y{i}` would be an array of column ids for relevant results. You can leave `Y{j}` empty if `j` is a response document.
By default, all other documents will be considered irrelevant. If you want to specify a subset as irrelevant, you can make `Y` an `n`-by-2 cell array, where `Y{i, 1}` is the set of relevant columns for document `i`, and `Y{i, 2}` is the set of irrelevant documents. This is useful when you only have labels for a small subset, and don't want to assume that all unlabeled documents are irrelevant.
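To make the layout concrete, here is a minimal sketch in plain Python rather than MATLAB. The helper `build_training_set` and the toy data are hypothetical and not part of mlr; they only mirror the structure described above. Indices here are 0-based, so add 1 to every index when building the actual MATLAB cell array.

```python
def build_training_set(docs, relevant, irrelevant=None):
    """Hypothetical helper (not part of mlr).

    docs:       list of n feature vectors, each of length d
    relevant:   dict mapping a query's index to its relevant document indices
    irrelevant: optional dict mapping a query's index to irrelevant indices
    """
    n = len(docs)
    d = len(docs[0])
    if any(len(v) != d for v in docs):
        raise ValueError("all documents must have the same dimension")

    # X is d-by-n: column j holds document j (queries and responses alike).
    X = [[docs[j][k] for j in range(n)] for k in range(d)]

    if irrelevant is None:
        # One list per document; response documents get an empty list.
        Y = [sorted(relevant.get(i, [])) for i in range(n)]
    else:
        # n-by-2 layout: [relevant, irrelevant] per document.
        Y = [[sorted(relevant.get(i, [])), sorted(irrelevant.get(i, []))]
             for i in range(n)]
    return X, Y


# Toy data: documents 0 and 1 are queries, 2 to 4 are responses.
docs = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.9, 0.1, 0.4],
    [0.8, 0.0, 0.6],
    [0.1, 0.9, 0.5],
]

# Default case: everything not listed as relevant is treated as irrelevant.
X, Y = build_training_set(docs, {0: [2, 3], 1: [4]})

# Explicit irrelevant sets: unlabeled documents are left out of both lists.
X2, Y2 = build_training_set(docs, {0: [2, 3], 1: [4]}, {0: [4]})
```

In the second call, document 4 is marked irrelevant for query 0, while query 1 has no irrelevant labels, so its second list stays empty.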
As for svm-map format, since mlr learns a `d`-by-`d` (dense) matrix, it does not support sparse document vectors. You will need to convert them to dense arrays before running mlr. If you need sparse data, maybe check out @khdlim's newer method in ICML 2014.
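As a rough illustration of the sparse-to-dense step, the sketch below parses svm-map style lines into dense vectors in plain Python. The function names are hypothetical, and it assumes feature ids are 1-based and that anything after a `#` is a comment. It only densifies the vectors and keeps the labels and qids alongside them; how each qid group maps onto query columns of `X` depends on your data and is not handled here.

```python
def parse_svm_map_line(line):
    """Parse '[label] qid:[qid] [feature_id]:[feature_value] ...' (hypothetical helper)."""
    tokens = line.split("#", 1)[0].split()
    label = float(tokens[0])
    qid = None
    feats = {}
    for tok in tokens[1:]:
        key, value = tok.split(":", 1)
        if key == "qid":
            qid = value
        else:
            feats[int(key)] = float(value)
    return label, qid, feats


def to_dense(lines, dim=None):
    """Return (labels, qids, vectors), with each vector a dense list of length dim."""
    # Skip blank lines and lines that contain only a comment.
    rows = [parse_svm_map_line(l) for l in lines if l.split("#", 1)[0].strip()]
    if dim is None:
        # Infer the dimension from the largest feature id seen.
        dim = max((max(f) for _, _, f in rows if f), default=0)

    labels, qids, vectors = [], [], []
    for label, qid, feats in rows:
        vec = [0.0] * dim
        for fid, val in feats.items():
            if not 1 <= fid <= dim:
                raise ValueError(f"feature id {fid} outside 1..{dim}")
            vec[fid - 1] = val  # assumed 1-based ids -> 0-based positions
        labels.append(label)
        qids.append(qid)
        vectors.append(vec)
    return labels, qids, vectors


lines = [
    "1 qid:1 1:0.5 3:2.0",
    "0 qid:1 2:1.0 # trailing comment",
    "1 qid:2 4:0.25",
]
labels, qids, vectors = to_dense(lines)
```

Each entry of `vectors` would then become one column of `X`.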
Thank you for the clarification.
Hi all,
Is it possible to have a ranking demo example? It's not clear how to create the training set in this case. For example, if I have a set of documents and a query document, how should I construct the training set? Also, how should the svm-map input format
[label] qid:[qid] [feature_id]:[feature_value] [feature_id]:[feature_value] ...
be translated into the mlr input format?
Best regards