bigdong89 / xgboostExtension

xgboost Extension for Easy Ranking & TreeFeature
Apache License 2.0
123 stars 35 forks source link

example on ranker order relevance #11

Closed kirk86 closed 6 years ago

kirk86 commented 6 years ago

Hi and thanks for this nice extension. I was wondering if in your examples folder you could also provide an example on ranking with order relevance. What I mean by that is the following. I have a dataset of the following format:

group_id, data_points[feature_vector1, feature_vector2, feature_vector3, .., N], position[position_of_feature_vector1: 0, position_of_feature_vector2: 1, positon_of_featrure_vector3: 4, .., N]

Eventually I want to predict all the feature_vectors belonging into a group_id while maintaining the order of them as it appears in the position matrix.

In the test set I would like for each group_id to predict N feature vectors in the order that they appear in the training data. Does that make sense?

Thanks!

FlorisHoogenboom commented 6 years ago

I'm not completely sure whether I understand your question completely. The XGBRanker class provides a simple way to apply any LambdaMART like model to your data. This models takes into account the fact that certain feature vectors belong to the same group. To let the model know about this, you just have to make sure that the first column of your feature matrix contains a group identifier.

For predicting the feature matrix should again as a first column contain the group identifier. However, in the predict method this is actually not used. For making predictions it is not neccesarry to know which group a certain feature vector belongs to due to the structure of the LambdaMART model. To get the actual predicted ranks you should do some sorting yourself on the predicted value of the predict method.

I hope this helps a bit. If this was not your question, please let me know!

kirk86 commented 6 years ago

@FlorisHoogenboom thanks a lot, it does clarify things a lot. I'm not per se familiar with LambdaMART but from reading on the web it seems to partially fit under the problem domain I'm trying to use it. The actual problem is a multi instance multi label problem, have you had any prior experience with this kind of domain problem using LambdaMART? Would you suggest any better alternatives?

FlorisHoogenboom commented 6 years ago

Hmm not really, sorry. It is ofcourse easy to translate the multilabel part of your problem into a ranking problem, see for example this paper. However, I don't immediately know a ranking approach to the multi-instance part of your problem.

kirk86 commented 6 years ago

@FlorisHoogenboom thanks for taking the time to answer :smile: . Appreciate it!. Closing this for now.