dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Selection of objective "rank:ndcg" results in lower NDCG than "rank:pairwise" #4177

Closed Edmondguo closed 3 years ago

Edmondguo commented 5 years ago

Thanks for adding ranking task support to xgboost! I have a few questions:

  1. The docs say "Use LambdaMART to perform pairwise ranking where the pairwise loss is minimized". I want to know the exact form of this "pairwise loss".
  2. I cannot understand "Use LambdaMART to perform pairwise ranking". According to "From RankNet to LambdaRank to LambdaMART: An Overview", LambdaMART is a listwise method, which optimizes NDCG.
kretes commented 5 years ago

Hi @Edmondguo. I can just point to some historic discussion and my understanding of how xgboost works, so it would still be good to get official confirmation, e.g. from @hcho3.

I believe rank:pairwise is a pairwise method that tries to minimize the number of pairwise errors. rank:ndcg is a method following LambdaMART, and when you dig into the code it confirms that rank:ndcg is an extension of rank:pairwise with additional weights added to the loss of each pair.

However, in a few experiments it looks as if rank:ndcg performs worse than rank:pairwise, and it might be due to the implementation; see e.g. https://github.com/dmlc/xgboost/issues/2092#issuecomment-286819394.

Some time ago we verified that rank:ndcg performed a bit worse than rank:pairwise when evaluated on NDCG in our case.

hcho3 commented 5 years ago

> rank:ndcg is an extension of rank:pairwise with additional weights added to the loss of each pair.

Exactly. In "From RankNet to LambdaRank to LambdaMART", LambdaMART optimizes NDCG by optimizing the pairwise loss (with lambdas), weighted by the change in NDCG from swapping each pair.
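
Roughly, the per-pair gradient then looks like the following. This is an illustrative numpy sketch of the idea, not xgboost's actual internals; the function name and the `sigma` parameter are assumptions:

```python
import numpy as np

def pair_lambda(s_i, s_j, delta_ndcg=1.0, sigma=1.0):
    """Gradient ("lambda") for a pair where doc i is more relevant than doc j.

    With delta_ndcg=1.0 this is the plain RankNet/pairwise gradient;
    a LambdaMART-style objective scales it by |delta NDCG|, the change
    in NDCG obtained by swapping i and j in the current ranking.
    """
    rho = 1.0 / (1.0 + np.exp(sigma * (s_i - s_j)))
    return -sigma * rho * abs(delta_ndcg)
```

So under rank:ndcg a pair whose swap barely moves NDCG contributes little to the gradient, while rank:pairwise weights all pairs equally.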

Edmondguo commented 5 years ago

> Hi @Edmondguo. I can just point to some historic discussion and my understanding of how xgboost works, so it would still be good to get official confirmation, e.g. from @hcho3.
>
> I believe rank:pairwise is a pairwise method that tries to minimize the number of pairwise errors. rank:ndcg is a method following LambdaMART, and when you dig into the code it confirms that rank:ndcg is an extension of rank:pairwise with additional weights added to the loss of each pair.
>
> However, in a few experiments it looks as if rank:ndcg performs worse than rank:pairwise, and it might be due to the implementation; see e.g. #2092 (comment).
>
> Some time ago we verified that rank:ndcg performed a bit worse than rank:pairwise when evaluated on NDCG in our case.

Thank you very much! In my experiments I also found that rank:ndcg performs worse than rank:pairwise.

Edmondguo commented 5 years ago

> > rank:ndcg is an extension of rank:pairwise with additional weights added to the loss of each pair.
>
> Exactly. In "From RankNet to LambdaRank to LambdaMART", LambdaMART optimizes NDCG by optimizing the pairwise loss (with lambdas), weighted by the change in NDCG from swapping each pair.

Thank you! So does that mean that in rank:pairwise, xgboost uses the lambdas derived from the cross-entropy loss of RankNet as the loss function?

hcho3 commented 5 years ago

@Edmondguo Yes
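
For reference, the cross-entropy loss in question, from which those lambdas are derived, looks like this (a sketch of the RankNet formulation, not xgboost's source):

```python
import numpy as np

def ranknet_pair_loss(s_i, s_j, sigma=1.0):
    # Cross-entropy between P(i beats j) = sigmoid(sigma * (s_i - s_j))
    # and the target probability 1 (i is known to be more relevant):
    # C = log(1 + exp(-sigma * (s_i - s_j)))
    return np.log1p(np.exp(-sigma * (s_i - s_j)))
```

Differentiating with respect to s_i gives the lambda above: -sigma / (1 + exp(sigma * (s_i - s_j))).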

hcho3 commented 5 years ago

@Edmondguo @kretes Would you be interested in posting an example where you get a better NDCG metric by choosing rank:pairwise instead of rank:ndcg? I'd like to see whether this is a bug or just chance.

Edmondguo commented 5 years ago

> @Edmondguo @kretes Would you be interested in posting an example where you get a better NDCG metric by choosing rank:pairwise instead of rank:ndcg? I'd like to see whether this is a bug or just chance.

The project I am working on uses a ranking model for quantitative stock selection. It is hard to provide an example because the data is too big. In this case rank:pairwise performs much better than rank:ndcg under the same booster parameters: the NDCG is 0.5138 for rank:ndcg and 0.5586 for rank:pairwise.
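
For anyone wanting to run this kind of comparison, here is a hedged sketch of a side-by-side evaluation on synthetic grouped data (standing in for the private stock dataset; the data and parameters are made up):

```python
import numpy as np
import xgboost as xgb

rng = np.random.RandomState(0)
n_groups, group_size = 50, 30

X = rng.randn(n_groups * group_size, 10)
y = rng.randint(0, 5, size=n_groups * group_size)  # integer relevance labels

dtrain = xgb.DMatrix(X, label=y)
dtrain.set_group([group_size] * n_groups)  # one entry per query group

for objective in ("rank:pairwise", "rank:ndcg"):
    params = {"objective": objective, "eval_metric": "ndcg", "eta": 0.1}
    bst = xgb.train(params, dtrain, num_boost_round=50)
    print(objective, bst.eval(dtrain))  # prints train NDCG per objective
```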

hcho3 commented 5 years ago

@Edmondguo Does your data have multiple relevance judgment levels (1, 2, 3, 4, ...)?

Edmondguo commented 5 years ago

> @Edmondguo Does your data have multiple relevance judgment levels (1, 2, 3, 4, ...)?

Yes. Before training the model, I mapped y into relevance levels (1, 2, 3, ..., 30).
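
For concreteness, a hypothetical version of such a mapping, bucketing a continuous target into 30 ordinal levels per query group (the binning rule here is an assumption, not necessarily what was actually used):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group": np.repeat(np.arange(10), 100),  # query groups (e.g. dates)
    "y": np.random.randn(1000),              # continuous target (e.g. return)
})
# Quantile-bin y within each group into levels 1..30 (highest y -> 30).
df["relevance"] = df.groupby("group")["y"].transform(
    lambda s: pd.qcut(s, 30, labels=False, duplicates="drop") + 1
)
```

Note that if the NDCG gain is exponential (2^rel - 1), a top label of 30 produces gains on the order of 2^30 ≈ 10^9, so the label scale can interact strongly with an NDCG-weighted objective.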

hcho3 commented 5 years ago

It would be nice if there were a toy example we could use to show rank:pairwise outperforming rank:ndcg. Without an example, it is hard to find out why rank:ndcg is not working well.

kretes commented 5 years ago

Hello.

I believe I found an example where this is reproducible: rank:pairwise reaches an NDCG of 1 while rank:ndcg does not. See this gist: https://gist.github.com/kretes/1228e571aeba2a57f617352af633cd40.

I hope this helps nail down the issue.

sano176 commented 5 years ago

I ran into this problem too and couldn't find a reason to explain it: objective = rank:pairwise performs better than objective = rank:ndcg.

chloe-wang commented 4 years ago

@Edmondguo Just want to follow up on this issue. I ran into the same problem. Did you figure out the reason?

trivialfis commented 3 years ago

Some explanation is given in https://github.com/dmlc/xgboost/issues/6352. For future work, see https://github.com/dmlc/xgboost/issues/6450.