capreolus-ir / capreolus

A toolkit for end-to-end neural ad hoc retrieval
https://capreolus.ai
Apache License 2.0

Package upgrade #162

Closed crystina-z closed 3 years ago

crystina-z commented 3 years ago

for issue #159

crystina-z commented 3 years ago

scores reproduced by python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade.txt at this point:

2021-05-24 07:18:55,735 - INFO - capreolus.task.rerank.evaluate -                      P_20: 0.4751
2021-05-24 07:18:55,735 - INFO - capreolus.task.rerank.evaluate -                 judged_20: 0.9380
2021-05-24 07:18:55,735 - INFO - capreolus.task.rerank.evaluate -                       map: 0.3738
2021-05-24 07:18:55,735 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_20: 0.5535   <----
2021-05-24 07:18:55,735 - INFO - capreolus.task.rerank.evaluate -               recall_1000: 0.7761
2021-05-24 07:18:55,735 - INFO - capreolus.task.rerank.evaluate -                recip_rank: 0.8050
2021-05-24 07:18:55,735 - INFO - capreolus.task.rerank.evaluate - interpolated with alphas = [0.0, 0.15000000000000002, 0.15000000000000002, 0.2, 0.25]
2021-05-24 07:18:55,736 - INFO - capreolus.task.rerank.evaluate -             P_20 [interp]: 0.4835
2021-05-24 07:18:55,736 - INFO - capreolus.task.rerank.evaluate -        judged_20 [interp]: 0.9633
2021-05-24 07:18:55,736 - INFO - capreolus.task.rerank.evaluate -              map [interp]: 0.3901
2021-05-24 07:18:55,736 - INFO - capreolus.task.rerank.evaluate -      ndcg_cut_20 [interp]: 0.5633
2021-05-24 07:18:55,736 - INFO - capreolus.task.rerank.evaluate -      recall_1000 [interp]: 0.7761
2021-05-24 07:18:55,736 - INFO - capreolus.task.rerank.evaluate -       recip_rank [interp]: 0.8153
rerank: fold=s1 test metrics: P_1=0.620 P_10=0.496 P_20=0.425 P_5=0.568 judged_10=0.934 judged_20=0.913 judged_200=0.701 map=0.361 ndcg_cut_10=0.502 ndcg_cut_20=0.502 ndcg_cut_5=0.540 recall_100=0.516 recall_1000=0.755 recip_rank=0.710
rerank: fold=s2 test metrics: P_1=0.755 P_10=0.555 P_20=0.482 P_5=0.678 judged_10=0.967 judged_20=0.938 judged_200=0.695 map=0.377 ndcg_cut_10=0.592 ndcg_cut_20=0.564 ndcg_cut_5=0.658 recall_100=0.531 recall_1000=0.773 recip_rank=0.842
rerank: fold=s3 test metrics: P_1=0.780 P_10=0.606 P_20=0.512 P_5=0.696 judged_10=0.958 judged_20=0.935 judged_200=0.700 map=0.364 ndcg_cut_10=0.653 ndcg_cut_20=0.609 ndcg_cut_5=0.700 recall_100=0.488 recall_1000=0.727 recip_rank=0.861
rerank: fold=s4 test metrics: P_1=0.680 P_10=0.550 P_20=0.486 P_5=0.616 judged_10=0.978 judged_20=0.951 judged_200=0.729 map=0.392 ndcg_cut_10=0.553 ndcg_cut_20=0.543 ndcg_cut_5=0.574 recall_100=0.595 recall_1000=0.835 recip_rank=0.788
rerank: fold=s5 test metrics: P_1=0.740 P_10=0.560 P_20=0.471 P_5=0.644 judged_10=0.974 judged_20=0.953 judged_200=0.735 map=0.375 ndcg_cut_10=0.591 ndcg_cut_20=0.550 ndcg_cut_5=0.646 recall_100=0.545 recall_1000=0.789 recip_rank=0.825
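
As a quick sanity check, the per-fold lines above can be parsed and averaged to confirm that they are in line with the aggregate numbers. This is only a minimal sketch: the helper names `parse_fold_metrics` and `macro_average` are hypothetical and not part of capreolus, and it assumes the aggregate is close to an unweighted mean over folds (the toolkit may instead compute it over all queries, so small differences are expected).

```python
import re
from statistics import mean

# Per-fold lines copied from the log above, trimmed to two metrics for brevity.
FOLD_LINES = """\
rerank: fold=s1 test metrics: P_20=0.425 ndcg_cut_20=0.502
rerank: fold=s2 test metrics: P_20=0.482 ndcg_cut_20=0.564
rerank: fold=s3 test metrics: P_20=0.512 ndcg_cut_20=0.609
rerank: fold=s4 test metrics: P_20=0.486 ndcg_cut_20=0.543
rerank: fold=s5 test metrics: P_20=0.471 ndcg_cut_20=0.550
"""


def parse_fold_metrics(text):
    """Return {fold: {metric: value}} parsed from 'rerank: fold=...' lines."""
    folds = {}
    for line in text.splitlines():
        match = re.match(r"rerank: fold=(\S+) test metrics: (.*)", line)
        if not match:
            continue  # skip anything that is not a per-fold summary line
        fold, rest = match.groups()
        folds[fold] = {
            key: float(value)
            for key, value in (pair.split("=") for pair in rest.split())
        }
    return folds


def macro_average(folds, metric):
    """Unweighted mean of one metric across folds (an assumption, see above)."""
    return mean(fold[metric] for fold in folds.values())


folds = parse_fold_metrics(FOLD_LINES)
for metric in ("ndcg_cut_20", "P_20"):
    print(f"{metric}: {macro_average(folds, metric):.4f}")
```

The fold means come out at about 0.554 for ndcg_cut_20 and 0.475 for P_20, which agrees with the aggregate 0.5535 and 0.4751 up to the three-decimal rounding of the per-fold values.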
crystina-z commented 3 years ago

scores reproduced by python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade.txt reranker.trainer.amp=True at this point:

2021-05-25 09:04:18,100 - INFO - capreolus.task.rerank.evaluate -                      P_20: 0.4819
2021-05-25 09:04:18,102 - INFO - capreolus.task.rerank.evaluate -                 judged_20: 0.9480
2021-05-25 09:04:18,103 - INFO - capreolus.task.rerank.evaluate -                       map: 0.3768
2021-05-25 09:04:18,104 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_20: 0.5551   <---
2021-05-25 09:04:18,107 - INFO - capreolus.task.rerank.evaluate -               recall_1000: 0.7761
2021-05-25 09:04:18,107 - INFO - capreolus.task.rerank.evaluate -                recip_rank: 0.7913
2021-05-25 09:04:18,108 - INFO - capreolus.task.rerank.evaluate - interpolated with alphas = [0.1, 0.1, 0.2, 0.2, 0.2]
2021-05-25 09:04:18,111 - INFO - capreolus.task.rerank.evaluate -             P_20 [interp]: 0.4942
2021-05-25 09:04:18,113 - INFO - capreolus.task.rerank.evaluate -        judged_20 [interp]: 0.9685
2021-05-25 09:04:18,115 - INFO - capreolus.task.rerank.evaluate -              map [interp]: 0.3903
2021-05-25 09:04:18,116 - INFO - capreolus.task.rerank.evaluate -      ndcg_cut_20 [interp]: 0.5681
2021-05-25 09:04:18,119 - INFO - capreolus.task.rerank.evaluate -      recall_1000 [interp]: 0.7761
2021-05-25 09:04:18,120 - INFO - capreolus.task.rerank.evaluate -       recip_rank [interp]: 0.7968
rerank: fold=s1 test metrics: P_1=0.560 P_10=0.504 P_20=0.442 P_5=0.544 judged_10=0.932 judged_20=0.919 judged_200=0.698 map=0.365 ndcg_cut_10=0.500 ndcg_cut_20=0.504 ndcg_cut_5=0.512 recall_100=0.518 recall_1000=0.755 recip_rank=0.664
rerank: fold=s2 test metrics: P_1=0.694 P_10=0.569 P_20=0.452 P_5=0.645 judged_10=0.971 judged_20=0.936 judged_200=0.695 map=0.367 ndcg_cut_10=0.589 ndcg_cut_20=0.536 ndcg_cut_5=0.629 recall_100=0.525 recall_1000=0.773 recip_rank=0.802
rerank: fold=s3 test metrics: P_1=0.720 P_10=0.612 P_20=0.508 P_5=0.676 judged_10=0.976 judged_20=0.941 judged_200=0.702 map=0.358 ndcg_cut_10=0.642 ndcg_cut_20=0.599 ndcg_cut_5=0.666 recall_100=0.490 recall_1000=0.727 recip_rank=0.825
rerank: fold=s4 test metrics: P_1=0.720 P_10=0.572 P_20=0.508 P_5=0.640 judged_10=0.990 judged_20=0.975 judged_200=0.755 map=0.403 ndcg_cut_10=0.567 ndcg_cut_20=0.561 ndcg_cut_5=0.592 recall_100=0.597 recall_1000=0.835 recip_rank=0.808
rerank: fold=s5 test metrics: P_1=0.800 P_10=0.594 P_20=0.499 P_5=0.652 judged_10=0.986 judged_20=0.969 judged_200=0.759 map=0.390 ndcg_cut_10=0.618 ndcg_cut_20=0.575 ndcg_cut_5=0.656 recall_100=0.554 recall_1000=0.789 recip_rank=0.858
crystina-z commented 3 years ago

scores with reranker.name=ptparade (same config as docs/reproduction/config_parade.txt, but with the bertlr and trainer.loss lines removed), amp=both:

2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate -                      P_20: 0.4783
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate -                 judged_20: 0.9480
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate -                       map: 0.3666
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_20: 0.5478   <---
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate -               recall_1000: 0.7761
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate -                recip_rank: 0.7963
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate - interpolated with alphas = [0.05, 0.05, 0.1, 0.2, 0.25]
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate -             P_20 [interp]: 0.4896
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate -        judged_20 [interp]: 0.9639
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate -              map [interp]: 0.3782
2021-05-26 01:47:48,368 - INFO - capreolus.task.rerank.evaluate -      ndcg_cut_20 [interp]: 0.5580
2021-05-26 01:47:48,368 - INFO - capreolus.task.rerank.evaluate -      recall_1000 [interp]: 0.7761
2021-05-26 01:47:48,368 - INFO - capreolus.task.rerank.evaluate -       recip_rank [interp]: 0.7821

and scores with python rerank.traineval (knrm on rob04):

2021-05-25 17:26:12,834 - INFO - capreolus.task.rerank.evaluate -                      P_20: 0.3293
2021-05-25 17:26:12,834 - INFO - capreolus.task.rerank.evaluate -                 judged_20: 0.9384
2021-05-25 17:26:12,834 - INFO - capreolus.task.rerank.evaluate -                judged_200: 0.7469
2021-05-25 17:26:12,834 - INFO - capreolus.task.rerank.evaluate -                       map: 0.2325
2021-05-25 17:26:12,834 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_20: 0.3859   <---
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate -               recall_1000: 0.6989
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate -                recip_rank: 0.6545
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate - interpolated with alphas = [0.05, 0.45, 0.5, 0.65, 0.7000000000000001]
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate -             P_20 [interp]: 0.3725
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate -        judged_20 [interp]: 0.9763
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate -              map [interp]: 0.2625
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate -      ndcg_cut_20 [interp]: 0.4350   <---
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate -      recall_1000 [interp]: 0.6989
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate -       recip_rank [interp]: 0.6981
crystina-z commented 3 years ago

using torch==1.8.1 seems to decrease the scores a bit on my end. The scores from running the command below (on the latest commit) are:

python -m capreolus.run rerank.traineval with file=docs/reproduction/config_paradept.txt reranker.trainer.amp=both

| torch version | mAP    | P@20   | NDCG@20 |
|---------------|--------|--------|---------|
| 1.6           | 0.3687 | 0.4851 | 0.5533  |
| 1.7           | 0.3687 | 0.4851 | 0.5533  |
| 1.8           | 0.3666 | 0.4783 | 0.5478  |

(btw I've run the experiment for torch-1.8 twice, and the results are exactly the same)
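
To put a number on "a bit", the gap between torch versions can be computed directly from the table above. This is just an illustrative sketch: the `RESULTS` dictionary and the `deltas` helper are hypothetical names introduced here, and the values are copied by hand from the table.

```python
# Rows copied from the table above: torch version -> (mAP, P@20, NDCG@20).
RESULTS = {
    "1.6": (0.3687, 0.4851, 0.5533),
    "1.7": (0.3687, 0.4851, 0.5533),
    "1.8": (0.3666, 0.4783, 0.5478),
}
METRICS = ("mAP", "P@20", "NDCG@20")


def deltas(baseline, candidate):
    """Absolute change per metric when moving from baseline to candidate."""
    return {
        metric: round(new - old, 4)
        for metric, old, new in zip(METRICS, RESULTS[baseline], RESULTS[candidate])
    }


for metric, diff in deltas("1.6", "1.8").items():
    print(f"{metric}: {diff:+.4f}")
```

Going from torch 1.6 to 1.8, the drop is about 0.002 mAP, 0.007 P@20 and 0.0055 NDCG@20, while 1.6 and 1.7 are identical.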