Closed crystina-z closed 3 years ago
scores reproduced by `python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade.txt` at this point:
2021-05-24 07:18:55,735 - INFO - capreolus.task.rerank.evaluate - P_20: 0.4751
2021-05-24 07:18:55,735 - INFO - capreolus.task.rerank.evaluate - judged_20: 0.9380
2021-05-24 07:18:55,735 - INFO - capreolus.task.rerank.evaluate - map: 0.3738
2021-05-24 07:18:55,735 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20: 0.5535 <----
2021-05-24 07:18:55,735 - INFO - capreolus.task.rerank.evaluate - recall_1000: 0.7761
2021-05-24 07:18:55,735 - INFO - capreolus.task.rerank.evaluate - recip_rank: 0.8050
2021-05-24 07:18:55,735 - INFO - capreolus.task.rerank.evaluate - interpolated with alphas = [0.0, 0.15000000000000002, 0.15000000000000002, 0.2, 0.25]
2021-05-24 07:18:55,736 - INFO - capreolus.task.rerank.evaluate - P_20 [interp]: 0.4835
2021-05-24 07:18:55,736 - INFO - capreolus.task.rerank.evaluate - judged_20 [interp]: 0.9633
2021-05-24 07:18:55,736 - INFO - capreolus.task.rerank.evaluate - map [interp]: 0.3901
2021-05-24 07:18:55,736 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20 [interp]: 0.5633
2021-05-24 07:18:55,736 - INFO - capreolus.task.rerank.evaluate - recall_1000 [interp]: 0.7761
2021-05-24 07:18:55,736 - INFO - capreolus.task.rerank.evaluate - recip_rank [interp]: 0.8153
rerank: fold=s1 test metrics: P_1=0.620 P_10=0.496 P_20=0.425 P_5=0.568 judged_10=0.934 judged_20=0.913 judged_200=0.701 map=0.361 ndcg_cut_10=0.502 ndcg_cut_20=0.502 ndcg_cut_5=0.540 recall_100=0.516 recall_1000=0.755 recip_rank=0.710
rerank: fold=s2 test metrics: P_1=0.755 P_10=0.555 P_20=0.482 P_5=0.678 judged_10=0.967 judged_20=0.938 judged_200=0.695 map=0.377 ndcg_cut_10=0.592 ndcg_cut_20=0.564 ndcg_cut_5=0.658 recall_100=0.531 recall_1000=0.773 recip_rank=0.842
rerank: fold=s3 test metrics: P_1=0.780 P_10=0.606 P_20=0.512 P_5=0.696 judged_10=0.958 judged_20=0.935 judged_200=0.700 map=0.364 ndcg_cut_10=0.653 ndcg_cut_20=0.609 ndcg_cut_5=0.700 recall_100=0.488 recall_1000=0.727 recip_rank=0.861
rerank: fold=s4 test metrics: P_1=0.680 P_10=0.550 P_20=0.486 P_5=0.616 judged_10=0.978 judged_20=0.951 judged_200=0.729 map=0.392 ndcg_cut_10=0.553 ndcg_cut_20=0.543 ndcg_cut_5=0.574 recall_100=0.595 recall_1000=0.835 recip_rank=0.788
rerank: fold=s5 test metrics: P_1=0.740 P_10=0.560 P_20=0.471 P_5=0.644 judged_10=0.974 judged_20=0.953 judged_200=0.735 map=0.375 ndcg_cut_10=0.591 ndcg_cut_20=0.550 ndcg_cut_5=0.646 recall_100=0.545 recall_1000=0.789 recip_rank=0.825
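As a quick sanity check on the per-fold lines above, here is a minimal sketch that parses them and takes an unweighted mean across the five folds. The lines are trimmed to three metrics for brevity, and the helper name `parse_fold_line` is made up for this example. The unweighted mean lands within rounding distance of the aggregate values logged above, though it need not match exactly if the folds hold different numbers of queries.

```python
import re
from statistics import mean

# Per-fold lines copied from the log above, trimmed to three metrics.
FOLD_LINES = """\
rerank: fold=s1 test metrics: P_20=0.425 map=0.361 ndcg_cut_20=0.502
rerank: fold=s2 test metrics: P_20=0.482 map=0.377 ndcg_cut_20=0.564
rerank: fold=s3 test metrics: P_20=0.512 map=0.364 ndcg_cut_20=0.609
rerank: fold=s4 test metrics: P_20=0.486 map=0.392 ndcg_cut_20=0.543
rerank: fold=s5 test metrics: P_20=0.471 map=0.375 ndcg_cut_20=0.550
"""

PAIR = re.compile(r"(\w+)=([0-9.]+)")


def parse_fold_line(line):
    """Return (fold_name, {metric: value}) for one 'rerank: fold=...' line."""
    fold = re.search(r"fold=(\w+)", line).group(1)
    # Only parse key=value pairs that come after the "test metrics:" marker.
    _, _, rest = line.partition("test metrics:")
    metrics = {key: float(value) for key, value in PAIR.findall(rest)}
    return fold, metrics


folds = dict(parse_fold_line(line) for line in FOLD_LINES.splitlines() if line.strip())

# Unweighted (per-fold) mean of each metric.
macro = {
    metric: mean(fold_metrics[metric] for fold_metrics in folds.values())
    for metric in ("P_20", "map", "ndcg_cut_20")
}

for metric, value in macro.items():
    print(f"{metric}: {value:.4f}")
```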
scores reproduced by `python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade.txt reranker.trainer.amp=True` at this point:
2021-05-25 09:04:18,100 - INFO - capreolus.task.rerank.evaluate - P_20: 0.4819
2021-05-25 09:04:18,102 - INFO - capreolus.task.rerank.evaluate - judged_20: 0.9480
2021-05-25 09:04:18,103 - INFO - capreolus.task.rerank.evaluate - map: 0.3768
2021-05-25 09:04:18,104 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20: 0.5551 <---
2021-05-25 09:04:18,107 - INFO - capreolus.task.rerank.evaluate - recall_1000: 0.7761
2021-05-25 09:04:18,107 - INFO - capreolus.task.rerank.evaluate - recip_rank: 0.7913
2021-05-25 09:04:18,108 - INFO - capreolus.task.rerank.evaluate - interpolated with alphas = [0.1, 0.1, 0.2, 0.2, 0.2]
2021-05-25 09:04:18,111 - INFO - capreolus.task.rerank.evaluate - P_20 [interp]: 0.4942
2021-05-25 09:04:18,113 - INFO - capreolus.task.rerank.evaluate - judged_20 [interp]: 0.9685
2021-05-25 09:04:18,115 - INFO - capreolus.task.rerank.evaluate - map [interp]: 0.3903
2021-05-25 09:04:18,116 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20 [interp]: 0.5681
2021-05-25 09:04:18,119 - INFO - capreolus.task.rerank.evaluate - recall_1000 [interp]: 0.7761
2021-05-25 09:04:18,120 - INFO - capreolus.task.rerank.evaluate - recip_rank [interp]: 0.7968
rerank: fold=s1 test metrics: P_1=0.560 P_10=0.504 P_20=0.442 P_5=0.544 judged_10=0.932 judged_20=0.919 judged_200=0.698 map=0.365 ndcg_cut_10=0.500 ndcg_cut_20=0.504 ndcg_cut_5=0.512 recall_100=0.518 recall_1000=0.755 recip_rank=0.664
rerank: fold=s2 test metrics: P_1=0.694 P_10=0.569 P_20=0.452 P_5=0.645 judged_10=0.971 judged_20=0.936 judged_200=0.695 map=0.367 ndcg_cut_10=0.589 ndcg_cut_20=0.536 ndcg_cut_5=0.629 recall_100=0.525 recall_1000=0.773 recip_rank=0.802
rerank: fold=s3 test metrics: P_1=0.720 P_10=0.612 P_20=0.508 P_5=0.676 judged_10=0.976 judged_20=0.941 judged_200=0.702 map=0.358 ndcg_cut_10=0.642 ndcg_cut_20=0.599 ndcg_cut_5=0.666 recall_100=0.490 recall_1000=0.727 recip_rank=0.825
rerank: fold=s4 test metrics: P_1=0.720 P_10=0.572 P_20=0.508 P_5=0.640 judged_10=0.990 judged_20=0.975 judged_200=0.755 map=0.403 ndcg_cut_10=0.567 ndcg_cut_20=0.561 ndcg_cut_5=0.592 recall_100=0.597 recall_1000=0.835 recip_rank=0.808
rerank: fold=s5 test metrics: P_1=0.800 P_10=0.594 P_20=0.499 P_5=0.652 judged_10=0.986 judged_20=0.969 judged_200=0.759 map=0.390 ndcg_cut_10=0.618 ndcg_cut_20=0.575 ndcg_cut_5=0.656 recall_100=0.554 recall_1000=0.789 recip_rank=0.858
scores with `reranker.name=ptparade` (same config as `docs/reproduction/config_parade.txt`, but with the `bertlr` and `trainer.loss` lines removed), `amp=both`:
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate - P_20: 0.4783
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate - judged_20: 0.9480
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate - map: 0.3666
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20: 0.5478 <---
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate - recall_1000: 0.7761
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate - recip_rank: 0.7963
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate - interpolated with alphas = [0.05, 0.05, 0.1, 0.2, 0.25]
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate - P_20 [interp]: 0.4896
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate - judged_20 [interp]: 0.9639
2021-05-26 01:47:48,367 - INFO - capreolus.task.rerank.evaluate - map [interp]: 0.3782
2021-05-26 01:47:48,368 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20 [interp]: 0.5580
2021-05-26 01:47:48,368 - INFO - capreolus.task.rerank.evaluate - recall_1000 [interp]: 0.7761
2021-05-26 01:47:48,368 - INFO - capreolus.task.rerank.evaluate - recip_rank [interp]: 0.7821
and scores with `python rerank.traineval` (knrm on rob04):
2021-05-25 17:26:12,834 - INFO - capreolus.task.rerank.evaluate - P_20: 0.3293
2021-05-25 17:26:12,834 - INFO - capreolus.task.rerank.evaluate - judged_20: 0.9384
2021-05-25 17:26:12,834 - INFO - capreolus.task.rerank.evaluate - judged_200: 0.7469
2021-05-25 17:26:12,834 - INFO - capreolus.task.rerank.evaluate - map: 0.2325
2021-05-25 17:26:12,834 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20: 0.3859 <---
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate - recall_1000: 0.6989
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate - recip_rank: 0.6545
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate - interpolated with alphas = [0.05, 0.45, 0.5, 0.65, 0.7000000000000001]
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate - P_20 [interp]: 0.3725
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate - judged_20 [interp]: 0.9763
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate - map [interp]: 0.2625
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20 [interp]: 0.4350 <---
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate - recall_1000 [interp]: 0.6989
2021-05-25 17:26:12,835 - INFO - capreolus.task.rerank.evaluate - recip_rank [interp]: 0.6981
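For anyone reading the `[interp]` rows: the five alphas are presumably one per fold, and interpolation combines the first-stage scores with the reranker scores. Below is a minimal sketch of one common form, a linear mix of min-max-normalized scores. This is an assumption for illustration, not necessarily what Capreolus does: the normalization, the function names, the toy scores, and which run `alpha` weights are all hypothetical here.

```python
def min_max(scores):
    """Rescale a {docid: score} dict to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {docid: 0.0 for docid in scores}
    return {docid: (score - lo) / (hi - lo) for docid, score in scores.items()}


def interpolate(first_stage, reranker, alpha):
    """Assumed form: alpha weights the first-stage run, (1 - alpha) the reranker."""
    a, b = min_max(first_stage), min_max(reranker)
    shared = a.keys() & b.keys()  # only documents scored by both runs
    return {docid: alpha * a[docid] + (1 - alpha) * b[docid] for docid in shared}


# Toy scores for a single query (made up for this example).
bm25 = {"d1": 12.0, "d2": 9.0, "d3": 6.0}
neural = {"d1": 0.2, "d2": 0.9, "d3": 0.1}

fused = interpolate(bm25, neural, alpha=0.5)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

Under this reading, `alpha=0.0` falls back to the reranker alone and larger alphas lean more on the first-stage run.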
using `torch==1.8.1` seems to decrease the scores a bit on my end. Scores from running the command below (on the latest commit):
python -m capreolus.run rerank.traineval with file=docs/reproduction/config_paradept.txt reranker.trainer.amp=both
torch version | mAP | P@20 | NDCG@20 |
---|---|---|---|
1.6 | 0.3687 | 0.4851 | 0.5533 |
1.7 | 0.3687 | 0.4851 | 0.5533 |
1.8 | 0.3666 | 0.4783 | 0.5478 |
(btw I've run the experiment for torch-1.8 twice, and the results are exactly the same)
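To put a number on "a bit", here is a small sketch that computes the absolute and relative change from torch 1.7 to 1.8 using the values in the table above. The dict layout and variable names are only for this example.

```python
# Values copied from the table above.
results = {
    "1.6": {"mAP": 0.3687, "P@20": 0.4851, "NDCG@20": 0.5533},
    "1.7": {"mAP": 0.3687, "P@20": 0.4851, "NDCG@20": 0.5533},
    "1.8": {"mAP": 0.3666, "P@20": 0.4783, "NDCG@20": 0.5478},
}

baseline = results["1.7"]
deltas = {metric: results["1.8"][metric] - baseline[metric] for metric in baseline}

for metric, delta in deltas.items():
    relative = 100 * delta / baseline[metric]
    print(f"{metric}: {delta:+.4f} ({relative:+.2f}%)")
```

All three metrics drop by less than 0.01 in absolute terms.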
for issue #159