capreolus-ir / capreolus

A toolkit for end-to-end neural ad hoc retrieval
https://capreolus.ai
Apache License 2.0

PARADE replication result #107

Closed stephaniewhoo closed 4 years ago

stephaniewhoo commented 4 years ago

I replicated PARADE on a Colab GPU with numpassage=8.

Here are my results:

INFO - capreolus.task.rerank.evaluate - rerank: fold=s1 dev metrics: P_1=0.688 P_10=0.500 P_20=0.427 P_5=0.554 judged_10=1.000 judged_20=0.992 judged_200=0.947 map=0.254 ndcg_cut_10=0.516 ndcg_cut_20=0.491 ndcg_cut_5=0.546 recall_100=0.453 recall_1000=0.453 recip_rank=0.802
2020-10-20 18:21:08,984 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s1 test metrics: P_1=0.574 P_10=0.494 P_20=0.417 P_5=0.562 judged_10=0.989 judged_20=0.982 judged_200=0.931 map=0.282 ndcg_cut_10=0.497 ndcg_cut_20=0.484 ndcg_cut_5=0.523 recall_100=0.490 recall_1000=0.490 recip_rank=0.721
INFO - capreolus.task.rerank.evaluate - rerank: fold=s2 dev metrics: P_1=0.653 P_10=0.459 P_20=0.386 P_5=0.551 judged_10=0.988 judged_20=0.974 judged_200=0.935 map=0.218 ndcg_cut_10=0.511 ndcg_cut_20=0.473 ndcg_cut_5=0.559 recall_100=0.391 recall_1000=0.391 recip_rank=0.752
2020-10-20 19:37:53,818 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s2 test metrics: P_1=0.646 P_10=0.496 P_20=0.426 P_5=0.562 judged_10=0.996 judged_20=0.993 judged_200=0.947 map=0.259 ndcg_cut_10=0.516 ndcg_cut_20=0.490 ndcg_cut_5=0.549 recall_100=0.453 recall_1000=0.453 recip_rank=0.780
INFO - capreolus.task.rerank.evaluate - rerank: fold=s3 dev metrics: P_1=0.583 P_10=0.510 P_20=0.447 P_5=0.563 judged_10=1.000 judged_20=0.995 judged_200=0.963 map=0.275 ndcg_cut_10=0.488 ndcg_cut_20=0.476 ndcg_cut_5=0.502 recall_100=0.557 recall_1000=0.557 recip_rank=0.712
2020-10-20 21:52:05,053 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s3 test metrics: P_1=0.633 P_10=0.437 P_20=0.367 P_5=0.535 judged_10=0.980 judged_20=0.978 judged_200=0.935 map=0.209 ndcg_cut_10=0.488 ndcg_cut_20=0.453 ndcg_cut_5=0.544 recall_100=0.391 recall_1000=0.391 recip_rank=0.724
INFO - capreolus.task.rerank.evaluate - rerank: fold=s4 dev metrics: P_1=0.796 P_10=0.543 P_20=0.463 P_5=0.633 judged_10=0.996 judged_20=0.994 judged_200=0.949 map=0.279 ndcg_cut_10=0.577 ndcg_cut_20=0.533 ndcg_cut_5=0.638 recall_100=0.493 recall_1000=0.493 recip_rank=0.856
2020-10-20 23:02:43,400 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s4 test metrics: P_1=0.667 P_10=0.554 P_20=0.470 P_5=0.596 judged_10=0.998 judged_20=0.995 judged_200=0.963 map=0.299 ndcg_cut_10=0.537 ndcg_cut_20=0.515 ndcg_cut_5=0.546 recall_100=0.557 recall_1000=0.557 recip_rank=0.763
2020-10-21 00:11:49,032 - INFO - capreolus.task.rerank.evaluate -                       P_1: 0.6473
2020-10-21 00:11:49,032 - INFO - capreolus.task.rerank.evaluate -                      P_10: 0.4971
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -                      P_20: 0.4222
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -                       P_5: 0.5685
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -                 judged_10: 0.9921
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -                 judged_20: 0.9880
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -                judged_200: 0.9450
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -                       map: 0.2622
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_10: 0.5139
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_20: 0.4869
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate -                ndcg_cut_5: 0.5497
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate -                recall_100: 0.4765
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate -               recall_1000: 0.4765
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate -                recip_rank: 0.7590
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate - interpolated with alphas = [0.35000000000000003, 0.4, 0.4, 0.55, 0.7000000000000001]
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate -              P_1 [interp]: 0.6386
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate -             P_10 [interp]: 0.5060
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate -             P_20 [interp]: 0.4273
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate -              P_5 [interp]: 0.5679
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate -        judged_10 [interp]: 0.9896
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate -        judged_20 [interp]: 0.9847
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate -       judged_200 [interp]: 0.8509
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate -              map [interp]: 0.3227
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate -      ndcg_cut_10 [interp]: 0.5150
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate -      ndcg_cut_20 [interp]: 0.4920
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate -       ndcg_cut_5 [interp]: 0.5412
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate -       recall_100 [interp]: 0.4612
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate -      recall_1000 [interp]: 0.7761
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate -       recip_rank [interp]: 0.7293

andrewyates commented 4 years ago

Thanks! These are as expected for config_parade_small. (The full model is more effective, but it is also more memory-intensive and harder to run. It probably requires CC with 2 GPUs.)
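
For anyone comparing their own run against the numbers above, here is a minimal sketch that pulls the per-fold `dev`/`test` metrics out of a log in the format shown in this thread and takes an unweighted mean over folds. The function names `parse_fold_metrics` and `macro_average` are hypothetical helpers written for this example and are not part of Capreolus. The per-fold mean is only a sanity check: it need not match the summary values printed at the end of the log, which may be computed over pooled queries or over folds not shown here.

```python
import re
from collections import defaultdict

# Matches the per-fold lines shown in the log, with or without a timestamp prefix.
FOLD_LINE = re.compile(
    r"rerank: fold=(?P<fold>\w+) (?P<split>dev|test) metrics: (?P<metrics>.+)$"
)


def parse_fold_metrics(log_text):
    """Return {(fold, split): {metric: value}} for every per-fold line found."""
    results = {}
    for line in log_text.splitlines():
        match = FOLD_LINE.search(line)
        if match is None:
            continue  # summary lines and unrelated output are skipped
        metrics = {}
        for pair in match.group("metrics").split():
            name, _, value = pair.partition("=")
            metrics[name] = float(value)
        results[(match.group("fold"), match.group("split"))] = metrics
    return results


def macro_average(results, split="test"):
    """Unweighted mean of each metric over the folds of one split."""
    values_by_metric = defaultdict(list)
    for (_fold, fold_split), metrics in results.items():
        if fold_split == split:
            for name, value in metrics.items():
                values_by_metric[name].append(value)
    return {name: sum(vals) / len(vals) for name, vals in values_by_metric.items()}


if __name__ == "__main__":
    # Shortened lines in the same format as the log above (only two metrics kept).
    sample = """\
INFO - capreolus.task.rerank.evaluate - rerank: fold=s1 dev metrics: P_1=0.688 map=0.254
2020-10-20 18:21:08,984 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s1 test metrics: P_1=0.574 map=0.282
2020-10-20 19:37:53,818 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s2 test metrics: P_1=0.646 map=0.259
"""
    averages = macro_average(parse_fold_metrics(sample))
    print({name: round(value, 4) for name, value in averages.items()})
```

Passing `split="dev"` to `macro_average` gives the same view for the dev folds, which makes it easier to spot a fold whose dev and test scores diverge sharply.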