Closed stephaniewhoo closed 4 years ago
I replicated this on a Colab GPU with numpassage=8.
Here are my results:
INFO - capreolus.task.rerank.evaluate - rerank: fold=s1 dev metrics: P_1=0.688 P_10=0.500 P_20=0.427 P_5=0.554 judged_10=1.000 judged_20=0.992 judged_200=0.947 map=0.254 ndcg_cut_10=0.516 ndcg_cut_20=0.491 ndcg_cut_5=0.546 recall_100=0.453 recall_1000=0.453 recip_rank=0.802
2020-10-20 18:21:08,984 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s1 test metrics: P_1=0.574 P_10=0.494 P_20=0.417 P_5=0.562 judged_10=0.989 judged_20=0.982 judged_200=0.931 map=0.282 ndcg_cut_10=0.497 ndcg_cut_20=0.484 ndcg_cut_5=0.523 recall_100=0.490 recall_1000=0.490 recip_rank=0.721
INFO - capreolus.task.rerank.evaluate - rerank: fold=s2 dev metrics: P_1=0.653 P_10=0.459 P_20=0.386 P_5=0.551 judged_10=0.988 judged_20=0.974 judged_200=0.935 map=0.218 ndcg_cut_10=0.511 ndcg_cut_20=0.473 ndcg_cut_5=0.559 recall_100=0.391 recall_1000=0.391 recip_rank=0.752
2020-10-20 19:37:53,818 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s2 test metrics: P_1=0.646 P_10=0.496 P_20=0.426 P_5=0.562 judged_10=0.996 judged_20=0.993 judged_200=0.947 map=0.259 ndcg_cut_10=0.516 ndcg_cut_20=0.490 ndcg_cut_5=0.549 recall_100=0.453 recall_1000=0.453 recip_rank=0.780
INFO - capreolus.task.rerank.evaluate - rerank: fold=s3 dev metrics: P_1=0.583 P_10=0.510 P_20=0.447 P_5=0.563 judged_10=1.000 judged_20=0.995 judged_200=0.963 map=0.275 ndcg_cut_10=0.488 ndcg_cut_20=0.476 ndcg_cut_5=0.502 recall_100=0.557 recall_1000=0.557 recip_rank=0.712
2020-10-20 21:52:05,053 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s3 test metrics: P_1=0.633 P_10=0.437 P_20=0.367 P_5=0.535 judged_10=0.980 judged_20=0.978 judged_200=0.935 map=0.209 ndcg_cut_10=0.488 ndcg_cut_20=0.453 ndcg_cut_5=0.544 recall_100=0.391 recall_1000=0.391 recip_rank=0.724
INFO - capreolus.task.rerank.evaluate - rerank: fold=s4 dev metrics: P_1=0.796 P_10=0.543 P_20=0.463 P_5=0.633 judged_10=0.996 judged_20=0.994 judged_200=0.949 map=0.279 ndcg_cut_10=0.577 ndcg_cut_20=0.533 ndcg_cut_5=0.638 recall_100=0.493 recall_1000=0.493 recip_rank=0.856
2020-10-20 23:02:43,400 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s4 test metrics: P_1=0.667 P_10=0.554 P_20=0.470 P_5=0.596 judged_10=0.998 judged_20=0.995 judged_200=0.963 map=0.299 ndcg_cut_10=0.537 ndcg_cut_20=0.515 ndcg_cut_5=0.546 recall_100=0.557 recall_1000=0.557 recip_rank=0.763
2020-10-21 00:11:49,032 - INFO - capreolus.task.rerank.evaluate - P_1: 0.6473
2020-10-21 00:11:49,032 - INFO - capreolus.task.rerank.evaluate - P_10: 0.4971
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate - P_20: 0.4222
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate - P_5: 0.5685
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate - judged_10: 0.9921
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate - judged_20: 0.9880
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate - judged_200: 0.9450
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate - map: 0.2622
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_10: 0.5139
2020-10-21 00:11:49,033 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20: 0.4869
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_5: 0.5497
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate - recall_100: 0.4765
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate - recall_1000: 0.4765
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate - recip_rank: 0.7590
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate - interpolated with alphas = [0.35000000000000003, 0.4, 0.4, 0.55, 0.7000000000000001]
2020-10-21 00:11:49,034 - INFO - capreolus.task.rerank.evaluate - P_1 [interp]: 0.6386
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate - P_10 [interp]: 0.5060
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate - P_20 [interp]: 0.4273
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate - P_5 [interp]: 0.5679
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate - judged_10 [interp]: 0.9896
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate - judged_20 [interp]: 0.9847
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate - judged_200 [interp]: 0.8509
2020-10-21 00:11:49,035 - INFO - capreolus.task.rerank.evaluate - map [interp]: 0.3227
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_10 [interp]: 0.5150
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20 [interp]: 0.4920
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_5 [interp]: 0.5412
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate - recall_100 [interp]: 0.4612
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate - recall_1000 [interp]: 0.7761
2020-10-21 00:11:49,036 - INFO - capreolus.task.rerank.evaluate - recip_rank [interp]: 0.7293
Thanks! These are as expected for config_parade_small. (The full model is more effective, but also more memory intensive and harder to run. It probably requires CC with 2 GPUs.)
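For anyone comparing their own run against the per-fold numbers in the log above, a small parser can be sketched as follows. This is not part of Capreolus: `parse_fold_metrics` is a hypothetical helper, and it assumes each `fold=... dev/test metrics:` record sits on its own line. The sample text is abbreviated from the s1 lines in the log (only a few metrics are kept).

```python
import re

# Hypothetical helper, not a Capreolus API. Assumes one log record per line.
FOLD_LINE = re.compile(r"fold=(\w+) (dev|test) metrics: (.*)")
METRIC = re.compile(r"(\S+)=(\d+(?:\.\d+)?)")


def parse_fold_metrics(log_text):
    """Return {(fold, split): {metric_name: value}} for each per-fold record."""
    results = {}
    for line in log_text.splitlines():
        match = FOLD_LINE.search(line)
        if match is None:
            continue  # skip aggregate lines such as "P_1: 0.6473"
        fold, split, rest = match.groups()
        results[(fold, split)] = {name: float(val) for name, val in METRIC.findall(rest)}
    return results


# Abbreviated from the s1 records in the log above.
sample = (
    "INFO - capreolus.task.rerank.evaluate - rerank: fold=s1 dev metrics: "
    "P_1=0.688 map=0.254 ndcg_cut_20=0.491\n"
    "2020-10-20 18:21:08,984 - INFO - capreolus.task.rerank.evaluate - rerank: "
    "fold=s1 test metrics: P_1=0.574 map=0.282 ndcg_cut_20=0.484\n"
)
metrics = parse_fold_metrics(sample)
print(metrics[("s1", "test")]["map"])
```

The resulting dictionary can then be compared fold by fold against another run, which is usually easier than reading the raw log lines side by side.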