crystina-z closed 4 years ago
This pull request introduces 1 alert when merging 87222cdf3339e06d5132f84a4210a8807d1763a3 into 1237c04fda769315d7f3a0ccd050d82d576d851c - view on LGTM.com
new alerts:
This pull request introduces 1 alert when merging 6df0cdd6c6a585417796742ab62c6b0f47595901 into 1237c04fda769315d7f3a0ccd050d82d576d851c - view on LGTM.com
new alerts:
This pull request introduces 1 alert when merging b68af94b41a7917c1c5d2e1dc0017c582eff4374 into 1237c04fda769315d7f3a0ccd050d82d576d851c - view on LGTM.com
new alerts:
Results of running python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade_small.txt and fold set to each of s1, s2, s3, s4, s5:
2020-11-05 02:30:48,055 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s1 test metrics:
P_1=0.532 P_10=0.466 P_20=0.423 P_5=0.523 judged_10=0.994 judged_20=0.989 judged_200=0.931 map=0.278 ndcg_cut_10=0.466 ndcg_cut_20=0.474 ndcg_cut_5=0.489 recall_100=0.490 recall_1000=0.490 recip_rank=0.697
2020-11-05 02:32:07,431 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s2 test metrics:
P_1=0.646 P_10=0.504 P_20=0.434 P_5=0.583 judged_10=0.998 judged_20=0.995 judged_200=0.947 map=0.266 ndcg_cut_10=0.522 ndcg_cut_20=0.500 ndcg_cut_5=0.560 recall_100=0.453 recall_1000=0.453 recip_rank=0.759
2020-11-05 02:33:37,806 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s3 test metrics:
P_1=0.633 P_10=0.437 P_20=0.360 P_5=0.510 judged_10=0.988 judged_20=0.981 judged_200=0.935 map=0.206 ndcg_cut_10=0.481 ndcg_cut_20=0.445 ndcg_cut_5=0.521 recall_100=0.391 recall_1000=0.391 recip_rank=0.735
2020-11-05 02:34:43,672 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s4 test metrics:
P_1=0.771 P_10=0.529 P_20=0.446 P_5=0.588 judged_10=1.000 judged_20=0.994 judged_200=0.963 map=0.293 ndcg_cut_10=0.529 ndcg_cut_20=0.501 ndcg_cut_5=0.554 recall_100=0.557 recall_1000=0.557 recip_rank=0.832
2020-11-05 02:11:47,203 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s5 test metrics:
P_1=0.714 P_10=0.535 P_20=0.446 P_5=0.616 judged_10=0.996 judged_20=0.994 judged_200=0.949 map=0.281 ndcg_cut_10=0.567 ndcg_cut_20=0.518 ndcg_cut_5=0.623 recall_100=0.493 recall_1000=0.493 recip_rank=0.810
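For a quick sanity check across folds, the per-fold lines above can be parsed and averaged. This is only a sketch: `parse_metrics` and `macro_average` are hypothetical helper names, not part of capreolus, and an unweighted mean over folds is an assumption that may not match how capreolus computes its cross-validated score (folds can contain different numbers of queries). Only three of the logged metrics are copied in to keep it short.

```python
import re
from statistics import mean


def parse_metrics(line):
    """Turn a line like 'P_1=0.532 map=0.278' into {'P_1': 0.532, 'map': 0.278}."""
    return {name: float(value) for name, value in re.findall(r"(\w+)=([0-9.]+)", line)}


def macro_average(fold_lines):
    """Unweighted mean of every metric over the folds (an assumed aggregation)."""
    folds = [parse_metrics(line) for line in fold_lines]
    return {name: mean(fold[name] for fold in folds) for name in folds[0]}


# P_1, map and ndcg_cut_20 copied from the s1..s5 log lines above.
fold_lines = [
    "P_1=0.532 map=0.278 ndcg_cut_20=0.474",
    "P_1=0.646 map=0.266 ndcg_cut_20=0.500",
    "P_1=0.633 map=0.206 ndcg_cut_20=0.445",
    "P_1=0.771 map=0.293 ndcg_cut_20=0.501",
    "P_1=0.714 map=0.281 ndcg_cut_20=0.518",
]

avg = macro_average(fold_lines)
for name, value in sorted(avg.items()):
    print(f"{name}: {value:.4f}")
```

The averages from this sketch describe these five runs only, so they should not be expected to match a cross-validated log produced from different runs.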
I accidentally deleted the previous output file; these are the cross-validated results of another 5 runs:
2020-11-05 03:20:20,714 - INFO - capreolus.task.rerank.evaluate - P_1: 0.6639
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate - P_10: 0.5071
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate - P_20: 0.4303
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate - P_5: 0.5768
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate - judged_10: 0.9950
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate - judged_20: 0.9907
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate - judged_200: 0.9450
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate - map: 0.2709
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_10: 0.5236
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20: 0.4967
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_5: 0.5586
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate - recall_100: 0.4765
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate - recall_1000: 0.4765
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate - recip_rank: 0.7684
This pull request introduces 3 alerts when merging 9d1a124db009f698da5543a5f33ae492070542e8 into 1237c04fda769315d7f3a0ccd050d82d576d851c - view on LGTM.com
new alerts:
This pull request introduces 1 alert when merging 1a2331f77ea3c6cb608713a42ddeb0ecc09fd7ce into 1237c04fda769315d7f3a0ccd050d82d576d851c - view on LGTM.com
new alerts:
Current cross-validated scores with the parameters in docs/reproduction/config_parade.txt:
2020-11-08 19:49:46,025 - INFO - capreolus.task.rerank.evaluate - P_1: 0.7386
2020-11-08 19:49:46,029 - INFO - capreolus.task.rerank.evaluate - P_10: 0.5801
2020-11-08 19:49:46,029 - INFO - capreolus.task.rerank.evaluate - P_20: 0.4811
2020-11-08 19:49:46,029 - INFO - capreolus.task.rerank.evaluate - P_5: 0.6448
2020-11-08 19:49:46,029 - INFO - capreolus.task.rerank.evaluate - judged_10: 0.9946
2020-11-08 19:49:46,029 - INFO - capreolus.task.rerank.evaluate - judged_20: 0.9890
2020-11-08 19:49:46,029 - INFO - capreolus.task.rerank.evaluate - judged_200: 0.9450
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - map: 0.3104
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_10: 0.5985
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20: 0.5600
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_5: 0.6289
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - recall_100: 0.4765
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - recall_1000: 0.4765
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - recip_rank: 0.8185
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - interpolated with alphas = [0.0, 0.0, 0.2, 0.25, 0.25]
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - P_1 [interp]: 0.6948
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - P_10 [interp]: 0.5651
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - P_20 [interp]: 0.4697
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - P_5 [interp]: 0.6378
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - judged_10 [interp]: 0.9855
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - judged_20 [interp]: 0.9787
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - judged_200 [interp]: 0.7729
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - map [interp]: 0.3461
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_10 [interp]: 0.5805
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20 [interp]: 0.5457
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_5 [interp]: 0.6134
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - recall_100 [interp]: 0.4662
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - recall_1000 [interp]: 0.7761
2020-11-08 19:49:46,031 - INFO - capreolus.task.rerank.evaluate - recip_rank [interp]: 0.7838
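The [interp] rows come from interpolating scores with one alpha per fold. A minimal sketch of what such an interpolation could look like is below. It is not capreolus's implementation: the function names are hypothetical, and the min-max normalization, the direction of alpha (weight on the first-stage score), and the choice to give documents the reranker never scored a reranker score of 0 are all assumptions made for illustration.

```python
def min_max(scores):
    """Rescale a {docid: score} dict to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {docid: 0.0 for docid in scores}
    return {docid: (s - lo) / (hi - lo) for docid, s in scores.items()}


def interpolate(first_stage, reranker, alpha):
    """Combine scores as alpha * first_stage + (1 - alpha) * reranker.

    Assumption: alpha weights the first-stage score, and documents missing
    from the reranker output contribute a reranker score of 0.
    """
    fs, rr = min_max(first_stage), min_max(reranker)
    return {d: alpha * fs[d] + (1 - alpha) * rr.get(d, 0.0) for d in fs}


# Toy scores for one query; d3 was retrieved but never reranked.
first_stage = {"d1": 10.0, "d2": 8.0, "d3": 6.0}
reranker = {"d1": 0.2, "d2": 0.9}

combined = interpolate(first_stage, reranker, alpha=0.25)
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

Under these assumptions alpha = 0.0 falls back to the reranker scores alone, while the documents that were only retrieved stay in the list at the bottom.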
major changes:
tokenizer/BertPassage, reranker/TFBERTMaxP
encode
since that would require the passage to be prepared before being tokenized (so that it can be passed to encode() with the query), while we want to split the passage after tokenization

todo:
roberta
segment id is handled in reranker/TFBERTMaxP (roberta only has 0 as seg_id, not 1). It is done this way since otherwise the extractor would need to read the config from the tokenizer. The drawback is of course that we need to rewrite this logic in all BERT models to support roberta... not sure which way is better
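To make the two points above concrete, here is a small sketch of splitting a document into passages after tokenization and then assigning segment ids when the model input is assembled. It is not the code in this PR: `split_after_tokenization`, `build_input`, the `use_two_segments` flag, and the whitespace "tokenizer" are all hypothetical stand-ins, and the [CLS]/[SEP] strings are the BERT-style special tokens (roberta uses different ones).

```python
def split_after_tokenization(doc_tokens, passage_len, stride):
    """Slice an already-tokenized document into passages of up to passage_len tokens."""
    if not doc_tokens:
        return [[]]
    return [doc_tokens[i:i + passage_len] for i in range(0, len(doc_tokens), stride)]


def build_input(query_tokens, passage_tokens, use_two_segments=True,
                cls="[CLS]", sep="[SEP]"):
    """Assemble tokens and segment ids for one query-passage pair.

    use_two_segments=True gives the passage segment id 1 (BERT style);
    False keeps every position at 0, as needed for roberta.
    """
    first = [cls] + query_tokens + [sep]
    second = passage_tokens + [sep]
    passage_id = 1 if use_two_segments else 0
    segment_ids = [0] * len(first) + [passage_id] * len(second)
    return first + second, segment_ids


# Whitespace splitting stands in for a real wordpiece tokenizer.
query_tokens = "neural ranking".split()
doc_tokens = "a b c d e".split()

passages = split_after_tokenization(doc_tokens, passage_len=2, stride=2)
bert_tokens, bert_segments = build_input(query_tokens, passages[0])
_, roberta_segments = build_input(query_tokens, passages[0], use_two_segments=False)

print(passages)
print(bert_tokens, bert_segments)
print(roberta_segments)
```

Here the document is tokenized once and the query is attached per passage afterwards, so nothing has to be split before tokenization. Whether the segment-id choice lives in the reranker or in the extractor is exactly the open question in the todo; the flag only shows what would have to be configurable.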