capreolus-ir / capreolus

A toolkit for end-to-end neural ad hoc retrieval
https://capreolus.ai
Apache License 2.0

Feature/autotokenizer #112

Closed crystina-z closed 4 years ago

crystina-z commented 4 years ago

major changes:

  1. support electra, roberta and albert from Hugging Face in tokenizer/BertPassage and reranker/TFBERTMaxP
  2. fixed the long-query bug and the bug where tokenization happened after the passage split inside BertPassage (issue #105). I didn't end up using encode(), since that would require the passage to be prepared before it is tokenized (so that it can be passed to encode() together with the query), whereas we want to split the passage after tokenization
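
The "tokenize first, split afterwards" order described in item 2 can be sketched roughly as below. This is only an illustration, not the code in this PR: `build_passage_inputs`, its parameters, and the default lengths are hypothetical, and a whitespace splitter stands in for a real tokenizer.

```python
from typing import Callable, List


def build_passage_inputs(
    query: str,
    doc: str,
    tokenize: Callable[[str], List[str]],
    max_query_len: int = 4,   # hypothetical limit, not taken from the PR
    passage_len: int = 5,     # hypothetical window size
    stride: int = 3,          # hypothetical step between window starts
    cls: str = "[CLS]",
    sep: str = "[SEP]",
) -> List[List[str]]:
    """Return one `[CLS] query [SEP] passage [SEP]` token list per passage."""
    # Tokenize query and document independently; a long query is truncated
    # so it cannot crowd the passage out of the input.
    query_toks = tokenize(query)[:max_query_len]
    doc_toks = tokenize(doc)

    # Split the *token* sequence (not the raw text) into overlapping windows.
    starts = range(0, max(len(doc_toks), 1), stride)
    passages = [doc_toks[s:s + passage_len] for s in starts]

    return [[cls] + query_toks + [sep] + p + [sep] for p in passages]


inputs = build_passage_inputs(
    "a b c d e f", "w1 w2 w3 w4 w5 w6 w7 w8", str.split
)
for toks in inputs:
    print(" ".join(toks))
```

With a Hugging Face tokenizer, `tokenize` could be the `tokenize` method of an `AutoTokenizer.from_pretrained(...)` instance, and `cls`/`sep` would come from its `cls_token` and `sep_token` attributes, since roberta does not use `[CLS]`/`[SEP]`.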

todo:

  1. add a BertPassageMixin so that we don't need to write the data preparation logic twice (in BertPassage and PooledBertPassage)
  2. the roberta segment id special case (roberta only has 0 as a seg_id, not 1) is currently handled in reranker/TFBERTMaxP, since otherwise the extractor would need to read the config from the tokenizer. The drawback is, of course, that we need to rewrite this logic in every BERT model to support roberta... not sure which way is better
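
For reference, the segment id special case in item 2 could look roughly like this wherever it ends up living. It is a sketch only: `make_segment_ids` is a hypothetical helper, the BERT-style `[CLS] query [SEP] passage [SEP]` layout is assumed, and `type_vocab_size` is assumed to come from the model config (where roberta reports 1).

```python
from typing import List


def make_segment_ids(query_len: int, passage_len: int, type_vocab_size: int) -> List[int]:
    """Segment ids for a `[CLS] query [SEP] passage [SEP]` input (assumed layout)."""
    n_query = query_len + 2      # [CLS] + query tokens + [SEP]
    n_passage = passage_len + 1  # passage tokens + final [SEP]
    # Models with a single token type (e.g. roberta) can only use id 0.
    passage_seg = 1 if type_vocab_size > 1 else 0
    return [0] * n_query + [passage_seg] * n_passage


print(make_segment_ids(query_len=2, passage_len=3, type_vocab_size=2))
print(make_segment_ids(query_len=2, passage_len=3, type_vocab_size=1))
```

Keeping the decision in one helper like this would avoid repeating it in each BERT reranker, at the cost of passing the config value to whichever component calls it.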

lgtm-com[bot] commented 4 years ago

This pull request introduces 1 alert when merging 87222cdf3339e06d5132f84a4210a8807d1763a3 into 1237c04fda769315d7f3a0ccd050d82d576d851c - view on LGTM.com

new alerts:

lgtm-com[bot] commented 4 years ago

This pull request introduces 1 alert when merging 6df0cdd6c6a585417796742ab62c6b0f47595901 into 1237c04fda769315d7f3a0ccd050d82d576d851c - view on LGTM.com

new alerts:

lgtm-com[bot] commented 4 years ago

This pull request introduces 1 alert when merging b68af94b41a7917c1c5d2e1dc0017c582eff4374 into 1237c04fda769315d7f3a0ccd050d82d576d851c - view on LGTM.com

new alerts:

crystina-z commented 4 years ago

results of running python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade_small.txt fold=s1/s2/s3/s4/s5 (one run per fold):

2020-11-05 02:30:48,055 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s1 test metrics: 
   P_1=0.532 P_10=0.466 P_20=0.423 P_5=0.523 judged_10=0.994 judged_20=0.989 judged_200=0.931 map=0.278 ndcg_cut_10=0.466 ndcg_cut_20=0.474 ndcg_cut_5=0.489 recall_100=0.490 recall_1000=0.490 recip_rank=0.697

2020-11-05 02:32:07,431 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s2 test metrics: 
   P_1=0.646 P_10=0.504 P_20=0.434 P_5=0.583 judged_10=0.998 judged_20=0.995 judged_200=0.947 map=0.266 ndcg_cut_10=0.522 ndcg_cut_20=0.500 ndcg_cut_5=0.560 recall_100=0.453 recall_1000=0.453 recip_rank=0.759

2020-11-05 02:33:37,806 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s3 test metrics: 
   P_1=0.633 P_10=0.437 P_20=0.360 P_5=0.510 judged_10=0.988 judged_20=0.981 judged_200=0.935 map=0.206 ndcg_cut_10=0.481 ndcg_cut_20=0.445 ndcg_cut_5=0.521 recall_100=0.391 recall_1000=0.391 recip_rank=0.735

2020-11-05 02:34:43,672 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s4 test metrics: 
   P_1=0.771 P_10=0.529 P_20=0.446 P_5=0.588 judged_10=1.000 judged_20=0.994 judged_200=0.963 map=0.293 ndcg_cut_10=0.529 ndcg_cut_20=0.501 ndcg_cut_5=0.554 recall_100=0.557 recall_1000=0.557 recip_rank=0.832

2020-11-05 02:11:47,203 - INFO - capreolus.task.rerank.evaluate - rerank: fold=s5 test metrics: 
   P_1=0.714 P_10=0.535 P_20=0.446 P_5=0.616 judged_10=0.996 judged_20=0.994 judged_200=0.949 map=0.281 ndcg_cut_10=0.567 ndcg_cut_20=0.518 ndcg_cut_5=0.623 recall_100=0.493 recall_1000=0.493 recip_rank=0.810
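
To eyeball the five folds together, the metric lines above can be parsed and averaged as in the sketch below. The helper names are hypothetical, only three of the logged metrics are copied in, and the plain unweighted mean over folds used here is an assumption that may differ from how capreolus computes its cross-validated scores.

```python
from typing import Dict, List


def parse_metrics(line: str) -> Dict[str, float]:
    """Parse a `name=value name=value ...` metric line into a dict."""
    return {name: float(value) for name, value in (p.split("=") for p in line.split())}


def mean_over_folds(folds: List[Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each metric across folds (an assumed aggregation)."""
    return {name: sum(f[name] for f in folds) / len(folds) for name in folds[0]}


# Values copied from the per-fold log lines above (subset of metrics).
fold_lines = {
    "s1": "P_20=0.423 map=0.278 ndcg_cut_20=0.474",
    "s2": "P_20=0.434 map=0.266 ndcg_cut_20=0.500",
    "s3": "P_20=0.360 map=0.206 ndcg_cut_20=0.445",
    "s4": "P_20=0.446 map=0.293 ndcg_cut_20=0.501",
    "s5": "P_20=0.446 map=0.281 ndcg_cut_20=0.518",
}

avg = mean_over_folds([parse_metrics(line) for line in fold_lines.values()])
for name, value in sorted(avg.items()):
    print(f"{name}: {value:.4f}")
```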

crystina-z commented 4 years ago

I accidentally deleted the previous output file; these are the cross-validated results from another 5 runs:

2020-11-05 03:20:20,714 - INFO - capreolus.task.rerank.evaluate -                       P_1: 0.6639
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate -                      P_10: 0.5071
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate -                      P_20: 0.4303
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate -                       P_5: 0.5768
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate -                 judged_10: 0.9950
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate -                 judged_20: 0.9907
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate -                judged_200: 0.9450
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate -                       map: 0.2709
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_10: 0.5236
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_20: 0.4967
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate -                ndcg_cut_5: 0.5586
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate -                recall_100: 0.4765
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate -               recall_1000: 0.4765
2020-11-05 03:20:20,715 - INFO - capreolus.task.rerank.evaluate -                recip_rank: 0.7684

lgtm-com[bot] commented 4 years ago

This pull request introduces 3 alerts when merging 9d1a124db009f698da5543a5f33ae492070542e8 into 1237c04fda769315d7f3a0ccd050d82d576d851c - view on LGTM.com

new alerts:

lgtm-com[bot] commented 4 years ago

This pull request introduces 1 alert when merging 1a2331f77ea3c6cb608713a42ddeb0ecc09fd7ce into 1237c04fda769315d7f3a0ccd050d82d576d851c - view on LGTM.com

new alerts:

crystina-z commented 4 years ago

current cross-validated scores with the parameters in docs/reproduction/config_parade.txt:

2020-11-08 19:49:46,025 - INFO - capreolus.task.rerank.evaluate -                       P_1: 0.7386
2020-11-08 19:49:46,029 - INFO - capreolus.task.rerank.evaluate -                      P_10: 0.5801
2020-11-08 19:49:46,029 - INFO - capreolus.task.rerank.evaluate -                      P_20: 0.4811
2020-11-08 19:49:46,029 - INFO - capreolus.task.rerank.evaluate -                       P_5: 0.6448
2020-11-08 19:49:46,029 - INFO - capreolus.task.rerank.evaluate -                 judged_10: 0.9946
2020-11-08 19:49:46,029 - INFO - capreolus.task.rerank.evaluate -                 judged_20: 0.9890
2020-11-08 19:49:46,029 - INFO - capreolus.task.rerank.evaluate -                judged_200: 0.9450
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -                       map: 0.3104
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_10: 0.5985
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_20: 0.5600
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -                ndcg_cut_5: 0.6289
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -                recall_100: 0.4765
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -               recall_1000: 0.4765
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -                recip_rank: 0.8185
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate - interpolated with alphas = [0.0, 0.0, 0.2, 0.25, 0.25]
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -              P_1 [interp]: 0.6948
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -             P_10 [interp]: 0.5651
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -             P_20 [interp]: 0.4697
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -              P_5 [interp]: 0.6378
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -        judged_10 [interp]: 0.9855
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -        judged_20 [interp]: 0.9787
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -       judged_200 [interp]: 0.7729
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -              map [interp]: 0.3461
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -      ndcg_cut_10 [interp]: 0.5805
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -      ndcg_cut_20 [interp]: 0.5457
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -       ndcg_cut_5 [interp]: 0.6134
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -       recall_100 [interp]: 0.4662
2020-11-08 19:49:46,030 - INFO - capreolus.task.rerank.evaluate -      recall_1000 [interp]: 0.7761
2020-11-08 19:49:46,031 - INFO - capreolus.task.rerank.evaluate -       recip_rank [interp]: 0.7838
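
The log does not say how the `[interp]` scores are combined. One common scheme is a linear interpolation of min-max normalized first-stage and reranker scores with one alpha per fold; the sketch below illustrates that scheme only. The function names, the convention that alpha weights the first-stage score (so alpha = 0.0 keeps only the reranker score), and the toy scores are all assumptions, not taken from capreolus.

```python
from typing import Dict


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one query's scores to [0, 1]; constant or empty inputs map to 0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def interpolate(first_stage: Dict[str, float], reranked: Dict[str, float], alpha: float) -> Dict[str, float]:
    """alpha * first-stage + (1 - alpha) * reranker, over the union of documents."""
    a, b = min_max(first_stage), min_max(reranked)
    # Documents missing from one run contribute 0 from that run.
    return {d: alpha * a.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in set(a) | set(b)}


# Toy scores for a single query (made up for illustration).
first_stage = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
reranked = {"d1": 0.2, "d2": 0.9}

combined = interpolate(first_stage, reranked, alpha=0.25)
print(sorted(combined.items(), key=lambda kv: kv[1], reverse=True))
```

Under this scheme, documents that appear only in the first-stage run still receive a score, which is one way an interpolated run can contain more documents than the reranked run alone.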