Closed crystina-z closed 3 years ago
This pull request introduces 3 alerts when merging 649804e2b0e99764f3d8f23f77d0cecdc323bf90 into bf5042354bd654cfd1f93f534c62e4a170048ee2 - view on LGTM.com
Main changes in the above commits:
MS_MARCO.md for replicating the results on the MS MARCO Passage dataset, and setup-cc.md for setting up Capreolus on Compute Canada.
evaluation.py and DEFAULT_METRICS
Changes to the main framework:
(3) dev tf record preparation:
(3.1) previously, the final dev examples that filled less than one batch were dropped (commit e667ef0, as mentioned in #118)
(3.2) previously, the dev tf records were read in random order, so the order of the predictions differed from that of trainer.generate_qid_docid_pair (commit 19f36a1)
(4) each benchmark's dev set for the searcher is now determined by a new property, non_nn_dev(), which may or may not include the train_qids of each fold (controlled by each benchmark's use_train_as_dev) (commit ddd199)
(5) the linear decay in the TensorFlow trainer now follows tf.train.polynomial (commit 7911a1)
(6) decay and warmup are now applied per iteration rather than per epoch (commit 7911a1)
This PR is a good time to do some cleanup with epochs and the LR schedule:
itersize) from the batch size? I don't see any advantage to making epoch=itersize*batch as we currently have with TF, and it means itersize has to be modified when the batch size changes
warmupsteps and decaysteps (& rename them?), so that they're independent of the size of an epoch?
score of tf-parade at this point (with tf=2.3)
python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade.txt reranker.trainer.amp=True
2021-07-07 14:29:36,880 - INFO - capreolus.task.rerank.evaluate - P_10: 0.5590
2021-07-07 14:29:36,880 - INFO - capreolus.task.rerank.evaluate - P_20: 0.4847
2021-07-07 14:29:36,881 - INFO - capreolus.task.rerank.evaluate - judged_200: 0.7130
2021-07-07 14:29:36,881 - INFO - capreolus.task.rerank.evaluate - map: 0.3697
2021-07-07 14:29:36,881 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_10: 0.5780
2021-07-07 14:29:36,881 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20: 0.5575
2021-07-07 14:29:36,881 - INFO - capreolus.task.rerank.evaluate - recall_1000: 0.7761
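For reference, the per-iteration warmup and linear decay discussed in items (5) and (6) and in the LR schedule comment above can be sketched in plain Python. This is only an illustration, not the code in this PR: the function name `lr_at_step` and its parameters are hypothetical, the decay follows the formula of TensorFlow's polynomial decay schedule (`tf.keras.optimizers.schedules.PolynomialDecay`, with `power=1.0` giving a linear decay), and starting the decay only after warmup ends is an assumption.

```python
def lr_at_step(step, base_lr, warmup_steps, decay_steps, end_lr=0.0, power=1.0):
    """Learning rate for one training iteration (hypothetical helper).

    `warmup_steps` and `decay_steps` are counted in iterations, so they do not
    depend on how many iterations make up an epoch.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / warmup_steps

    # Assumption: decay starts once warmup has finished.
    decay_step = min(step - warmup_steps, decay_steps)
    remaining = 1.0 - decay_step / decay_steps
    # Polynomial decay formula; power=1.0 makes it linear.
    return (base_lr - end_lr) * remaining ** power + end_lr


if __name__ == "__main__":
    for step in (0, 5, 10, 60, 110, 500):
        print(step, lr_at_step(step, base_lr=1e-3, warmup_steps=10, decay_steps=100))
```

Because the schedule is a function of the iteration count only, changing the batch size or the epoch size would not require touching the warmup or decay settings in this sketch.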
This pull request introduces 14 alerts when merging e6bb26f327854fa5d3861cd06457c3a2d2556c95 into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com
This pull request introduces 2 alerts when merging 36de0a7ca687d598b2d78097110b0ee15328a456 into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com
msmarco psg score at this point: MRR@10=0.354
python -m capreolus.run rerank.train with file=docs/reproduction/config_msmarco.txt reranker.trainer.amp=True
and the validation score on dev set:
niter=2: dev metrics: MRR@10=0.328 P_1=0.213 P_10=0.061 P_20=0.035 P_5=0.099 judged_10=0.061 judged_20=0.035 judged_200=0.004 map=0.333 ndcg_cut_10=0.387 ndcg_cut_20=0.410 ndcg_cut_5=0.352 recall_100=0.806 recall_1000=0.853 recip_rank=0.338
niter=4: dev metrics: MRR@10=0.337 P_1=0.218 P_10=0.062 P_20=0.036 P_5=0.103 judged_10=0.062 judged_20=0.036 judged_200=0.004 map=0.341 ndcg_cut_10=0.396 ndcg_cut_20=0.418 ndcg_cut_5=0.362 recall_100=0.812 recall_1000=0.853 recip_rank=0.346
niter=6: dev metrics: MRR@10=0.347 P_1=0.229 P_10=0.064 P_20=0.036 P_5=0.104 judged_10=0.064 judged_20=0.036 judged_200=0.004 map=0.351 ndcg_cut_10=0.407 ndcg_cut_20=0.428 ndcg_cut_5=0.372 recall_100=0.814 recall_1000=0.853 recip_rank=0.356
niter=8: dev metrics: MRR@10=0.343 P_1=0.222 P_10=0.064 P_20=0.036 P_5=0.105 judged_10=0.064 judged_20=0.036 judged_200=0.004 map=0.347 ndcg_cut_10=0.406 ndcg_cut_20=0.426 ndcg_cut_5=0.369 recall_100=0.813 recall_1000=0.853 recip_rank=0.352
niter=10: dev metrics: MRR@10=0.354 P_1=0.237 P_10=0.064 P_20=0.037 P_5=0.106 judged_10=0.064 judged_20=0.037 judged_200=0.004 map=0.359 ndcg_cut_10=0.414 ndcg_cut_20=0.435 ndcg_cut_5=0.380 recall_100=0.815 recall_1000=0.853 recip_rank=0.364
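As a reminder of what the MRR@10 numbers above measure, here is a minimal sketch of the metric. It is not the evaluation code used by Capreolus: the function name `mrr_at_k` and the nested-dict layout of `run` and `qrels` are assumptions made for illustration, and averaging over the queries present in the run (rather than over all judged queries) is one of several conventions.

```python
def mrr_at_k(run, qrels, k=10):
    """Mean reciprocal rank of the first relevant document in the top k.

    run:   {qid: {docid: score}}            (hypothetical layout)
    qrels: {qid: {docid: relevance label}}  (label > 0 means relevant)
    """
    if not run:
        return 0.0
    total = 0.0
    for qid, doc_scores in run.items():
        # Rank documents by descending score and keep the top k.
        ranked = sorted(doc_scores, key=doc_scores.get, reverse=True)[:k]
        for rank, docid in enumerate(ranked, start=1):
            if qrels.get(qid, {}).get(docid, 0) > 0:
                total += 1.0 / rank
                break  # only the first relevant document counts
    return total / len(run)
```

A query with no relevant document in its top k contributes 0 to the average.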
score of tf-parade at this point:
python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade.txt reranker.trainer.amp=True
2021-07-12 17:35:35,210 - INFO - capreolus.task.rerank.evaluate - P_10: 0.5631
2021-07-12 17:35:35,210 - INFO - capreolus.task.rerank.evaluate - P_20: 0.4813
2021-07-12 17:35:35,210 - INFO - capreolus.task.rerank.evaluate - judged_200: 0.7143
2021-07-12 17:35:35,210 - INFO - capreolus.task.rerank.evaluate - map: 0.3717
2021-07-12 17:35:35,210 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_10: 0.5828
2021-07-12 17:35:35,210 - INFO - capreolus.task.rerank.evaluate - ndcg_cut_20: 0.5565
2021-07-12 17:35:35,210 - INFO - capreolus.task.rerank.evaluate - recall_1000: 0.7761
This pull request introduces 2 alerts when merging 7739a4610383e466a86abca79abbcd96dcd63b64 into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com
This pull request introduces 2 alerts when merging 7a91716aec062ba02513b07d9e859a95481fab57 into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com
This pull request introduces 2 alerts when merging 08bde912e7ae59750258743f9bc395037c3235fc into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com
Added error description (issue #161) and some format checking (issue #157)
This pull request introduces 2 alerts when merging 31cc183a94f8be3b14f443031570968da3247dd7 into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com
This pull request introduces 2 alerts when merging 28dc8a55eb2b0e4ba066aded37bb51095b40dbe2 into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com
This pull request introduces 2 alerts when merging 7664df9662df0411bd10470ba0649a2a268bf753 into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com
This pull request introduces 1 alert when merging 569276e048c033d1c39fe0499fabf82b6140beca into 1767d5ab6f940ca3f329e88fd5899608744320c0 - view on LGTM.com
This pull request introduces 1 alert when merging 1bbf0f295b09774e2fb2a1db7dfddef88adec7be into 1767d5ab6f940ca3f329e88fd5899608744320c0 - view on LGTM.com
This pull request introduces 1 alert when merging bbe134e887b920c2cc5452264538171143d8e492 into 1767d5ab6f940ca3f329e88fd5899608744320c0 - view on LGTM.com
Just sending this PR to better track the progress :) Don't worry about it now
Running ms marco psg while reranking only the top100 data now looks right.
Confusing stuff to solve
sampler.generate_example and sampler.get_preds_in_trec_format seem to align with each other, and the dev records are prepared in this run, so it's not because of stale cached data. Still checking what's happening here. (Again, this does not happen for the reranking top100 case.)
Features/Support to add
Sidenote (about the running time of some operations)
Searcher.load_trec_run()
)
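On the alignment question above, and the dev record issues in item (3) of the change list: one way to keep every dev example and keep predictions in the same order as the (qid, docid) pairs is to pad the last batch and trim the padded scores afterwards. The sketch below is not the fix made in this PR; the helper names, the padding value, and the nested-dict output are all hypothetical.

```python
def iter_padded_batches(pairs, batch_size, pad=("<pad>", "<pad>")):
    """Yield fixed-size batches in the original order, padding the last one."""
    for start in range(0, len(pairs), batch_size):
        batch = list(pairs[start:start + batch_size])
        # Fill the final batch instead of dropping it.
        batch += [pad] * (batch_size - len(batch))
        yield batch


def collect_predictions(pairs, batch_size, score_batch):
    """Score all pairs batch by batch and map scores back to (qid, docid).

    `score_batch` is a stand-in for the model: it takes a list of
    (qid, docid) tuples and returns one score per tuple, in order.
    """
    scores = []
    for batch in iter_padded_batches(pairs, batch_size):
        scores.extend(score_batch(batch))
    scores = scores[:len(pairs)]  # drop the scores produced for padding

    preds = {}
    for (qid, docid), score in zip(pairs, scores):
        preds.setdefault(qid, {})[docid] = score
    return preds
```

The mapping back to (qid, docid) relies on the batches being consumed in a fixed order, which is why reading the dev records in random order would break the alignment.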