Feature/msmarco psg

crystina-z commented 3 years ago

Just sending this PR to better track the progress :) Don't worry about it now

Now running MS MARCO passage; when only reranking the top 100, the data looks right.

Confusing stuff to solve

[x] So far, none of the runs reranking the top 1k could save the checkpoint, even though some of them seem to have finished (checkpoints are saved fine when reranking the top 100). I suspect this is due to the Slurm running-time limit, but every time I queue up a 4-day run some other weird bug pops up, so after many days it is still unconfirmed...
[x] The most recent run throws the error below, although sampler.generate_example and sampler.get_preds_in_trec_format seem to align with each other, and the dev records were prepared in this run, so it is not caused by stale cached data. Still checking what's happening here. (Again, this does not happen when reranking the top 100.)

Traceback (most recent call last):
  File "run.py", line 108, in <module>
    task_entry_function()
  File "/home/czhang/aaai2021/tmp/capreolus/capreolus/task/rerank.py", line 54, in train
    return self.rerank_run(best_search_run, self.get_results_path())
  File "/home/czhang/aaai2021/tmp/capreolus/capreolus/task/rerank.py", line 101, in rerank_run
    self.benchmark.relevance_level,
  File "/home/czhang/aaai2021/tmp/capreolus/capreolus/trainer/tensorflow.py", line 205, in train
    trec_preds = self.get_preds_in_trec_format(dev_predictions, dev_data)
  File "/home/czhang/aaai2021/tmp/capreolus/capreolus/trainer/tensorflow.py", line 429, in get_preds_in_trec_format
    pred_dict[qid][docid] = predictions[i].numpy().astype(np.float16).item()
IndexError: list index out of range
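An IndexError at `predictions[i]` is consistent with the prediction list being shorter than the list of dev (qid, docid) pairs. One way this happens (the cause later identified in this thread as commit e667ef0) is a batching step that drops the final partial batch, like `drop_remainder=True` in `tf.data.Dataset.batch`. The pure-Python sketch below reproduces that mismatch without TensorFlow; `batch_predict` and `preds_to_trec` are hypothetical stand-ins, not capreolus functions.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins, not capreolus code: `pairs` plays the role of the dev
# (qid, docid) list and `batch_predict` mimics batching with drop_remainder.


def batch_predict(pairs: List[Tuple[str, str]], batch_size: int, drop_remainder: bool) -> List[float]:
    """Return one dummy score per example that survives batching."""
    preds: List[float] = []
    for start in range(0, len(pairs), batch_size):
        batch = pairs[start:start + batch_size]
        if drop_remainder and len(batch) < batch_size:
            break  # the final, smaller batch is silently dropped
        preds.extend([0.5] * len(batch))
    return preds


def preds_to_trec(pairs: List[Tuple[str, str]], predictions: List[float]) -> Dict[str, Dict[str, float]]:
    """Build qid -> docid -> score, failing loudly if the lengths disagree."""
    if len(pairs) != len(predictions):
        raise ValueError(f"{len(pairs)} (qid, docid) pairs but {len(predictions)} predictions")
    pred_dict: Dict[str, Dict[str, float]] = {}
    for (qid, docid), score in zip(pairs, predictions):
        pred_dict.setdefault(qid, {})[docid] = score
    return pred_dict


if __name__ == "__main__":
    dev_pairs = [("q1", f"d{i}") for i in range(10)]
    print(len(batch_predict(dev_pairs, 4, drop_remainder=True)))   # 8: last 2 examples dropped
    print(len(batch_predict(dev_pairs, 4, drop_remainder=False)))  # 10
```

The explicit length check turns a confusing IndexError into a clear message, but it only catches dropped examples; a dev set read in a different order than the (qid, docid) list would pass the check and silently misassign scores, so the reading order has to be deterministic as well.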

Features/Support to add

[x] Right now I have to hack the code so that the ranker only evaluates on the dev qids; otherwise, evaluating on the millions of training queries blocks the process forever. Perhaps this is already solved in the feature/fit branch, where only the test qids are evaluated?
[ ] use sqlite to prepare and store the docid2passage, so this can be run on the RAM-poor machines
[ ] ~(maybe less urgent) handle the msmarco downloading using the allenai/ir_datasets~
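For the sqlite item, a minimal sketch using Python's built-in `sqlite3` module is below. The table name `docid2passage` and both helper functions are hypothetical; the idea is only that passages live on disk and are fetched one docid at a time instead of being held in an in-memory dict.

```python
import sqlite3
from typing import Iterable, Optional, Sequence


def build_passage_db(path: str, passages: Iterable[Sequence[str]]) -> sqlite3.Connection:
    """Store (docid, passage) pairs in an on-disk table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docid2passage (docid TEXT PRIMARY KEY, passage TEXT NOT NULL)"
    )
    # executemany iterates over its input, so a generator that reads the
    # collection line by line can be passed instead of a materialized list
    conn.executemany("INSERT OR REPLACE INTO docid2passage VALUES (?, ?)", passages)
    conn.commit()
    return conn


def get_passage(conn: sqlite3.Connection, docid: str) -> Optional[str]:
    """Look up a single passage; returns None for unknown docids."""
    row = conn.execute("SELECT passage FROM docid2passage WHERE docid = ?", (docid,)).fetchone()
    return row[0] if row else None
```

For MS MARCO's tab-separated `collection.tsv` (`pid<TAB>passage`), the input could be something like `(line.rstrip("\n").split("\t", 1) for line in open("collection.tsv", encoding="utf-8"))`. Because `docid` is the primary key, lookups use sqlite's index rather than a scan.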

Sidenote (running time of some operations)

1. Evaluation (dev set only, not including the training queries):
   1.1 pytrec_eval: 40 s
   1.2 trec_eval: 4 s
2. Loading the whole run file (including training data): ~310 s (Searcher.load_trec_run())
3. Preparing the passages: ?
4. Preparing the training runs: 3.5 hours
5. TFRecords (train + dev): 2 hours
6. Training:
   6.1 3k iterations: < 2 hours
   6.2 30k iterations: 11-12 hours
7. Inference:
   7.1 top 100: several hours
   7.2 top 1k: > 1.5 days
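Since loading the whole run file (training queries included) takes about 310 s while only the dev qids need to be evaluated, one option is to stream the run file and keep only the wanted qids. The sketch below assumes the standard six-column TREC run format (`qid Q0 docid rank score tag`); `load_run_subset` is a hypothetical helper and does not reflect the signature of `Searcher.load_trec_run()`.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set


def load_run_subset(lines: Iterable[str], keep_qids: Set[str]) -> Dict[str, Dict[str, float]]:
    """Parse TREC run lines, keeping only queries in keep_qids (hypothetical helper)."""
    run: DefaultDict[str, Dict[str, float]] = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        qid, _, docid, _, score = fields[:5]
        if qid in keep_qids:
            run[qid][docid] = float(score)
    return dict(run)
```

Passing an open file object (e.g. `with open(path) as f: load_run_subset(f, dev_qids)`) reads one line at a time, so memory scales with the dev queries rather than the whole training run; the file still has to be scanned once, so the saving is mostly in memory and dict construction.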

lgtm-com[bot] commented 3 years ago

This pull request introduces 3 alerts when merging 649804e2b0e99764f3d8f23f77d0cecdc323bf90 into bf5042354bd654cfd1f93f534c62e4a170048ee2 - view on LGTM.com

new alerts:

3 for Unused import

crystina-z commented 3 years ago

main changes in the above commits:

documents: add MS_MARCO.md for replicating the results on the MS MARCO passage dataset, and setup-cc.md for setting up Capreolus on Compute Canada.
code:
(1) bugfixes for the msmarco_keyword benchmark;
(2) add the official MS MARCO MRR@10 evaluation to evaluation.py, and changes to DEFAULT_METRICS with respect to the main framework;
(3) dev TFRecord preparation:
   (3.1) previously, the final dev batch was dropped when it had fewer than batchsize examples (commit e667ef0; as mentioned in #118);
   (3.2) previously, the dev TFRecord was read in random order, so the order of the predictions differed from that of trainer.generate_qid_docid_pair (commit 19f36a1);
(4) each benchmark's dev set for the searcher is now determined by a new property function, non_nn_dev(), which may or may not include the train_qids of each fold (controlled by each benchmark's use_train_as_dev) (commit ddd199);
(5) the tensorflow trainer's linear decay algorithm is changed to align with tf.train.polynomial (commit 7911a1);
(6) decay and warmup are now applied per iteration rather than per epoch (commit 7911a1).
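Changes (5) and (6) can be pictured with the per-step schedule below. The decay branch uses the formula documented for `tf.keras.optimizers.schedules.PolynomialDecay` with `cycle=False` (`power=1.0` gives linear decay). How warmup is combined with it is an assumption here: decay is counted from step 0 and linear warmup simply overrides it for the first `warmup_steps` steps, as in the original BERT optimizer code. The function name and parameters are hypothetical, not capreolus's trainer API.

```python
def learning_rate(step: int, base_lr: float, warmup_steps: int, decay_steps: int,
                  end_lr: float = 0.0, power: float = 1.0) -> float:
    """Learning rate at optimizer step `step` (0-based), recomputed every iteration.

    Assumes decay_steps > 0.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # linear warmup: grows from 0 towards base_lr
        return base_lr * step / warmup_steps
    # PolynomialDecay (cycle=False): clamp the step, then interpolate to end_lr
    decay_step = min(step, decay_steps)
    return (base_lr - end_lr) * (1 - decay_step / decay_steps) ** power + end_lr
```

Because the rate is a function of the global step rather than the epoch index, it changes smoothly within an epoch instead of jumping once per epoch, which is what moving decay and warmup to a per-iteration update amounts to.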

andrewyates commented 3 years ago

This PR is a good time to do some cleanup with epochs and the LR schedule:

Let's match pytorch's behavior of decoupling the size of an epoch (itersize) from the batch size? I don't see any advantage to making epoch=itersize*batch as we currently have with TF, and it means itersize has to be modified when the batch size changes
Let's do the same for warmupsteps and decaysteps (& rename them?), so that they're independent of the size of an epoch?
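The arithmetic behind this proposal can be sketched as follows, assuming (as the comment implies) that under the proposal `itersize` counts training examples per epoch while the current TF trainer treats it as a number of batches. `TrainingSchedule` and `coupled_examples_per_epoch` are hypothetical names used only for illustration, and `warmup_steps`/`decay_steps` are placeholders for whatever the renamed options become.

```python
from dataclasses import dataclass


@dataclass
class TrainingSchedule:
    """Hypothetical config following the proposal: epoch size counted in examples."""

    itersize: int      # training examples per epoch, independent of batch size
    batch: int         # examples per optimizer step
    niters: int        # number of epochs
    warmup_steps: int  # in optimizer steps, independent of epoch size
    decay_steps: int   # in optimizer steps, independent of epoch size

    @property
    def steps_per_epoch(self) -> int:
        # at least one optimizer step per epoch, even if itersize < batch
        return max(1, self.itersize // self.batch)

    @property
    def examples_per_epoch(self) -> int:
        return self.steps_per_epoch * self.batch

    @property
    def total_steps(self) -> int:
        return self.niters * self.steps_per_epoch


def coupled_examples_per_epoch(itersize: int, batch: int) -> int:
    """Current TF behaviour as described above: an epoch is itersize batches."""
    return itersize * batch
```

With the coupled scheme, doubling the batch size from 16 to 32 doubles the data seen per epoch unless itersize is halved by hand; with the decoupled scheme, `TrainingSchedule(32000, 16, ...)` and `TrainingSchedule(32000, 32, ...)` both cover 32,000 examples per epoch and only the number of optimizer steps changes, which is why warmup and decay are best expressed in steps rather than epochs.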

crystina-z commented 3 years ago

Score of tf-parade at this point (with TensorFlow 2.3):

python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade.txt reranker.trainer.amp=True

2021-07-07 14:29:36,880 - INFO - capreolus.task.rerank.evaluate -                      P_10: 0.5590
2021-07-07 14:29:36,880 - INFO - capreolus.task.rerank.evaluate -                      P_20: 0.4847
2021-07-07 14:29:36,881 - INFO - capreolus.task.rerank.evaluate -                judged_200: 0.7130
2021-07-07 14:29:36,881 - INFO - capreolus.task.rerank.evaluate -                       map: 0.3697
2021-07-07 14:29:36,881 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_10: 0.5780
2021-07-07 14:29:36,881 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_20: 0.5575
2021-07-07 14:29:36,881 - INFO - capreolus.task.rerank.evaluate -               recall_1000: 0.7761

lgtm-com[bot] commented 3 years ago

This pull request introduces 14 alerts when merging e6bb26f327854fa5d3861cd06457c3a2d2556c95 into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com

new alerts:

8 for Unused import
6 for Unused local variable

lgtm-com[bot] commented 3 years ago

This pull request introduces 2 alerts when merging 36de0a7ca687d598b2d78097110b0ee15328a456 into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com

new alerts:

2 for Unused import

crystina-z commented 3 years ago

MS MARCO passage score at this point: MRR@10=0.354

python -m capreolus.run rerank.train with file=docs/reproduction/config_msmarco.txt reranker.trainer.amp=True

and the validation scores on the dev set:

niter=2: dev metrics: MRR@10=0.328 P_1=0.213 P_10=0.061 P_20=0.035 P_5=0.099 judged_10=0.061 judged_20=0.035 judged_200=0.004 map=0.333 ndcg_cut_10=0.387 ndcg_cut_20=0.410 ndcg_cut_5=0.352 recall_100=0.806 recall_1000=0.853 recip_rank=0.338
niter=4: dev metrics: MRR@10=0.337 P_1=0.218 P_10=0.062 P_20=0.036 P_5=0.103 judged_10=0.062 judged_20=0.036 judged_200=0.004 map=0.341 ndcg_cut_10=0.396 ndcg_cut_20=0.418 ndcg_cut_5=0.362 recall_100=0.812 recall_1000=0.853 recip_rank=0.346
niter=6: dev metrics: MRR@10=0.347 P_1=0.229 P_10=0.064 P_20=0.036 P_5=0.104 judged_10=0.064 judged_20=0.036 judged_200=0.004 map=0.351 ndcg_cut_10=0.407 ndcg_cut_20=0.428 ndcg_cut_5=0.372 recall_100=0.814 recall_1000=0.853 recip_rank=0.356
niter=8: dev metrics: MRR@10=0.343 P_1=0.222 P_10=0.064 P_20=0.036 P_5=0.105 judged_10=0.064 judged_20=0.036 judged_200=0.004 map=0.347 ndcg_cut_10=0.406 ndcg_cut_20=0.426 ndcg_cut_5=0.369 recall_100=0.813 recall_1000=0.853 recip_rank=0.352
niter=10: dev metrics: MRR@10=0.354 P_1=0.237 P_10=0.064 P_20=0.037 P_5=0.106 judged_10=0.064 judged_20=0.037 judged_200=0.004 map=0.359 ndcg_cut_10=0.414 ndcg_cut_20=0.435 ndcg_cut_5=0.380 recall_100=0.815 recall_1000=0.853 recip_rank=0.364
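In every row above, recip_rank is slightly higher than MRR@10 (e.g. 0.364 vs 0.354 at niter=10). That is consistent with the difference in definitions: MRR@10 only credits a relevant passage found in the top 10, while trec_eval's recip_rank uses the first relevant passage at any rank. The sketch below illustrates this; it assumes passages are ranked by descending score and averages over all queries in the qrels, and `mrr_at_k` is a hypothetical helper, not the official MS MARCO evaluation script.

```python
from typing import Dict, Set


def mrr_at_k(run: Dict[str, Dict[str, float]], qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant passage within the top k."""
    total = 0.0
    for qid, relevant in qrels.items():
        # rank passages by descending score; queries missing from the run score 0
        ranked = sorted(run.get(qid, {}).items(), key=lambda kv: kv[1], reverse=True)
        for rank, (docid, _) in enumerate(ranked[:k], start=1):
            if docid in relevant:
                total += 1.0 / rank
                break
    return total / len(qrels) if qrels else 0.0
```

For a query whose only relevant passage sits at rank 12, `mrr_at_k(..., k=10)` contributes 0 while an uncut reciprocal rank (e.g. `k=1000`) contributes 1/12, so the uncut metric can only be equal or higher.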

Score of tf-parade at this point:

python -m capreolus.run rerank.traineval with file=docs/reproduction/config_parade.txt reranker.trainer.amp=True

2021-07-12 17:35:35,210 - INFO - capreolus.task.rerank.evaluate -                      P_10: 0.5631
2021-07-12 17:35:35,210 - INFO - capreolus.task.rerank.evaluate -                      P_20: 0.4813
2021-07-12 17:35:35,210 - INFO - capreolus.task.rerank.evaluate -                judged_200: 0.7143
2021-07-12 17:35:35,210 - INFO - capreolus.task.rerank.evaluate -                       map: 0.3717
2021-07-12 17:35:35,210 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_10: 0.5828
2021-07-12 17:35:35,210 - INFO - capreolus.task.rerank.evaluate -               ndcg_cut_20: 0.5565
2021-07-12 17:35:35,210 - INFO - capreolus.task.rerank.evaluate -               recall_1000: 0.7761

lgtm-com[bot] commented 3 years ago

This pull request introduces 2 alerts when merging 7739a4610383e466a86abca79abbcd96dcd63b64 into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com

new alerts:

2 for Unused import

lgtm-com[bot] commented 3 years ago

This pull request introduces 2 alerts when merging 7a91716aec062ba02513b07d9e859a95481fab57 into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com

new alerts:

2 for Unused import

lgtm-com[bot] commented 3 years ago

This pull request introduces 2 alerts when merging 08bde912e7ae59750258743f9bc395037c3235fc into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com

new alerts:

2 for Unused import

crystina-z commented 3 years ago

Added error descriptions (issue #161) and some format checking (issue #157).

lgtm-com[bot] commented 3 years ago

This pull request introduces 2 alerts when merging 31cc183a94f8be3b14f443031570968da3247dd7 into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com

new alerts:

2 for Unused import

lgtm-com[bot] commented 3 years ago

This pull request introduces 2 alerts when merging 28dc8a55eb2b0e4ba066aded37bb51095b40dbe2 into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com

new alerts:

2 for Unused import

lgtm-com[bot] commented 3 years ago

This pull request introduces 2 alerts when merging 7664df9662df0411bd10470ba0649a2a268bf753 into 18e31b7d7baee8be1da36fe887d9b9c9aba54a74 - view on LGTM.com

new alerts:

2 for Unused import

lgtm-com[bot] commented 3 years ago

This pull request introduces 1 alert when merging 569276e048c033d1c39fe0499fabf82b6140beca into 1767d5ab6f940ca3f329e88fd5899608744320c0 - view on LGTM.com

new alerts:

1 for Unused import

lgtm-com[bot] commented 3 years ago

This pull request introduces 1 alert when merging 1bbf0f295b09774e2fb2a1db7dfddef88adec7be into 1767d5ab6f940ca3f329e88fd5899608744320c0 - view on LGTM.com

new alerts:

1 for Unused import

lgtm-com[bot] commented 3 years ago

This pull request introduces 1 alert when merging bbe134e887b920c2cc5452264538171143d8e492 into 1767d5ab6f940ca3f329e88fd5899608744320c0 - view on LGTM.com

new alerts:

1 for Unused import

capreolus-ir / capreolus

Feature/msmarco psg #117
