Open d1shs0ap opened 2 years ago
Hi @d1shs0ap, thanks for helping to replicate. The links to the two config files seem to be broken, would you mind pasting them into the issue? Also, which commit are you using? Thanks!
Hey @crystina-z, updated the config screenshots. The commit that I ran the experiments on is e10928f. Thank you!
Hi @crystina-z, any updates on this issue?
hi @d1shs0ap sorry for the wait, it took me a while to realize that the config file is missing one line to specify the decay rate. Appending reranker.trainer.decay=0.1 to the end of the config should give MRR@10 of 0.35+. I'll update it in the next PR. Let me know if the issue is still there after adding this.
Thanks again for pointing this issue out!
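For anyone following along, the suggested fix is a one-line append to the experiment config file (shown here on its own; the surrounding config is shared later in this thread):

```
reranker.trainer.decay=0.1
```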
Ok great thanks, I'll test it out now!
Hey @crystina-z, the experiment just finished; here are the results:
MRR@10=0.293
MRR@10=0.347
(commit e9cf9a6)
Should I run the experiment again, with the latest commits?
hi @d1shs0ap that would be nice. Though before that, could you share the config file and the command you used to run the scripts, just in case I missed anything there.
@crystina-z Here's the config file:
optimize=MRR@10
threshold=100
testthreshold=1
benchmark.name=msmarcopsg
rank.searcher.name=msmarcopsgbm25
reranker.name=TFBERTMaxP
reranker.pretrained=bert-base-uncased
reranker.extractor.usecache=True
reranker.extractor.numpassages=1
reranker.extractor.maxseqlen=512
reranker.extractor.maxqlen=50
reranker.extractor.tokenizer.pretrained=bert-base-uncased
reranker.trainer.usecache=True
reranker.trainer.niters=1
reranker.trainer.batch=4
reranker.trainer.evalbatch=256
reranker.trainer.itersize=48000
reranker.trainer.warmupiters=1
reranker.trainer.decay=0.1
reranker.trainer.decayiters=1
reranker.trainer.decaytype=linear
reranker.trainer.loss=pairwise_hinge_loss
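Since much of this thread comes down to comparing configs line by line, here is a small sketch of how dotted key=value lines like the ones above can be loaded into a nested dict for diffing. This is not Capreolus's actual config parser, just an illustration of the key structure:

```python
def parse_config(text):
    """Parse dotted key=value lines (as in the config above) into a nested dict."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        # Walk/create nested dicts for every dotted component but the last
        node = config
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return config

cfg = parse_config("reranker.trainer.decay=0.1\nreranker.name=TFBERTMaxP")
print(cfg["reranker"]["trainer"]["decay"])  # -> 0.1
```

Loading both configs this way and diffing the resulting dicts makes a missing key like `reranker.trainer.decay` stand out immediately.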
I first ran
ENVDIR=$HOME/venv/capreolus-env
source $ENVDIR/bin/activate
module load java/11
module load python/3.7
module load scipy-stack
in the terminal, then ran sbatch docs/reproduction/sample_slurm_script.sh, which is the following:
#!/bin/bash
#SBATCH --job-name=msmarcopsg
#SBATCH --nodes=1
#SBATCH --gres=gpu:v100l:4
#SBATCH --ntasks-per-node=1
#SBATCH --mem=0
#SBATCH --time=48:00:00
#SBATCH --account=$SLURM_ACCOUNT
#SBATCH --cpus-per-task=32
#SBATCH -o ./msmarco-psg-output.log
niters=10
batch_size=16
validatefreq=$niters # to ensure the validation is run only at the end of training
decayiters=$niters # either same with $itersize or 0
threshold=1000 # the top-k documents to rerank
python -m capreolus.run rerank.train with \
file=docs/reproduction/config_msmarco.txt \
threshold=$threshold \
reranker.trainer.niters=$niters \
reranker.trainer.batch=$batch_size \
reranker.trainer.decayiters=$decayiters \
reranker.trainer.validatefreq=$validatefreq \
fold=s1
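As a back-of-envelope check on the schedule above, assuming itersize counts training instances per iteration (an assumption based on the config names, not confirmed by the docs), the overrides work out to:

```python
# Hypothetical arithmetic for the schedule above; "itersize = instances
# per iteration" is an assumption inferred from the config key names.
niters = 10        # override from the slurm script
itersize = 48000   # from the config file
batch_size = 16    # override from the slurm script

instances = niters * itersize    # total training instances seen
steps = instances // batch_size  # total optimizer steps
print(instances, steps)  # -> 480000 30000
```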
I should also mention that this was run on the forked repository nimasadri11/capreolus. Thanks!
@crystina-z Retrained with latest changes and got MRR@10=0.351. However, I ran this experiment on the nimasadri11 fork. Should I add a pull request on that fork? (Currently waiting for the experiment results for the original repo)
@d1shs0ap thanks for the update! Yeah, for this issue let's wait for the result on the original repo for now. Feel free to add another PR to nima's fork as well. Thanks!
@crystina-z The latest MRR I got after running on the original repo is 0.3496; is that good enough? Here's the output:
2022-01-26 13:41:35,370 - INFO - capreolus.trainer.tensorflow.train - dev metrics: MRR@10=0.350 P_1=0.230 P_10=0.064 P_20=0.036 P_5=0.105 judged_10=0.064 judged_20=0.036 judged_200=0.004 map=0.354 ndcg_cut_10=0.410 ndcg_cut_20=0.431 ndcg_cut_5=0.375 recall_100=0.814 recall_1000=0.853 recip_rank=0.359
2022-01-26 13:41:35,399 - INFO - capreolus.trainer.tensorflow.train - new best dev metric: 0.3496
@d1shs0ap the score still looks a bit low to me, though. Maybe let's PR the record to nima's branch and I'll check the score here.
Could you please share your versions of transformers and all TensorFlow-related packages? Thanks so much!
@crystina-z Hey I made the PR to Nima's branch, below are my package versions:
tensorboard==2.7.0
tensorboard-data-server==0.6.1+computecanada
tensorboard-plugin-wit==1.8.0+computecanada
tensorflow==2.4.1+computecanada
tensorflow-addons==0.13.0+computecanada
tensorflow-datasets==4.4.0
tensorflow-estimator==2.4.0+computecanada
tensorflow-hub==0.12.0+computecanada
tensorflow-io-gcs-filesystem==0.22.0+computecanada
tensorflow-metadata==1.5.0
tensorflow-model-optimization==0.7.0
tensorflow-ranking==0.4.2
tensorflow-serving-api==2.7.0
tf-models-official==2.5.0
tf-slim==1.1.0
and
transformers==4.6.0
Thanks!
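For anyone else comparing environments against the list above, a small sketch for printing installed versions programmatically (uses the standard library's importlib.metadata, which requires Python 3.8+; note the thread itself loads python/3.7, so this is a suggestion rather than what was run here):

```python
from importlib import metadata

def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None
    return versions

for pkg, ver in installed_versions(["tensorflow", "transformers"]).items():
    print(f"{pkg}=={ver}" if ver else f"{pkg}: not installed")
```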
This is the task I replicated: https://github.com/capreolus-ir/capreolus/blob/feature/msmarco_psg/docs/reproduction/MS_MARCO.md, by following docs/reproduction/sample_slurm_script.sh.

Findings

"Mini" version
The task did not finish within the recommended time and compute settings, i.e. the following configs:
After trying these configs (entire node), MRR@10=0.283 was achieved, slightly below the 0.295 given in the docs (finished in 21h).

"Full" version
MRR@10=0.346 was achieved, as opposed to the expected MRR@10=0.35+, with the following configs (entire node) (finished in 42h).
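Since the whole comparison in this thread hinges on MRR@10, here is a minimal sketch of how that cutoff metric is computed (the standard definition, not Capreolus's implementation):

```python
def mrr_at_k(ranked_relevance, k=10):
    """Mean reciprocal rank at cutoff k.

    ranked_relevance: one list per query, each a list of 0/1 relevance
    labels in ranked order (best-scored document first).
    """
    total = 0.0
    for labels in ranked_relevance:
        for rank, rel in enumerate(labels[:k], start=1):
            if rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
        # queries with no relevant doc in the top k contribute 0
    return total / len(ranked_relevance)

# First query: relevant doc at rank 2 -> 0.5; second: at rank 1 -> 1.0
print(mrr_at_k([[0, 1, 0], [1, 0, 0]]))  # -> 0.75
```

A small change in how often the first relevant passage lands at rank 1 versus rank 2 moves this number noticeably, which is why differences like 0.346 vs 0.35+ are worth chasing down.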