capreolus-ir / capreolus

A toolkit for end-to-end neural ad hoc retrieval
https://capreolus.ai
Apache License 2.0

MRR@10=0.35 not achieved on fine-tuning monoBERT task #200

Open d1shs0ap opened 2 years ago

d1shs0ap commented 2 years ago

This is the task I attempted to replicate: https://github.com/capreolus-ir/capreolus/blob/feature/msmarco_psg/docs/reproduction/MS_MARCO.md, by following docs/reproduction/sample_slurm_script.sh.

Findings

"Mini" version: (details attached as a screenshot in the original issue)

"Full" version: (details attached as a screenshot in the original issue)

crystina-z commented 2 years ago

Hi @d1shs0ap, thanks for helping to replicate. The links to the two config files seem to be broken; would you mind pasting them into the issue? Also, which commit are you using? Thanks!

d1shs0ap commented 2 years ago

Hey @crystina-z, I've updated the config screenshots. The commit I ran the experiments on is e10928f. Thank you!

d1shs0ap commented 2 years ago

Hi @crystina-z, any updates on this issue?

crystina-z commented 2 years ago

Hi @d1shs0ap, sorry for the wait. It took me a while to realize that the config file is missing one line specifying the decay rate: appending reranker.trainer.decay=0.1 to the end of the config should give MRR@10 of 0.35+. I'll update it in the next PR. Let me know if the issue is still there after adding this.
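For reference, the same fix can also be applied as a launch-time override instead of editing the config file (a minimal sketch, reusing the capreolus.run rerank.train with key=value override pattern from docs/reproduction/sample_slurm_script.sh; the file= path is whatever config the run was launched with):

# Same fix as above, expressed as a command-line override rather than a config edit.
python -m capreolus.run rerank.train with \
    file=docs/reproduction/config_msmarco.txt \
    reranker.trainer.decay=0.1 \
    fold=s1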

Thanks again for pointing this issue out!

d1shs0ap commented 2 years ago

Ok great thanks, I'll test it out now!

d1shs0ap commented 2 years ago

Hey @crystina-z, the experiment just finished; here are the results:

Should I run the experiment again with the latest commits?

crystina-z commented 2 years ago

Hi @d1shs0ap, that would be nice. Before that, though, could you share the config file and the command you used to run the scripts, just in case I missed anything there?

d1shs0ap commented 2 years ago

@crystina-z Here's the config file:

optimize=MRR@10
threshold=100
testthreshold=1

benchmark.name=msmarcopsg
rank.searcher.name=msmarcopsgbm25

reranker.name=TFBERTMaxP
reranker.pretrained=bert-base-uncased

reranker.extractor.usecache=True
reranker.extractor.numpassages=1
reranker.extractor.maxseqlen=512
reranker.extractor.maxqlen=50
reranker.extractor.tokenizer.pretrained=bert-base-uncased

reranker.trainer.usecache=True
reranker.trainer.niters=1
reranker.trainer.batch=4
reranker.trainer.evalbatch=256
reranker.trainer.itersize=48000
reranker.trainer.warmupiters=1
reranker.trainer.decay=0.1
reranker.trainer.decayiters=1
reranker.trainer.decaytype=linear

reranker.trainer.loss=pairwise_hinge_loss

I first ran

ENVDIR=$HOME/venv/capreolus-env
source $ENVDIR/bin/activate
module load java/11
module load python/3.7
module load scipy-stack

in the terminal, then ran sbatch docs/reproduction/sample_slurm_script.sh, which contains the following:

#!/bin/bash
#SBATCH --job-name=msmarcopsg
#SBATCH --nodes=1
#SBATCH --gres=gpu:v100l:4
#SBATCH --ntasks-per-node=1
#SBATCH --mem=0
#SBATCH --time=48:00:00
#SBATCH --account=$SLURM_ACCOUNT
#SBATCH --cpus-per-task=32

#SBATCH -o ./msmarco-psg-output.log

niters=10
batch_size=16
validatefreq=$niters # to ensure the validation is run only at the end of training
decayiters=$niters   # either same with $itersize or 0
threshold=1000       # the top-k documents to rerank

python -m capreolus.run rerank.train with \
    file=docs/reproduction/config_msmarco.txt  \
    threshold=$threshold \
    reranker.trainer.niters=$niters \
    reranker.trainer.batch=$batch_size \
    reranker.trainer.decayiters=$decayiters \
    reranker.trainer.validatefreq=$validatefreq \
    fold=s1
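
For clarity, these key=value arguments are meant to take precedence over the matching lines in config_msmarco.txt (an assumption based on how the reproduction script is written), so the effective training settings for this run would be:

threshold=1000
reranker.trainer.niters=10
reranker.trainer.batch=16
reranker.trainer.decayiters=10
reranker.trainer.validatefreq=10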

I should also mention that this was run on the forked repository nimasadri11/capreolus. Thanks!
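In case it helps with reproducing the exact setup, a rough sketch of checking out that fork (assuming the standard GitHub URL for nimasadri11/capreolus and the e10928f commit mentioned earlier):

# Clone the fork and check out the commit the earlier experiments were run on.
git clone https://github.com/nimasadri11/capreolus.git
cd capreolus
git checkout e10928f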

d1shs0ap commented 2 years ago

@crystina-z Retrained with the latest changes and got MRR@10=0.351. However, I ran this experiment on the nimasadri11 fork. Should I open a pull request on that fork? (Currently waiting for the experiment results on the original repo.)

crystina-z commented 2 years ago

@d1shs0ap thanks for the update! Yes, for this issue let's wait for the result on the original repo for now; feel free to add another PR to nima's fork as well. Thanks!

d1shs0ap commented 2 years ago

@crystina-z The latest MRR I got after running on the original repo is 0.3496; is that good enough? Here's the output:

2022-01-26 13:41:35,370 - INFO - capreolus.trainer.tensorflow.train - dev metrics: MRR@10=0.350 P_1=0.230 P_10=0.064 P_20=0.036 P_5=0.105 judged_10=0.064 judged_20=0.036 judged_200=0.004 map=0.354 ndcg_cut_10=0.410 ndcg_cut_20=0.431 ndcg_cut_5=0.375 recall_100=0.814 recall_1000=0.853 recip_rank=0.359
2022-01-26 13:41:35,399 - INFO - capreolus.trainer.tensorflow.train - new best dev metric: 0.3496

crystina-z commented 2 years ago

@d1shs0ap the score still looks a bit low to me, though. Maybe let's PR the record to nima's branch, and I'll check the score here.

Could you please share your versions of transformers and all TensorFlow-related packages? Thanks so much!
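One quick way to collect these is a filtered pip freeze (a sketch; the grep pattern is only an illustration and can be adjusted):

# List the installed TensorFlow-, TensorBoard-, and transformers-related packages.
pip freeze | grep -iE "tensorflow|tensorboard|transformers|tf-"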

d1shs0ap commented 2 years ago

@crystina-z Hey, I made the PR to Nima's branch; below are my package versions:

tensorboard==2.7.0
tensorboard-data-server==0.6.1+computecanada
tensorboard-plugin-wit==1.8.0+computecanada
tensorflow==2.4.1+computecanada
tensorflow-addons==0.13.0+computecanada
tensorflow-datasets==4.4.0
tensorflow-estimator==2.4.0+computecanada
tensorflow-hub==0.12.0+computecanada
tensorflow-io-gcs-filesystem==0.22.0+computecanada
tensorflow-metadata==1.5.0
tensorflow-model-optimization==0.7.0
tensorflow-ranking==0.4.2
tensorflow-serving-api==2.7.0
tf-models-official==2.5.0
tf-slim==1.1.0

and

transformers==4.6.0
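
(The +computecanada suffixes indicate wheels built for the Compute Canada clusters; on another machine the closest equivalent would presumably be the plain PyPI releases, e.g. a rough sketch:)

# Pin the core packages used for this run to their PyPI releases.
pip install "tensorflow==2.4.1" "transformers==4.6.0"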

Thanks!