Closed nimasadri11 closed 3 years ago
Hi! For Q1, those are probably the tfrecords for the test set - we prepare the tfrecords for both the train and dev sets prior to training, since they are used in that phase, and defer the test set ones until afterwards. For now you don't need to worry about the test set, though, since its qrels are not public and we can't evaluate on it ourselves anyway.
Q2: Hmm, that score definitely doesn't sound right. Could you provide the entire script and the commit id? I'll look into it.
@crystina-z Oh yes, I recall reading in Jimmy's paper that MS doesn't publish the test set qrel.
Here are the details:
(the `fold=1` is supposed to be there, right?)
Commit id: 31cc183a94f8be3b14f443031570968da3247dd7
config_msmarco.txt:

```
optimize=MRR@10
threshold=1000
benchmark.name=msmarcopsg
rank.searcher.name=msmarcopsgbm25
reranker.name=TFBERTMaxP
reranker.pretrained=bert-base-uncased
reranker.extractor.usecache=True
reranker.extractor.numpassages=1
reranker.extractor.maxseqlen=512
reranker.extractor.maxqlen=50
reranker.extractor.tokenizer.pretrained=bert-base-uncased
reranker.trainer.usecache=True
reranker.trainer.niters=10
reranker.trainer.batch=16
reranker.trainer.evalbatch=256
reranker.trainer.itersize=48000
reranker.trainer.warmupiters=1
reranker.trainer.decayiters=10
reranker.trainer.decaytype=linear
reranker.trainer.loss=pairwise_hinge_loss
```
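For intuition on the `pairwise_hinge_loss` setting above: a pairwise hinge loss penalizes a (positive, negative) document pair whenever the positive document does not outscore the negative one by at least a margin. This is only a minimal sketch of the general technique (the margin value of 1.0 and the plain-Python form are my assumptions, not necessarily Capreolus's exact implementation):

```python
def pairwise_hinge_loss(pos_scores, neg_scores, margin=1.0):
    """Mean hinge on the score difference over (positive, negative) pairs.

    A pair contributes zero loss once pos - neg >= margin; otherwise it
    contributes margin - (pos - neg).
    """
    return sum(
        max(0.0, margin - (p - n)) for p, n in zip(pos_scores, neg_scores)
    ) / len(pos_scores)

# Two hypothetical pairs: the first is separated by > margin (loss 0),
# the second is mis-ranked (loss 1.5); mean loss is 0.75.
print(pairwise_hinge_loss([2.0, 0.5], [0.5, 1.0]))  # 0.75
```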
```bash
#!/bin/bash
#SBATCH --job-name=msmarcopsg
#SBATCH --nodes=1
#SBATCH --gres=gpu:v100l:4
#SBATCH --ntasks-per-node=1
#SBATCH --mem=48GB
#SBATCH --time=72:00:00
#SBATCH --cpus-per-task=32
#SBATCH --account=def-jimmylin
export CUDA_AVAILABLE_DEVICES=0,1,2,3  # note: the conventional variable name is CUDA_VISIBLE_DEVICES
export SLURM_ACCOUNT=def-jimmylin
export SBATCH_ACCOUNT=$SLURM_ACCOUNT
export SALLOC_ACCOUNT=$SLURM_ACCOUNT
source ~/.bashrc
source /home/nsadri/capreolus-env/bin/activate
cd /home/nsadri/scratch/capreolus/
export CAPREOLUS_CACHE=/scratch/nsadri/.capreolus/cache
export CAPREOLUS_RESULTS=/scratch/nsadri/.capreolus/results
lr=1e-3
bertlr=2e-5
itersize=30000
warmupsteps=3000
decaystep=$itersize # either same as $itersize or 0
decaytype=linear
python -m capreolus.run rerank.train with \
    file=~/scratch/capreolus/docs/reproduction/config_msmarco.txt \
    reranker.trainer.lr=$lr \
    reranker.trainer.bertlr=$bertlr \
    reranker.trainer.itersize=$itersize \
    reranker.trainer.warmupiters=$warmupsteps \
    reranker.trainer.decayiters=$decaystep \
    reranker.trainer.decaytype="linear" \
    fold=s1 \
    reranker.trainer.validatefreq=10
```
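For reference, the `warmupiters`/`decayiters`/`decaytype=linear` flags above correspond to a learning-rate schedule along these lines. This is a sketch under my assumptions (linear warmup from 0 to the base LR, then linear decay back to 0 over the remaining steps); the exact Capreolus schedule may differ:

```python
def lr_at(step, base_lr=1e-3, warmup=3000, total=30000, decaytype="linear"):
    """Learning rate at a given training step.

    Ramps linearly from 0 to base_lr over `warmup` steps, then decays
    linearly to 0 at `total` when decaytype is "linear", otherwise stays
    constant after warmup (the decayiters=0 case).
    """
    if step < warmup:
        return base_lr * step / warmup
    if decaytype == "linear":
        return base_lr * (total - step) / (total - warmup)
    return base_lr
```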
@crystina-z were you able to give this a go? I tried again from cache, but it's still low:
dev metrics: MRR@10=0.204 P_1=0.115 P_10=0.044 P_20=0.028 P_5=0.066 judged_10=0.044 judged_20=0.028 judged_200=0.004 map=0.213 ndcg_cut_10=0.254 ndcg_cut_20=0.280 ndcg_cut_5=0.219 recall_100=0.730 recall_1000=0.853 recip_rank=0.217
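(For context on the headline number: MRR@10 is the mean, over queries, of the reciprocal rank of the first relevant document within the top 10. A minimal sketch with hypothetical doc ids and qrels, not tied to any particular evaluation library:)

```python
def mrr_at_10(ranked_doc_ids, relevant_ids):
    """Reciprocal rank of the first relevant doc in the top 10, else 0."""
    for rank, doc_id in enumerate(ranked_doc_ids[:10], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# Mean over queries; toy example with two hypothetical queries.
runs = [
    (["d3", "d1", "d7"], {"d1"}),  # first relevant at rank 2 -> 0.5
    (["d9", "d2", "d4"], {"d8"}),  # no relevant doc in top 10 -> 0.0
]
mrr = sum(mrr_at_10(ranked, rel) for ranked, rel in runs) / len(runs)
print(mrr)  # 0.25
```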
Also, another thing I noticed: each time it runs, it rewrites the `.tfrecord` files. Is there a command-line argument to force it to use the cached `.tfrecord` files from previous runs?
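(Conceptually, what I'd expect is a simple existence check before the expensive serialization step - this is a generic caching sketch with a hypothetical helper name, not Capreolus's actual code:)

```python
import os


def write_tfrecords_if_missing(cache_path, build_fn):
    """Generic reuse pattern: run the slow build step only when the cached
    file is absent, otherwise return the previous run's output unchanged.
    Hypothetical helper for illustration.
    """
    if os.path.exists(cache_path):
        return cache_path      # reuse the cached file from a previous run
    build_fn(cache_path)       # regenerate only when the cache is missing
    return cache_path
```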
@crystina-z Could this be the reason that I am getting a low MRR@10?
```
All PyTorch model weights were used when initializing TFBertForSequenceClassification.
Some weights or buffers of the TF 2.0 model TFBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```
@crystina-z FYI, the issue of getting a low score was resolved after installing Capreolus using the new commands you provided. I successfully got MRR@10=0.345. Closing this issue now.
@crystina-z Sorry to bother you again. I just want to verify that I am getting the expected results. I am using the HPs below and getting MRR@10=0.345. I thought this was close to the correct result, but Professor Lin mentioned that it's on the low side. Could you verify whether this number is close to the expected result or not? I suppose that because we are using validatefreq=$niters, the result is a bit lower than expected?
```
niters=10
batch_size=16
validatefreq=$niters
decayiters=$niters
threshold=1000
file=docs/reproduction/config_msmarco.txt
```
Hi @crystina-z ,
2 (hopefully) quick questions:
Q1) My 10 training iterations completed, but the program is still running. It is writing more `.tfrecord` files. I understand the purpose of the `.tfrecord` files written before training. However, now that training is done, why is it still writing `.tfrecord` files? What purpose do these serve?
Q2) Another thing: using the "standard" HPs, along with the `config_msmarco.txt` file, I thought I should be getting over 0.35 MRR@10, but after 10 iterations of `capreolus.trainer.tensorflow.train`, my MRR@10 dev metric seems to be 0.2033.
"Standard Parameters":