capreolus-ir / capreolus

A toolkit for end-to-end neural ad hoc retrieval
https://capreolus.ai
Apache License 2.0

tfrecord even after training? #182

Closed nimasadri11 closed 3 years ago

nimasadri11 commented 3 years ago

Hi @crystina-z ,

Two (hopefully) quick questions:

Q1) My 10 training iterations have completed, but the program is still running and writing more .tfrecord files. I understood the .tfrecord files to be something that is written before training starts. Now that training is done, why is it still writing tfrecord files? What purpose do these serve?

Q2) Another thing: using the "standard" HPs along with the config_msmarco.txt file, I thought I should be getting over 0.35 MRR@10, but after 10 iterations of capreolus.trainer.tensorflow.train, my MRR@10 dev metric is only 0.2033 (see the MRR@10 sketch after the parameter list below).

"Standard Parameters":

lr=1e-3
bertlr=2e-5
itersize=30000
warmupsteps=3000
decaystep=$itersize
decaytype=linear
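
Just so we are talking about the same number, here is roughly how I understand MRR@10 to be computed. This is a minimal sanity-check sketch, not Capreolus's actual evaluation code, and the qrels/run dictionary formats are only assumptions:

# Minimal MRR@10 sketch (not Capreolus's evaluation code).
# Assumed formats: qrels = {qid: {docid: relevance}}, run = {qid: {docid: score}}.
def mrr_at_10(qrels, run):
    total = 0.0
    for qid, doc_scores in run.items():
        top10 = sorted(doc_scores, key=doc_scores.get, reverse=True)[:10]
        for rank, docid in enumerate(top10, start=1):
            if qrels.get(qid, {}).get(docid, 0) > 0:
                total += 1.0 / rank
                break
    return total / len(run)

# Toy usage: the only relevant doc is ranked 2nd, so MRR@10 = 0.5.
print(mrr_at_10({"q1": {"d7": 1}}, {"q1": {"d3": 9.1, "d7": 8.7, "d2": 1.0}}))
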
crystina-z commented 3 years ago

Hi! For Q1, those are probably the tfrecords for the test set. We prepare the tfrecords for both the train and dev sets prior to training, since they are used in that phase, and defer the test set ones until afterwards. For now you don't need to worry about the test set, since its qrels are not public and we can't evaluate on it ourselves anyway.
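
In case it helps to picture what those files are: a tfrecord is just serialized examples on disk, written once so the tf.data input pipeline can stream them during training and evaluation. A minimal generic TensorFlow sketch (not the exact Capreolus extractor code; the feature names and file name are made up):

import tensorflow as tf

# Write one toy example to a .tfrecord file (feature names are hypothetical).
def serialize(token_ids, label):
    feature = {
        "token_ids": tf.train.Feature(int64_list=tf.train.Int64List(value=token_ids)),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

with tf.io.TFRecordWriter("toy-0.tfrecord") as writer:
    writer.write(serialize([101, 2054, 2003, 102], 1))

# Read it back as a tf.data pipeline, the way a trainer would consume it.
spec = {
    "token_ids": tf.io.VarLenFeature(tf.int64),
    "label": tf.io.FixedLenFeature([1], tf.int64),
}
dataset = tf.data.TFRecordDataset(["toy-0.tfrecord"]).map(
    lambda record: tf.io.parse_single_example(record, spec)
)
for example in dataset:
    print(tf.sparse.to_dense(example["token_ids"]).numpy(), example["label"].numpy())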

Q2: hmm, that score definitely doesn't sound right. Could you provide the entire script and the commit id? I'll look into it.

nimasadri11 commented 3 years ago

@crystina-z Oh yes, I recall reading in Jimmy's paper that MS MARCO doesn't publish the test set qrels.

Here are the details (the fold=s1 is supposed to be there, right?):

Commit id: 31cc183a94f8be3b14f443031570968da3247dd7

config_msmarco.txt

optimize=MRR@10
threshold=1000

benchmark.name=msmarcopsg
rank.searcher.name=msmarcopsgbm25
reranker.name=TFBERTMaxP

reranker.pretrained=bert-base-uncased

reranker.extractor.usecache=True
reranker.extractor.numpassages=1
reranker.extractor.maxseqlen=512
reranker.extractor.maxqlen=50
reranker.extractor.tokenizer.pretrained=bert-base-uncased

reranker.trainer.usecache=True
reranker.trainer.niters=10
reranker.trainer.batch=16
reranker.trainer.evalbatch=256
reranker.trainer.itersize=48000
reranker.trainer.warmupiters=1
reranker.trainer.decayiters=10
reranker.trainer.decaytype=linear

reranker.trainer.loss=pairwise_hinge_loss

And the Slurm job script:

#!/bin/bash
#SBATCH --job-name=msmarcopsg
#SBATCH --nodes=1
#SBATCH --gres=gpu:v100l:4
#SBATCH --ntasks-per-node=1
#SBATCH --mem=48GB
#SBATCH --time=72:00:00
#SBATCH --cpus-per-task=32
#SBATCH --account=def-jimmylin
export CUDA_VISIBLE_DEVICES=0,1,2,3
export SLURM_ACCOUNT=def-jimmylin
export SBATCH_ACCOUNT=$SLURM_ACCOUNT
export SALLOC_ACCOUNT=$SLURM_ACCOUNT

source ~/.bashrc
source /home/nsadri/capreolus-env/bin/activate
cd /home/nsadri/scratch/capreolus/
export CAPREOLUS_CACHE=/scratch/nsadri/.capreolus/cache
export CAPREOLUS_RESULTS=/scratch/nsadri/.capreolus/results
lr=1e-3
bertlr=2e-5
itersize=30000
warmupsteps=3000
decaystep=$itersize  # either the same as $itersize or 0
decaytype=linear

python -m capreolus.run rerank.train with \
    file=~/scratch/capreolus/docs/reproduction/config_msmarco.txt  \
    reranker.trainer.lr=$lr \
    reranker.trainer.bertlr=$bertlr \
    reranker.trainer.itersize=$itersize \
    reranker.trainer.warmupiters=$warmupsteps \
    reranker.trainer.decayiters=$decaystep \
    reranker.trainer.decaytype="linear" \
    fold=s1 \
    reranker.trainer.validatefreq=10

nimasadri11 commented 3 years ago

@crystina-z were you able to give this a go? I tried again from cache, but it's still low:

dev metrics: MRR@10=0.204 P_1=0.115 P_10=0.044 P_20=0.028 P_5=0.066 judged_10=0.044 judged_20=0.028 judged_200=0.004 map=0.213 ndcg_cut_10=0.254 ndcg_cut_20=0.280 ndcg_cut_5=0.219 recall_100=0.730 recall_1000=0.853 recip_rank=0.217

Also, another thing I noticed is that each time it runs, it rewrites the .tfrecord files. Is there a command line argument to force it to use the cached .tfrecord files from previous runs?
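
(For context, the config above already sets reranker.extractor.usecache=True and reranker.trainer.usecache=True. The sketch below just illustrates the kind of behaviour I'm hoping for, i.e. reuse existing records when they are present; it is not Capreolus's actual caching code, and the directory layout and function names are made up.)

import glob
import os

# Hypothetical sketch: reuse existing .tfrecord files instead of regenerating them.
def get_tfrecord_files(cache_dir, build_fn):
    existing = sorted(glob.glob(os.path.join(cache_dir, "*.tfrecord")))
    if existing:
        return existing            # reuse the records cached by a previous run
    os.makedirs(cache_dir, exist_ok=True)
    return build_fn(cache_dir)     # otherwise, write them from scratch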

nimasadri11 commented 3 years ago

@crystina-z Could this be the reason I am getting a low MRR@10?

All PyTorch model weights were used when initializing TFBertForSequenceClassification.

Some weights or buffers of the TF 2.0 model TFBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
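
For reference, the same message appears when loading the PyTorch bert-base-uncased checkpoint into a TF classification model outside of Capreolus, since the classification head ('classifier.weight', 'classifier.bias') has no pretrained weights and is freshly initialized. A minimal sketch that reproduces it (plain transformers, not Capreolus's model-loading code):

from transformers import TFBertForSequenceClassification

# Converting the PyTorch bert-base-uncased weights into a TF sequence-classification
# model copies the BERT encoder but leaves the new classifier head randomly
# initialized, which triggers the warning quoted above.
model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", from_pt=True)
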
nimasadri11 commented 3 years ago

@crystina-z FYI, the issue of getting a low score was resolved after installing capreolus using the new commands you provided. I successfully got MRR@10=0.345. Closing this issue now.

nimasadri11 commented 3 years ago

@crystina-z sorry to bother you again. I just want to verify that I am getting the expected results. I am using the HPs below and getting MRR@10=0.345. I thought this was close to the correct result, but Professor Lin mentioned that it's on the low side. Could you verify whether this number is close to the expected result or not? Could it be that, because we are using validatefreq=$niters, the result is a bit lower than expected? (Rough sketch of what I mean after the HP list below.)

niters=10
batch_size=16
validatefreq=$niters
decayiters=$niters
threshold=1000
file=docs/reproduction/config_msmarco.txt
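
Rough sketch of what I mean about validatefreq (a generic training-loop sketch under my assumptions, not Capreolus's trainer code): with validatefreq equal to niters, the dev set is evaluated only once, after the final iteration, so no intermediate checkpoint can be selected.

import random

# Generic sketch of the assumed validation schedule (not Capreolus's trainer code).
def train_one_iteration():
    pass                                   # stand-in for a real training iteration

def evaluate_on_dev():
    return random.random()                 # stand-in for a real dev MRR@10

niters = 10
validatefreq = niters                      # validate only after the last iteration

best_dev_mrr = 0.0
for it in range(1, niters + 1):
    train_one_iteration()
    if it % validatefreq == 0:             # true only when it == 10 here
        best_dev_mrr = max(best_dev_mrr, evaluate_on_dev())
# With validatefreq == niters there is exactly one dev evaluation (after iteration 10);
# validatefreq=1 would evaluate after every iteration and keep the best score.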