facebookresearch / multihop_dense_retrieval

Multi-hop dense retrieval for question answering

predict function giving different results for different batchsize in train_single.py #28

Open kirnap opened 1 year ago


Thanks for sharing the code.

I found that the predict function gives different accuracy when I change the batch size. I think this happens because the number of total negative instances changes depending on the batch size here.

I have a working example with a dataset of 4 instances, run with batch sizes of 1 and 4. You may find the code snippet below:

# product refers to line 311 of train_single.py file
In [4]: product # this is for batchsize of 4
Out[4]: 
tensor([[764.7902, 765.5161, 765.0731, 765.2079, 763.8848],
        [764.1088, 764.6402, 765.0795, 764.3550, 764.4390],
        [765.2510, 764.2839, 764.9810, 764.8682, 765.2878],
        [765.4824, 765.1337, 765.5799, 765.4690, 765.9390]], device='cuda:0')

In [6]: product # scores for the first instance output for batchsize of 1
Out[6]: tensor([[764.7902, 763.8848]], device='cuda:0')

In [7]: product # scores for the second instance output for batchsize of 1
Out[7]: tensor([[764.6402, 764.4390]], device='cuda:0')

In [8]: product # scores for the third instance output for batchsize of 1
Out[8]: tensor([[764.9811, 765.2878]], device='cuda:0')

In [9]: product # scores for the fourth instance output for batchsize of 1
Out[9]: tensor([[765.4691, 765.9391]], device='cuda:0')

The output above makes sense: when we put all instances in one batch, each query is scored against batchsize - 1 additional in-batch negatives, which may change the overall ranking. I think either the accuracy metric needs a clear definition, or the evaluation batch size should be hardcoded to 1. Thanks!
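To illustrate, here is a minimal numpy sketch of the batch-size dependence. The shapes are chosen to match the `(4, 5)` and `(1, 2)` `product` tensors above (B in-batch positive columns plus one own-negative column per row), but the random embeddings and the exact scoring layout are illustrative assumptions, not the actual train_single.py code:

```python
import numpy as np

rng = np.random.default_rng(0)
B, d = 4, 8  # batch size and embedding dim (illustrative values)

# Hypothetical query / positive-passage / negative-passage embeddings
q   = rng.normal(size=(B, d))
pos = rng.normal(size=(B, d))
neg = rng.normal(size=(B, d))

def scores(q, pos, neg):
    """Score each query against every in-batch positive plus its own negative.

    Produces shape (B, B + 1), matching the issue's tensors: the candidate
    pool, and therefore the ranking, depends on the batch size B.
    """
    in_batch = q @ pos.T                               # (B, B) in-batch positives
    own_neg = np.sum(q * neg, axis=1, keepdims=True)   # (B, 1) own hard negative
    return np.concatenate([in_batch, own_neg], axis=1)

def accuracy(product):
    # instance i's own positive sits on the diagonal (column i)
    n = product.shape[0]
    return float(np.mean(product.argmax(axis=1) == np.arange(n)))

# Full batch: each query competes against B - 1 extra in-batch negatives
full = scores(q, pos, neg)              # shape (4, 5)
acc_full = accuracy(full)

# Batch size 1: each query only sees its own positive and negative
acc_single = float(np.mean([
    accuracy(scores(q[i:i+1], pos[i:i+1], neg[i:i+1]))  # shape (1, 2)
    for i in range(B)
]))

print(full.shape, acc_full, acc_single)
```

The individual dot products are identical in both settings (as the matching entries in the outputs above show); only the size of the candidate pool changes, which is exactly why argmax-based accuracy can differ between batch sizes.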