Closed a94763075 closed 4 years ago
Hi @a94763075,
It looks like your setup differs from our implementation and from what we reported in quite a few ways, which may explain the model's low performance. To name a few:
Using the [CLS] representation works fine; see the paper and/or our implementation in this repository for details. Can you try using the implementations in this repository directly? Then, if you're still having problems, it will be easier to figure out what's going wrong. Without starting from the same place, it'll be very difficult to pin down what exactly is wrong.
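For context, scoring from the [CLS] representation amounts to a linear projection of that vector. A minimal stand-in sketch in plain Python (a real model would get `cls_vec` from BERT; `w`, `b`, and the numbers here are illustrative only, not the repo's code):

```python
# Minimal stand-in for [CLS]-based scoring: the relevance score is a
# linear projection of the [CLS] vector. In the real model, cls_vec is
# BERT's output for the [CLS] token; w and b are learned weights.
def cls_score(cls_vec, w, b):
    # dot(w, cls_vec) + b -> scalar relevance score
    return sum(wi * xi for wi, xi in zip(w, cls_vec)) + b

score = cls_score([0.5, -0.2, 0.1], [1.0, 2.0, -1.0], 0.3)
# 0.5*1.0 + (-0.2)*2.0 + 0.1*(-1.0) + 0.3 = 0.3
```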
Thanks, sean
I ran the implementations in this repository directly. Dev NDCG@20: 0.4066, P@20: 0.4701. It works fine.
In my reproduction, following your suggestion:
- gradient accumulation
- optimizer: I copied this code from the repository as the optimizer:

```python
params = [(k, v) for k, v in model.named_parameters() if v.requires_grad]
non_bert_params = {'params': [v for k, v in params if not k.startswith('bert.')]}
bert_params = {'params': [v for k, v in params if k.startswith('bert.')], 'lr': BERT_LR}
optimizer = torch.optim.Adam([non_bert_params, bert_params], lr=LR)
```

- document handling: I just take the first 520 tokens, i.e. 520 - [CLS] - 2*[SEP] - qlen document tokens
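The token budget in that document-handling step can be made explicit. A small sketch of the arithmetic (the 520 cap comes from the comment above; the function names are mine):

```python
# Document token budget for a BERT input capped at 520 tokens:
# [CLS] query [SEP] document [SEP]
# so the document gets 520 - 1 ([CLS]) - 2 ([SEP]s) - qlen tokens.
def doc_token_budget(qlen, max_len=520):
    budget = max_len - 1 - 2 - qlen
    return max(budget, 0)   # never negative for very long queries

def truncate_doc(doc_toks, qlen, max_len=520):
    # Keep only the first `budget` document tokens.
    return doc_toks[:doc_token_budget(qlen, max_len)]
```

Note that simple head-truncation like this discards everything past the budget, which is one reason the repository's longer-document handling can matter.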
But the results are still a little different: Dev NDCG@20: 0.37536, P@20: 0.43176
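The gradient accumulation mentioned in the list above follows a generic pattern: sum (mean-scaled) gradients over several micro-batches, then take one optimizer step. A self-contained toy sketch (GRAD_ACC, LR, and the 1-D quadratic loss are my own illustrative choices, not the repo's exact loop):

```python
# Generic gradient-accumulation pattern with a toy 1-D model:
# per-example loss (w - x)^2, micro-batches of size 1.
GRAD_ACC = 4
LR = 0.1

def grad(w, x):
    return 2.0 * (w - x)   # d/dw of (w - x)^2

def train_step(w, batch):
    acc = 0.0
    for x in batch:                   # one micro-batch per example
        acc += grad(w, x) / GRAD_ACC  # scale so the sum equals the batch mean
    return w - LR * acc               # single update per accumulated batch

w = 0.0
w = train_step(w, [1.0, 2.0, 3.0, 4.0])
# mean gradient at w=0 is 2*(0 - 2.5) = -5, so w becomes 0.5
```

The point of the `/ GRAD_ACC` scaling is that the accumulated update matches what a single large batch would produce, so batch size 1 plus accumulation can stand in for a larger effective batch.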
It sounds like you are trying to debug your implementation? I can't really help without seeing the code you are using. I'd recommend continuing to replace components of your implementation with those found in this repository. In particular, handling longer documents is probably important.
I reproduced VanillaBERT but only got NDCG@20: 0.3889, P@20: 0.3180 (optimizer: AdamW, batch size: 1, lr = 1e-5). I trained with hinge loss, randomly choosing pos/neg pairs from the code's provided `f.train.pairs` list.
That is far from the paper's NDCG@20: 0.4541, P@20: 0.4042.
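For reference, pairwise hinge training over sampled pos/neg pairs looks like the following. A minimal sketch (the margin of 1.0 and the pair format are assumptions on my part, not necessarily what the repository uses):

```python
import random

# Pairwise hinge loss: penalize when the positive document does not
# outscore the negative one by at least `margin` (1.0 is illustrative).
def hinge_loss(pos_score, neg_score, margin=1.0):
    return max(0.0, margin - (pos_score - neg_score))

def sample_pair(pairs):
    # pairs: list of (pos_doc, neg_doc) candidates, e.g. built from
    # the training-pairs file mentioned above.
    return random.choice(pairs)
```

One subtlety worth checking when a reproduction underperforms: with hinge loss the gradient is zero once the pair is separated by the margin, so the sampling distribution of pos/neg pairs can noticeably affect results.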