Closed Sjzwind closed 5 years ago
Hi there!
- We split the candidates into groups of 100 at https://github.com/facebookresearch/EmpatheticDialogues/blob/master/retrieval_train.py#L137
- Do you mean using 9 negative examples instead of 511 negative examples for a batch size of 512? Generally I've found that larger candidate batches work better for training retrieval models. One intuition for this is that it's a harder job to select the right answer among 100 candidates vs. 10 candidates, and so the model learns more information about how to pick the right candidate if it has to pick among a larger pool.
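To make the in-batch-negatives setup concrete: a batch of N contexts and N matching responses yields an N×N dot-product score matrix, and each row is trained to pick out its own (diagonal) response among the other N−1 responses, which serve as negatives. Below is a minimal NumPy sketch of that loss, not the repo's actual implementation; the function name and the use of raw random vectors in place of learned encoder outputs are illustrative.

```python
import numpy as np

def in_batch_negatives_loss(ctx_emb, resp_emb):
    """Cross-entropy over the [batch, batch] dot-product matrix.

    ctx_emb, resp_emb: [batch, dim] arrays; row i of each is a matching
    (context, response) pair, so the gold label for row i is column i
    and the other batch-1 columns act as in-batch negatives.
    """
    scores = ctx_emb @ resp_emb.T                       # [batch, batch]
    # numerically stable log-softmax over each row of candidate scores
    scores = scores - scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # negative log-likelihood of the diagonal (the true responses)
    return -np.mean(np.diag(log_probs))

# Tiny example: 4 pairs in an 8-dim embedding space
rng = np.random.default_rng(0)
ctx = rng.standard_normal((4, 8))
resp = rng.standard_normal((4, 8))
loss = in_batch_negatives_loss(ctx, resp)
```

Growing the batch (or candidate-group) size widens the softmax over more negatives per row, which is one way to see why the 100-candidate groups give a harder, more informative training signal than 10.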
Hi, I think your method is likely a pointwise method rather than a pairwise method, is that right? Your method is based on separate representations of contexts and responses. If I wanted to use an interaction-based method, for example concatenating each context and response and producing a single matching probability, I couldn't form the [batch, batch] dot-products matrix. So I guess you use a representation-based method rather than an interaction-based one for training for this reason; I don't know whether my understanding is right. Thanks.
Oh, I think I see what you're saying - yes, we're using a biencoder architecture, which considers the [batch, batch] dot-products matrix of contexts and responses. No, we haven't tried a cross-encoder architecture, which would concatenate the contexts and responses together. I imagine that architecture would likely perform a bit better, at the expense of slower training.
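The architectural difference can be sketched as follows. The linear "encoders" below are hypothetical stand-ins for the real transformer models; the point is the shape of the computation: a biencoder embeds contexts and responses independently and scores every pair with one matrix multiply, while a cross-encoder must run one forward pass per (context, response) concatenation, so the [batch, batch] scoring trick no longer applies.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 16

# Stand-ins for learned encoders (hypothetical; real models are transformers).
W_ctx = rng.standard_normal((DIM, DIM))
W_resp = rng.standard_normal((DIM, DIM))
W_joint = rng.standard_normal(2 * DIM)

def biencoder_scores(contexts, responses):
    """Encode contexts and responses independently, then score all
    pairs at once via a single [n, m] matrix product."""
    c = contexts @ W_ctx                    # [n, DIM]
    r = responses @ W_resp                  # [m, DIM]
    return c @ r.T                          # [n, m] score matrix

def cross_encoder_score(context, response):
    """Score one (context, response) pair from their concatenation.
    There is no factorization into separate embeddings, so scoring
    n x m pairs requires n*m forward passes, not one matrix product."""
    joint = np.concatenate([context, response])  # [2*DIM]
    return float(joint @ W_joint)

contexts = rng.standard_normal((3, DIM))
responses = rng.standard_normal((3, DIM))
scores = biencoder_scores(contexts, responses)       # shape (3, 3)
one_pair = cross_encoder_score(contexts[0], responses[0])
```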
Thank you for your reply : )
No problem!
Hi, I have several questions regarding the retrieval-based model.
Looking forward to your reply. Best wishes