Closed okhat closed 3 years ago
Hi, we tried using different beam sizes at each hop earlier, i.e., selecting the top-k from the beam1 x beam2 chains. But tuning this parameter did not seem to affect end QA performance much, so we ended up using the same beam size for each hop.
Awesome! Thanks.
In that case, what's the paper's choice on HotPotQA? Is it --beam-size 10 and --topk 100?
We used a large beam size for the leaderboard submission: beam size 200, with the top 250 passage sequences used for answer extraction.
Thanks again! But how does this translate to the parameter settings? Did you search 200 x 200 (i.e., 40,000) sequences?
During beam search, yes.
Okay. But for Table 1, it is top-100 sequences (according to #5). Is the beam size there 200x200 too?
No, for Table 2 we used beam size 100 to get the top 100. The large beam size of 200 is only used for the test submission; everywhere else, unless explicitly clarified, we always used beam size k to get the top k, e.g., for R@20 we searched 20 x 20 possible chains.
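To make the "beam size k to get the top k" selection concrete, here is a minimal sketch with hypothetical names (not the repo's actual code): expand each of the k hop-1 passages by its k hop-2 candidates, score each chain as the sum of the two hop scores, and keep the k best of the k x k chains.

```python
import heapq

def topk_chains(hop1_scores, hop2_scores, k):
    """Hypothetical illustration of top-k selection over k x k chains.

    hop1_scores: list of (passage_id, score) pairs kept after hop 1.
    hop2_scores: dict mapping each hop-1 passage_id to its list of
        (passage_id, score) hop-2 candidates.
    Returns the k highest-scoring (p1, p2, chain_score) triples, where
    chain_score is the sum of the two hop scores.
    """
    candidates = [
        (p1, p2, s1 + s2)
        for p1, s1 in hop1_scores
        for p2, s2 in hop2_scores[p1]
    ]
    return heapq.nlargest(k, candidates, key=lambda c: c[2])
```

For R@20, for example, this would enumerate 20 x 20 = 400 chains and keep the top 20.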
All makes sense. Thank you!
Hi---thanks again for the great paper and code release! I'm trying to better understand your definition and use of beam size.
My understanding is that this is not the dynamic-programming algorithm usually called beam search, per se, but rather an exhaustive search over the defined search space (e.g., the top-20 passages from hop 1, each expanded by its top-5 passages from hop 2, yielding 100 pairs in total). Is this accurate?
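To make my reading concrete, here is a tiny sketch of the search space I have in mind (hypothetical names, not from the repo), using the asymmetric sizes 20 and 5:

```python
# Hypothetical illustration: exhaustively enumerating the hop-1 x hop-2
# search space described above (not the repo's actual implementation).
beam_size_1 = 20  # passages kept after hop 1
beam_size_2 = 5   # hop-2 expansions per hop-1 passage

hop1_passages = [f"p1_{i}" for i in range(beam_size_1)]
# Each hop-1 passage contributes its own top-5 hop-2 candidates.
chains = [
    (p1, f"p2_{p1}_{j}")
    for p1 in hop1_passages
    for j in range(beam_size_2)
]
assert len(chains) == beam_size_1 * beam_size_2  # 20 * 5 = 100 pairs
```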
I also noticed that some example code has
--beam-size-1 20 --beam-size-2 5
but it seems like `eval_mhop_retrieval.py` only accepts one `beam-size`. Is this equivalent to choosing both with one constant?