Closed ChenYutongTHU closed 3 years ago
We do not use any diversity-augmented beam search strategy in our experiments. The outputs of the two beam search versions are almost the same. The small difference might be caused by the sorting step and can be ignored in most cases.
Hello, thanks for this great work. While reading the code, I found a comment in models/att_basic_model.py saying that DBS (diversity-augmented beam search) is recommended for the xlan model.
However, in models/basic_model.py, where DBS is implemented, the group size is forced to be 1, which means that diversity-augmented beam search degenerates to standard beam search.
So how can DBS with group_size=1 slightly outperform standard BS for xlan, as the comment mentioned above claims? Thanks a bunch!
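To make the question concrete, here is a minimal sketch of one decoding step with a Hamming-style diversity penalty. It is not the repository's code: `diversity_adjusted`, `step_choices`, `num_groups`, and `diversity_lambda` are hypothetical names, I am assuming `num_groups` plays the role of `group_size` in models/basic_model.py, and for simplicity every group scores the same toy distribution. With a single group there are no earlier groups to penalize against, so the penalty is always zero and the step is plain top-k selection.

```python
from collections import Counter


def diversity_adjusted(logprobs, prior_group_tokens, diversity_lambda):
    """Subtract diversity_lambda for each time an earlier group picked the token."""
    counts = Counter(prior_group_tokens)
    return [lp - diversity_lambda * counts[tok] for tok, lp in enumerate(logprobs)]


def step_choices(logprobs, beam_size, num_groups, diversity_lambda):
    """Pick tokens for one decoding step, group by group (toy version)."""
    per_group = beam_size // num_groups
    chosen = []
    for _ in range(num_groups):
        # Tokens chosen by earlier groups at this step are penalized.
        adjusted = diversity_adjusted(logprobs, chosen, diversity_lambda)
        ranked = sorted(range(len(adjusted)), key=lambda t: -adjusted[t])
        chosen.extend(ranked[:per_group])
    return chosen


toy_logprobs = [-0.1, -0.5, -2.0, -3.0]

# One group: the penalty never applies, so lambda has no effect.
one_group_no_penalty = step_choices(toy_logprobs, 2, 1, 0.0)
one_group_penalty = step_choices(toy_logprobs, 2, 1, 5.0)

# Two groups: the penalty changes what the second group picks.
two_groups_no_penalty = step_choices(toy_logprobs, 2, 2, 0.0)
two_groups_penalty = step_choices(toy_logprobs, 2, 2, 5.0)
```

In this sketch the one-group results are identical for any `diversity_lambda`, which is why I would expect DBS with a single group to behave like standard BS, up to implementation details such as how candidates are sorted.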