allenai / allennlp-semparse

A framework for building semantic parsers (including neural module networks) with AllenNLP, built by the authors of AllenNLP
Apache License 2.0

What's the difference between per_node_beam_size and beam_size? #9

Closed · entslscheia closed this issue 4 years ago

entslscheia commented 4 years ago

In beam_search.py, the docstring says "Setting per_node_beam_size to a number smaller than beam_size may give better results". However, I read the source code carefully and found that the two parameters work in exactly the same way, i.e., both restrict the maximum number of actions (new states) for each batch_index at each step. Specifically, per_node_beam_size is passed as max_actions to take_step, which caps the number of new states per batch_index. So states.extend(batch_states[: self._beam_size]) looks completely redundant to me; we could simply write states.extend(batch_states) when per_node_beam_size is set. And setting per_node_beam_size smaller than beam_size makes no sense, because batch_states can contain at most per_node_beam_size items anyway.
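To make what I mean concrete, here is a minimal, runnable paraphrase of the structure I'm describing, with toy states and a toy take_step. This is an illustration of the pattern, not the actual beam_search.py code:

```python
from collections import defaultdict
from typing import List, Tuple

# Toy paraphrase of the loop described above, NOT the real beam_search.py.
# A "state" here is just (batch_index, score); the one property kept from
# the real take_step is that it returns at most max_actions new states
# per batch_index.
def take_step(states: List[Tuple[int, float]],
              max_actions: int) -> List[Tuple[int, float]]:
    by_batch = defaultdict(list)
    for batch_index, score in states:
        # Each state expands into several candidate continuations.
        by_batch[batch_index].extend(
            (batch_index, score - 0.1 * k) for k in range(10)
        )
    # The cap is applied per batch_index, across ALL parents at once.
    return [st for group in by_batch.values()
            for st in sorted(group, key=lambda x: -x[1])[:max_actions]]

def search_step(states, beam_size, per_node_beam_size):
    by_batch = defaultdict(list)
    for state in take_step(states, max_actions=per_node_beam_size):
        by_batch[state[0]].append(state)
    next_states = []
    for batch_states in by_batch.values():
        # len(batch_states) <= per_node_beam_size already, so this slice
        # is a no-op whenever per_node_beam_size <= beam_size.
        next_states.extend(batch_states[:beam_size])
    return next_states
```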

entslscheia commented 4 years ago

I have also tested this by setting per_node_beam_size to 3 and beam_size to 5, and, as I expected, the size of batch_states never exceeded 3, which I think confirms my reading of the source code. Still, "Setting per_node_beam_size to a number smaller than beam_size may give better results" doesn't look like a careless docstring mistake, because it also points us to a paper, which suggests there is some important idea behind it. I haven't read that paper yet, so I'm not sure what the idea is, but my guess is that this is a design issue: the idea from the paper was not implemented correctly.
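As an illustration, driving the toy search_step from my previous comment with those same numbers shows the same behavior (again, a paraphrase, not the real code):

```python
from collections import defaultdict

# Drive the toy search_step above with per_node_beam_size=3, beam_size=5.
states = [(0, 0.0), (1, 0.0)]
for _ in range(4):
    states = search_step(states, beam_size=5, per_node_beam_size=3)
    counts = defaultdict(int)
    for batch_index, _score in states:
        counts[batch_index] += 1
    # batch_states never grows past per_node_beam_size, so beam_size=5
    # never has anything to cut.
    assert max(counts.values()) <= 3, counts
print("per batch_index beam never exceeded per_node_beam_size=3")
```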

matt-gardner commented 4 years ago

Yep, you're right: this is a bug in how per_node_beam_size was implemented, which I missed when I reviewed the initial PR for it (I should have asked for tests!). It doesn't work as advertised. These are the important lines in take_step that make it clear this implementation can't work: https://github.com/allenai/allennlp-semparse/blob/937d5945488a33c61d0047bd74d8106e60340bbd/allennlp_semparse/state_machines/transition_functions/transition_function.py#L61-L65

The per-node beam size needs to be implemented inside of take_step for this to actually work.
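For readers of the thread, here is a rough sketch of the intended semantics, an assumption about what a fix could look like rather than the actual patch: the cap has to be applied per parent state inside take_step, so that the outer beam_size truncation actually has work to do. States are toy (batch_index, parent_id, score) tuples:

```python
from collections import defaultdict
from typing import List, Tuple

def take_step_fixed(states: List[Tuple[int, int, float]],
                    per_node_beam_size: int) -> List[Tuple[int, int, float]]:
    children = []
    for parent_id, (batch_index, _pid, score) in enumerate(states):
        # Expand this parent, then keep only ITS OWN top candidates:
        # the per-node cap lives here, inside take_step.
        expansions = sorted((score - 0.1 * k for k in range(10)), reverse=True)
        children.extend((batch_index, parent_id, s)
                        for s in expansions[:per_node_beam_size])
    return children

def search_step_fixed(states, beam_size, per_node_beam_size):
    by_batch = defaultdict(list)
    for child in take_step_fixed(states, per_node_beam_size):
        by_batch[child[0]].append(child)
    # Now the beam_size truncation does real work: with B parents, a
    # batch can hold up to B * per_node_beam_size candidates.
    return [st for group in by_batch.values()
            for st in sorted(group, key=lambda x: -x[2])[:beam_size]]
```

With per_node_beam_size smaller than beam_size, no single parent can flood the beam, so the surviving beam_size states are forced to come from several different parents; that diversity is presumably the benefit the docstring's cited paper is after.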