Closed · entslscheia closed this issue 4 years ago
I have also tested this by setting `per_node_beam_size` to 3 and `beam_size` to 5, and, as I expected, the size of `batch_states` never exceeds 3, which confirms my reading of the source code. However, the docstring's claim that "Setting `per_node_beam_size` to a number smaller than `beam_size` may give better results" does not look like a careless annotation mistake, because it also points us to a paper, which suggests there is an important idea behind it. I haven't read that paper yet, so I am not sure what the idea is, but my guess is that this is a design issue: the idea from the paper is not implemented correctly.
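For concreteness, here is a tiny standalone sketch (mine, not the library's code) of what the current behavior reduces to: the number of surviving states per `batch_index` is just the smaller of the two settings.

```python
# Minimal sketch of the current behavior (illustrative, not allennlp-semparse code).
def surviving_states(num_candidates: int, per_node_beam_size: int, beam_size: int) -> int:
    # take_step is called with max_actions=per_node_beam_size, so at most that
    # many states come back per batch_index ...
    from_take_step = min(num_candidates, per_node_beam_size)
    # ... and the later [: beam_size] slice can only shrink that further.
    return min(from_take_step, beam_size)

# My test setting: per_node_beam_size=3, beam_size=5, plenty of candidate actions.
assert surviving_states(num_candidates=100, per_node_beam_size=3, beam_size=5) == 3
```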
Yep, you're right, this is a bug in how `per_node_beam_size` was implemented that I missed when I reviewed the initial PR for it (I should have asked for tests!). It doesn't work as advertised. These are the key lines in `take_step` that make it clear this implementation can't work: https://github.com/allenai/allennlp-semparse/blob/937d5945488a33c61d0047bd74d8106e60340bbd/allennlp_semparse/state_machines/transition_functions/transition_function.py#L61-L65
The per-node beam size needs to be implemented inside `take_step` for this to actually work.
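To sketch what I mean (illustrative pseudocode, not the actual fix; the real `take_step` operates on batched tensors and state objects): the cap has to be applied per parent state, before candidates from different parents are merged, and only then should the overall `beam_size` cut be applied.

```python
from typing import Callable, List, Tuple

# Illustrative only: states are plain values and successors is a function, so this
# ignores batching and the tensor bookkeeping the real implementation has to do.
def step_with_per_node_cap(
    beam: List[Tuple[float, object]],
    successors: Callable[[object], List[Tuple[float, object]]],
    per_node_beam_size: int,
    beam_size: int,
) -> List[Tuple[float, object]]:
    candidates: List[Tuple[float, object]] = []
    for score, state in beam:
        # Per-node cap: keep only the best successors of *this* parent state ...
        expansions = sorted(successors(state), key=lambda x: x[0], reverse=True)
        candidates.extend(expansions[:per_node_beam_size])
    # ... and only then apply the overall beam cap across all parents.
    candidates.sort(key=lambda x: x[0], reverse=True)
    return candidates[:beam_size]
```

With that split, a `per_node_beam_size` smaller than `beam_size` actually changes the search: no single parent can fill the whole beam with its own successors, which is the behavior the docstring is describing.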
In `beam_search.py`, the docstring says that "Setting `per_node_beam_size` to a number smaller than `beam_size` may give better results". However, after reading the source code carefully, I found that the two parameters work in exactly the same way, i.e., both restrict the maximum number of actions (new states) kept for each `batch_index` at each step. Specifically, `per_node_beam_size` is passed as `max_actions` to `take_step`, which already caps the number of new states per `batch_index`, so `states.extend(batch_states[: self._beam_size])` looks completely redundant to me; we could simply use `states.extend(batch_states)` when `per_node_beam_size` is set. And making `per_node_beam_size` smaller than `beam_size` makes no sense, because there are at most `per_node_beam_size` items inside `batch_states`.
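To make that concrete, here is roughly what that part of the loop does, written as a standalone sketch (my paraphrase, not the exact source):

```python
import random
from collections import defaultdict

# Standalone paraphrase of the step described above (not the exact library code).
per_node_beam_size, beam_size = 3, 5

# Pretend take_step scored many candidate actions per batch_index ...
candidates = defaultdict(list)
for batch_index in range(2):
    for _ in range(20):
        candidates[batch_index].append(random.random())

# ... but returned only the top max_actions (= per_node_beam_size) of them.
next_states = {
    batch_index: sorted(scores, reverse=True)[:per_node_beam_size]
    for batch_index, scores in candidates.items()
}

states = []
for batch_index, batch_states in next_states.items():
    # batch_states already has at most per_node_beam_size items, so this
    # slice is a no-op whenever per_node_beam_size <= beam_size.
    assert batch_states[:beam_size] == batch_states
    states.extend(batch_states[:beam_size])
```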