IBM / pytorch-seq2seq

An open source framework for seq2seq models in PyTorch.
https://ibm.github.io/pytorch-seq2seq/public/index.html
Apache License 2.0

Backtracking in beam search #151

Open shubhamagarwal92 opened 6 years ago

shubhamagarwal92 commented 6 years ago

Compared to OpenNMT, why do we need this block, which handles the dropped sequences that see EOS earlier? (There is no equivalent in their beam search implementation.) They follow a similar process of not letting EOS have children here. However, they have this end condition for when EOS is at the top of the beam, and they reconstruct the hypothesis using the get_hyp function.

More specifically, could you explain in detail what we are doing here:

            #       1. If there is survived sequences in the return variables, replace
            #       the one with the lowest survived sequence score with the new ended
            #       sequences
            #       2. Otherwise, replace the ended sequence with the lowest sequence
            #       score with the new ended sequence

I understand why we need to handle EOS sequences, since we have their information in the backtracking variables. But why do we need to "replace the one with the lowest survived sequence score with the new ended sequences"? AFAIK, res_k_idx tracks which beam (counting from the end) we can overwrite with this information (the two conditions specified in the comments). However, we are not replacing the contents of the beam that emitted EOS, i.e.:

t_predecessors[res_idx] = predecessors[t][idx[0]]
t_predecessors[idx] = ??

I understand that after this process all the beams remain static and we use index_select at each step to select the top beams.
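
For reference, a minimal sketch of the replacement rule those comments describe, assuming a pool of finished hypotheses collected during decoding (this is a simplified illustration, not the repo's actual _backtrack code, and all names below are hypothetical):

    import torch

    def finalize_top_k(live_scores, live_seqs, ended):
        """Merge live and ended hypotheses into the final top-k list.

        live_scores: (k,) tensor of scores for sequences still alive at the last step.
        live_seqs:   list of k token lists, aligned with live_scores.
        ended:       list of (score, token_list) pairs for sequences that hit EOS earlier.
        """
        # Start from the surviving sequences, best first.
        results = sorted(zip(live_scores.tolist(), live_seqs),
                         key=lambda x: x[0], reverse=True)
        # Each ended hypothesis competes for a slot by evicting the current
        # worst-scoring entry, whether that entry is a survivor or an earlier
        # ended sequence.
        for score, seq in sorted(ended, key=lambda x: x[0], reverse=True):
            if score > results[-1][0]:
                results[-1] = (score, seq)
                results.sort(key=lambda x: x[0], reverse=True)
        return results

The point is only that finished hypotheses compete with the surviving ones for the final k slots, which is what the two numbered conditions above are implementing via res_k_idx.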

Also, the unit test for top_k_decoder is not deterministic: it fails when batch_size > 2 and sometimes even when batch_size == 2.

pskrunner14 commented 6 years ago

@shubhamagarwal92 thanks for pointing this out. I'll check their implementation and see what's different. Working on the test to make it more deterministic. Will test more beam sizes too.

Mehrad0711 commented 5 years ago

Hi, are there any updates on this issue? I've implemented a similar beam search strategy that uses the _backtrack(...) function from this repo, but even with a beam_size of 1 I get worse results than greedy decoding. It would be really helpful if you could double-check the implementation. Thanks.

GZJAS commented 5 years ago

I studied the code recently, and I think you can use torch.repeat_interleave for the inflation, like so:

    hidden = tuple([torch.repeat_interleave(h, self.k, dim=1) for h in encoder_hidden])
    inflated_encoder_outputs = torch.repeat_interleave(encoder_outputs, self.k, dim=0)
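
For reference, a small self-contained check of the layout torch.repeat_interleave produces (shapes below are made up for illustration; whether this consecutive ordering matches what the rest of the decoder expects depends on how the beam indices are laid out elsewhere in the code):

    import torch

    k = 3                                    # beam width
    encoder_outputs = torch.randn(2, 5, 8)   # (batch, src_len, hidden)

    # Each batch element is repeated k times consecutively: [b0, b0, b0, b1, b1, b1],
    # as opposed to tensor.repeat, which tiles the whole batch: [b0, b1, b0, b1, ...].
    inflated = torch.repeat_interleave(encoder_outputs, k, dim=0)

    print(inflated.shape)                    # torch.Size([6, 5, 8])
    assert torch.equal(inflated[0], inflated[1]) and torch.equal(inflated[1], inflated[2])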

shubhamagarwal92 commented 5 years ago

@Mehrad0711 maybe you can try to integrate beam search from allennlp; their implementation is here
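
For context, a minimal sketch of what wiring in allennlp's BeamSearch might look like, assuming its (start_predictions, start_state, step) interface; the toy step function, vocabulary size, and token ids below are made up for illustration:

    import torch
    from allennlp.nn.beam_search import BeamSearch

    vocab_size, eos_id, batch_size = 10, 9, 2
    beam_search = BeamSearch(end_index=eos_id, max_steps=20, beam_size=5)

    def step(last_predictions, state):
        # Toy step function: a real one would run the decoder on last_predictions,
        # update `state`, and return log-probabilities of shape (group_size, vocab_size).
        logits = torch.randn(last_predictions.size(0), vocab_size)
        return torch.log_softmax(logits, dim=-1), state

    start_predictions = torch.zeros(batch_size, dtype=torch.long)  # pretend SOS id is 0
    predictions, log_probs = beam_search.search(start_predictions, {}, step)
    # predictions: (batch_size, beam_size, seq_len), log_probs: (batch_size, beam_size)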

pskrunner14 commented 5 years ago

Hey @Mehrad0711 @shubhamagarwal92 sorry haven't gotten the time to work on this yet. You're welcome to submit a PR :)

ghost commented 3 years ago

Any update on this issue?