Hi,
We compare only to the standard softmax, not adaptive softmax, but thanks for pointing it out; I will add those comparisons soon!
Beam search is not usually done during training, so I don't understand your question. In any case, we don't claim any speed improvements at inference: in terms of speed, the model will perform similarly to a softmax-based method with any beam size. In our experiments, however, beam search with our model hasn't led to any improvements in BLEU yet, so we reported results with greedy search in the paper. I'm still working on beam search.
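Not from this repo, but for concreteness, here is a minimal sketch of what one greedy decoding step looks like with continuous outputs, assuming nearest-neighbor lookup against a pretrained target-embedding table under cosine similarity (the function and variable names are hypothetical):

```python
import numpy as np

def greedy_decode_step(predicted_embedding, target_embeddings):
    """Pick the vocabulary word whose embedding is closest to the
    decoder's continuous output, under cosine similarity."""
    pred = predicted_embedding / np.linalg.norm(predicted_embedding)
    emb = target_embeddings / np.linalg.norm(target_embeddings, axis=1, keepdims=True)
    # One (V, d) @ (d,) product per step -> O(V * d), same order as a softmax layer.
    return int(np.argmax(emb @ pred))

# Hypothetical usage: 50k-word vocabulary, 300-dim embeddings.
rng = np.random.default_rng(0)
target_embeddings = rng.standard_normal((50_000, 300))
predicted = rng.standard_normal(300)
print(greedy_decode_step(predicted, target_embeddings))
```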
As I said in my answer to (2), inference time in this model is more or less the same as that of softmax-based models, since both involve an O(V) computation per step. This holds regardless of whether you run it on a GPU or a CPU.
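As a rough back-of-the-envelope illustration of that O(V) point (again not the repo's code, and the sizes are made up): per step, both the softmax output layer and nearest-neighbor decoding over the embedding table reduce to one (V, d) matrix-vector product.

```python
import time
import numpy as np

V, d = 50_000, 512                              # assumed vocabulary and hidden sizes
h = np.random.randn(d).astype(np.float32)       # decoder state at one time step
W = np.random.randn(V, d).astype(np.float32)    # output projection / embedding table

# Softmax-based model: V logits plus a normalization -- O(V * d) per step.
t0 = time.perf_counter()
logits = W @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()
t_softmax = time.perf_counter() - t0

# Continuous-output model: nearest neighbor over the same V rows -- also O(V * d).
t0 = time.perf_counter()
nearest = int(np.argmax(W @ h))
t_nn = time.perf_counter() - t0

print(f"softmax step: {t_softmax:.4f}s, nearest-neighbor step: {t_nn:.4f}s")
```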
Thanks,
Thanks for the detailed explanation.
Hi, thanks for the paper and the code. I have a few questions:
1. Is the reported 2.5x training speedup of the continuous-output model measured against seq2seq with the standard softmax, or against adaptive softmax / sampling-based methods (adaptive softmax has been shown to be 3 to 5 times faster than the normal softmax)? And is the BLEU still considerably good?
2. Does increasing the beam size increase decoding time in the same way as in softmax-based models, or does it behave differently in this approach? For example, would beam size 5 with continuous outputs take only 1 to 1.5 times as long as beam size 5 with softmax?
3. Regarding inference speed, does it prove to be lower than that of the softmax-based approach, and by what factor, comparing the two models at a reasonable batch size on CPU? (The 2.5x figure concerns training time, factoring in convergence of the continuous-output setup, not the model alone.)
Thanks.