santhoshkolloju opened this issue 5 years ago
The transformer beam search is adapted from the official implementation (tensor2tensor). Not sure how it can be sped up.
A possible way would be to use a more efficient variant of the transformer decoder (e.g., Transformer-XL). We don't have the bandwidth for this at the moment, though. Any contributions are welcome.
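For context, much of the speedup from more efficient decoder variants comes from caching each layer's key/value tensors so that every new step only attends over the cached past instead of re-running the whole prefix. A minimal NumPy sketch of that idea; all names and shapes here are hypothetical and not Texar's actual API:

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention of the newest query over the cached prefix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (1, t)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v                              # (1, d)

def decode_with_cache(num_steps, d=64, seed=0):
    """Greedy-style loop: each step projects only the newest token and appends
    its key/value to a growing cache, rather than recomputing the whole prefix."""
    rng = np.random.default_rng(seed)
    wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
    x = rng.standard_normal((1, d))                 # stand-in embedding of the current token
    k_cache = np.empty((0, d))
    v_cache = np.empty((0, d))
    for _ in range(num_steps):
        k_cache = np.vstack([k_cache, x @ wk])      # O(1) new projection work per step
        v_cache = np.vstack([v_cache, x @ wv])
        out = attention(x @ wq, k_cache, v_cache)
        x = out                                     # stand-in for "embed the predicted token"
    return x

decode_with_cache(250)
```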
Same question.
I have been using a beam size of 3 and alpha 1.0 for beam search decoding, and it looks like it is very slow. Greedy search takes around 30-40 seconds to generate a sequence of 250 words, but beam search takes around 2 minutes.
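For what it's worth, in tensor2tensor-style beam search `alpha` is the exponent of the GNMT length penalty, and the decoder work grows roughly linearly with the beam width, which is consistent with beam size 3 being about three times slower than greedy here. A small illustrative sketch (the numbers are just the figures quoted above):

```python
def length_penalty(length, alpha):
    """GNMT-style length penalty used in tensor2tensor-style beam search:
    finished hypotheses are scored as log_prob / length_penalty."""
    return ((5.0 + length) / 6.0) ** alpha

# With alpha = 1.0 a 250-token hypothesis is divided by ~42.5, strongly
# favouring long outputs; a smaller alpha (e.g. 0.6) penalises length less.
print(length_penalty(250, 1.0))   # ~42.5
print(length_penalty(250, 0.6))   # ~9.5

# Rough cost model (hypothetical): each decoding step runs the decoder for
# every beam, so beam_width 3 is roughly 3x the per-step work of greedy.
greedy_seconds = 35               # observed 30-40 s for 250 tokens
beam_width = 3
print(greedy_seconds * beam_width)  # ~105 s, close to the ~2 minutes observed
```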
Can you help me improve the inference speed? I tried quantising the model to 8 bits; it decreased the size of the model, but the inference time remains the same.
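One possible reason the 8-bit quantisation did not change latency: if it is weight-only quantisation, or the runtime has no int8 kernels, the weights are dequantised back to float32 before each matmul, so the compute per decoding step is unchanged even though the stored model is smaller. A small self-contained illustration of that effect (not tied to this repo):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal((1024, 1024)).astype(np.float32)
x = rng.standard_normal((1, 1024)).astype(np.float32)

# Weight-only int8 quantisation: store weights as int8 plus one scale factor.
scale = np.abs(w_fp32).max() / 127.0
w_int8 = np.round(w_fp32 / scale).astype(np.int8)   # ~4x smaller to store

def step_fp32():
    return x @ w_fp32

def step_dequant():
    # Without int8 kernels, the weights are dequantised to float32 first,
    # so the matmul cost is the same as (or worse than) the float32 path.
    return x @ (w_int8.astype(np.float32) * scale)

for fn in (step_fp32, step_dequant):
    t0 = time.perf_counter()
    for _ in range(100):
        fn()
    print(fn.__name__, time.perf_counter() - t0)
```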
Any help is appreciated.
Thanks