For a large dataset of about 10M QA pairs, would accuracy improve if we divided the dataset by sentence length and fed each split to a different training model, decoding accordingly (maybe with different parameters, e.g. RNN size and number of layers, for each model)?
Recent versions of TensorFlow implement a bucketing system which batches together sentences of similar lengths. This is done mainly for performance reasons; the model stays the same for any sentence length.
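As a rough illustration (not this project's code), here is a minimal sketch of length bucketing with the `tf.data` API, assuming TensorFlow 2.x where `Dataset.bucket_by_sequence_length` is available; the toy token-id sequences and bucket boundaries are made up for the example. Each batch is padded only up to the longest sequence in its bucket, and a single model consumes all buckets:

```python
import tensorflow as tf

# Toy dataset of variable-length "sentences" (lists of token ids).
sentences = [[1, 2], [3, 4, 5, 6], [7], [8, 9, 10], [11, 12, 13, 14, 15]]
ds = tf.data.Dataset.from_generator(
    lambda: iter(sentences),
    output_signature=tf.TensorSpec(shape=[None], dtype=tf.int32),
)

# Group sequences into length buckets; padding happens per bucket,
# so short sentences are not padded to the global maximum length.
ds = ds.bucket_by_sequence_length(
    element_length_func=lambda seq: tf.shape(seq)[0],
    bucket_boundaries=[3, 5],       # buckets: <3, 3-4, >=5 tokens
    bucket_batch_sizes=[2, 2, 2],   # one batch size per bucket
)

for batch in ds:
    print(batch.numpy())
```

The point is that bucketing is a batching/padding optimization, not a modeling change: the same weights are trained on every bucket.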
Any comments?