Closed skckompella closed 7 years ago
You should do your padding in your agent; you only receive text (of variable length).
On Mon, Jul 10, 2017 at 6:00 PM, skckompella notifications@github.com wrote:
I want to pad my input sentences to a fixed length (so as to use a convolutional encoder instead of a recurrent one). I understand that data is loaded in dialog_teacher.py and I can modify that code to add my padding. Instead, can an easy API be added to do the same?
Even with a recurrent encoder, you'll find you need padding as soon as you do batching. You can see a simple case of doing this in the seq2seq agent here.
In that code, we create a tensor of dimension batchsize by max_length, fill it with zeros, and then fill in the elements that we want to actually use.
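The approach described above can be sketched in plain Python (using nested lists rather than tensors; `pad_batch` is a hypothetical helper for illustration, not part of the ParlAI API):

```python
def pad_batch(batch, pad_value=0):
    """Pad a list of variable-length token-id lists to the longest one in the batch."""
    max_len = max(len(seq) for seq in batch)
    # Start with a batchsize x max_len grid filled with the pad value (zeros)...
    padded = [[pad_value] * max_len for _ in batch]
    # ...then fill in the elements we actually want to use.
    for row, seq in zip(padded, batch):
        row[:len(seq)] = seq
    return padded

# Example: two sequences of lengths 3 and 1 become a 2 x 3 grid.
print(pad_batch([[1, 2, 3], [4]]))  # [[1, 2, 3], [4, 0, 0]]
```

In the actual agent this grid would be a zero-initialized tensor of shape batchsize by max_length, but the filling logic is the same.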
The DrQA model does the same thing here, although it does a few more fancy things that helped it with the SQuAD task.
Regardless, as Jason said, this is done on the agent side; we wouldn't do this padding on the dialog_teacher side.
Thanks for the clarification. I did see that example. I was trying to pad to the maximum episode length, and hence was looking at dialog_teacher.py to get the max episode length. But I guess max_len per batch will do for now.
I think you'll probably get better performance this way, but feel free to follow up if you have any issues with this fitting your use case.
I want to pad my input sentences to a fixed length (so as to use a convolutional encoder instead of a recurrent one). I understand that data is loaded in dialog_teacher.py and I can modify that code to add my padding. Is there an API to do this? If not, can an API be added to do the same?