Closed by minstar 5 years ago
There is no easy built-in way to do this, but you have two options: modify the code that prepares the batches of data so that it changes the segmentation candidates, or read multiple streams of data (one for each segmentation you want) and choose one of those.
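The second option could look roughly like the sketch below. It is only an illustration in plain Python, not the project's actual data-loading API: the class name `SampledSegmentationDataset`, the `set_epoch` hook, and the toy data are all assumptions made up for this example. The idea is to keep several pre-segmented copies of the same corpus and pick one per sentence, with the choice depending on the epoch.

```python
import random
from typing import List, Sequence


class SampledSegmentationDataset:
    """Hypothetical wrapper (not a library class): holds several pre-segmented
    versions of the same corpus and returns one of them per sentence."""

    def __init__(self, streams: Sequence[Sequence[List[str]]], seed: int = 0):
        if not streams:
            raise ValueError("need at least one segmentation stream")
        n = len(streams[0])
        if any(len(s) != n for s in streams):
            raise ValueError("all streams must have the same number of sentences")
        self.streams = streams
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # Assumed hook: the training loop calls this at the start of each epoch.
        self.epoch = epoch

    def __len__(self) -> int:
        return len(self.streams[0])

    def __getitem__(self, index: int) -> List[str]:
        # Seed from (seed, epoch, index): the choice is reproducible within
        # an epoch but can differ from one epoch to the next.
        rng = random.Random(f"{self.seed}-{self.epoch}-{index}")
        stream = self.streams[rng.randrange(len(self.streams))]
        return stream[index]


# Toy data: three segmentations of the same one-sentence corpus.
toy_streams = [
    [["hello", "world"]],
    [["hel", "lo", "world"]],
    [["h", "ello", "wor", "ld"]],
]

if __name__ == "__main__":
    dataset = SampledSegmentationDataset(toy_streams, seed=1)
    for epoch in range(3):
        dataset.set_epoch(epoch)
        print(epoch, dataset[0])
```

Each stream would have to be segmented and binarized ahead of time, so this trades disk space for not having to re-segment during training. Sampling a fresh segmentation at every update, as in the paper, would instead require the first option: changing the batch preparation code itself.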
@huihuifan Thank you! I'll give it a shot.
Hi, I'm currently doing research on applying "Subword Regularization" to NMT training, where a segmentation is sampled from the segmentation candidates at every parameter update. I am trying to apply this method to the "IWSLT17" dataset provided in examples/translation.
I noticed that there is a bash script that generates the segmented files, and preprocess.py, which creates the dictionary and the binary files. As a result, the inputs are fixed during training. However, I want to change the inputs every epoch by sampling a new segmentation for each input.
Is this possible? Do you have any suggestions? In the training script, should I decode the given segmented inputs back into raw text and then sample from the segmentation candidates?
Thank you.