Hi,
Firstly, thanks for releasing the implementation. I ran into a couple of issues while trying things out.
I tried training the teacher on the single language pair "en-de" instead of all languages, using WMT17 data instead of IWSLT. I started training on the 3.9 million sentences, and after 9 epochs of training I tried to save the output, for which I got:
As you can see, the outputs haven't been saved. To check whether I had made a mistake in changing the scripts or whether it was a limitation with the bigger dataset, I used a smaller subset of the WMT17 dataset containing 0.9 million sentences, for which I got the log below.
This time the top_k probabilities and indices were saved, and the log shows the same Saved expert@en_de message, which confirms that for a larger dataset like the 3.9 million sentences the output is somehow not saved.
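(For context, by "top_k probabilities and indices" I mean the usual per-token teacher outputs, roughly as in the sketch below; this is just my understanding, and the repo's actual saving logic may differ.)

```python
import torch

def extract_topk(teacher_logits: torch.Tensor, k: int = 8):
    """Illustrative only: per-token top-k teacher probabilities and
    their vocabulary indices, as I understand the expert outputs.
    teacher_logits: (batch, seq_len, vocab_size)
    """
    probs = torch.softmax(teacher_logits, dim=-1)
    topk_probs, topk_indices = probs.topk(k, dim=-1)
    return topk_probs, topk_indices

# e.g. dumped to disk per shard so the student can load them later
# torch.save({"probs": topk_probs, "idx": topk_indices}, "expert_en_de.pt")
```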
Also, there is no option to provide --srcdict and --tgtdict in preprocess.py, as is done here, so a new joined dictionary is created every time I want to use a subset of the data for training.
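For reference, this is the kind of invocation I mean, assuming preprocess.py followed fairseq's interface (the paths below are illustrative, and these flags are not currently accepted by this repo's script):

```bash
# Reuse the dictionaries built on the full data when binarizing a subset,
# instead of regenerating a new joined dictionary each time.
python preprocess.py --source-lang en --target-lang de \
    --trainpref subset/train --validpref subset/valid --testpref subset/test \
    --srcdict data-bin/wmt17_full/dict.en.txt \
    --tgtdict data-bin/wmt17_full/dict.de.txt \
    --destdir data-bin/wmt17_subset
```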
Thank you once again.