-
I have trained a seq2seq NMT model (EN-DE) with 1M samples and saved the latest checkpoint. Now, I have some domain-specific data of 50K sentence pairs which has **not** been seen in previous training…
-
## In a nutshell
They reformulate the translation objective as an expectation over subword tokenizations, and improve accuracy by sampling tokenizations during NMT training. The sampling plays a role similar to regularization and data augmentation. To sample tokenizations, they need a probabilistic rather than deterministic treatment, which is provided by a unigram language m…
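A minimal sketch of what this tokenization sampling can look like with the SentencePiece Python API (the model file name is a placeholder; `enable_sampling`, `alpha`, and `nbest_size` control the sampling):

```python
import sentencepiece as spm

# Load a unigram LM tokenizer (probabilistic, unlike deterministic BPE merges).
sp = spm.SentencePieceProcessor(model_file="unigram.model")  # placeholder path

sentence = "subword regularization samples a different segmentation each epoch"

# Deterministic (Viterbi) segmentation: always the same pieces.
print(sp.encode(sentence, out_type=str))

# Sampled segmentations: draw from the segmentation lattice (nbest_size=-1
# samples from all hypotheses); alpha sharpens or flattens the distribution.
for _ in range(3):
    print(sp.encode(sentence, out_type=str,
                    enable_sampling=True, alpha=0.1, nbest_size=-1))
```

Feeding a freshly sampled segmentation of each sentence at every epoch is what gives the regularization / data-augmentation effect described above.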
-
The vast majority of time during training is spent in the dot product and scaled additions. We have been doing unaligned loads so far. I have made a quick modification that ensures that every embeddin…
-
### Metadata
- Authors: Rico Sennrich and Barry Haddow
- Organization: School of Informatics, University of Edinburgh
- Conference: WMT 2016
- Link: https://goo.gl/jqYQ8r
-
Hi, thanks for the great work!
I tried to run the code, but I don't know how to do the data preprocessing for the AMR corpus. May I ask how I can do the data preprocessing?
-
I am currently training a transformer model and have followed the MTM labs to apply BPE to my own corpus. However, I'm unsure of the effect that providing a pre-determined vocabulary has. Does it impa…
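If the labs use the subword-nmt toolkit, the vocabulary filter works roughly as follows: subwords that are absent from the supplied vocabulary (or fall below the frequency threshold) are broken down further into smaller units at application time, so the model only ever sees segments from the pre-determined vocabulary. A small sketch with placeholder file names and threshold:

```python
from subword_nmt.apply_bpe import BPE, read_vocabulary

# Placeholder paths: learned BPE merge operations and a "word count" vocabulary
# extracted from the corpus the codes will be applied to.
with open("codes.bpe", encoding="utf-8") as codes_file, \
     open("vocab.txt", encoding="utf-8") as vocab_file:
    # Only subwords occurring at least 50 times are kept as units; rarer ones
    # are split further when the BPE codes are applied.
    vocab = read_vocabulary(vocab_file, threshold=50)
    bpe = BPE(codes_file, vocab=vocab)

print(bpe.process_line("a sample sentence to segment"))
```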
-
Hello,
With a vocabulary size of 55K, I have trained the model for 200K steps and saved the latest checkpoint.
Now I have increased my vocabulary size to 70K.
1. How can I continue training from the …
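One common way to handle this (not tied to any particular toolkit, and assuming the first 55K entries keep the same indices in the new vocabulary) is to copy the old embedding rows into a larger matrix and let the 15K new rows start from a fresh initialization; a minimal PyTorch sketch with hypothetical sizes:

```python
import torch
import torch.nn as nn

OLD_VOCAB, NEW_VOCAB, DIM = 55_000, 70_000, 512  # hypothetical dimensions

# Old embedding weights, as they would be restored from the saved checkpoint.
old_emb = nn.Embedding(OLD_VOCAB, DIM)

# New, larger embedding: copy the rows that already exist, keep the remaining
# 15K rows at their fresh random initialization.
new_emb = nn.Embedding(NEW_VOCAB, DIM)
with torch.no_grad():
    new_emb.weight[:OLD_VOCAB] = old_emb.weight

# Swap the resized embedding (and the matching output projection) into the
# model, then resume training from the checkpointed optimizer state or restart
# the optimizer, depending on what the toolkit supports.
```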
-
For continuation words, there are varying numbers of # signs. For example, in the first 5 words we have the following:
- /c/de/####er
- /c/de/###er
- /c/de/##er
For example, if I have a word endi…
-
There is at least one unusable vocabulary entry in our gabert vocab, namely `##-"`. Find all entries that BERT will never use, since BERT first splits around all non-alphanumeric characters without ap…
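A rough sketch of such a filter, under the stated assumption that a continuation piece containing any non-alphanumeric character is unreachable (the vocab file name is a placeholder, and the check ignores finer details such as CJK character splitting):

```python
def unusable_entries(vocab_path="gabert-vocab.txt"):
    """Collect ##-prefixed entries that contain a non-alphanumeric character.

    Because the BasicTokenizer splits punctuation into separate tokens before
    WordPiece runs, such continuation pieces can never be produced.
    """
    bad = []
    with open(vocab_path, encoding="utf-8") as f:
        for line in f:
            token = line.rstrip("\n")
            if token.startswith("##"):
                body = token[2:]
                if body and not body.isalnum():
                    bad.append(token)
    return bad

print(unusable_entries())
```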
-
I am trying to train a Chinese Conformer model. When I train with 4 × 2080 Ti GPUs, an error occurs in the middle of an epoch: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered…