Closed: michelole closed this issue 5 years ago
tl;dr: fastText with pre-trained embeddings does converge faster (fewer epochs) to the maximum accuracy, and this effect gets stronger as the learning rate increases. Still, fastText with self-trained embeddings always seems to catch up at some point.
Hypothesis: the more epochs, the more we overfit.
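If the hypothesis holds, the practical fix is to cap training once validation accuracy stops improving. A minimal sketch (not from this issue; the curve below is synthetic and the patience value is an assumption) of such an early-stopping check:

```python
def early_stop(accuracies, patience=3):
    """Return the index of the best epoch, stopping the scan once
    `patience` epochs pass without a new best validation accuracy."""
    best, best_epoch = float("-inf"), 0
    for epoch, acc in enumerate(accuracies):
        if acc > best:
            best, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: assume overfitting
    return best_epoch

# Synthetic validation curve: rises, plateaus, then degrades (overfitting).
curve = [0.60, 0.72, 0.80, 0.84, 0.85, 0.85, 0.84, 0.83, 0.82]
print(early_stop(curve))  # -> 4
```

With pre-trained embeddings the best epoch would simply arrive earlier; the same stopping rule applies to both setups.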