Closed: skurzhanskyi closed this issue 4 years ago
Hi, end_to_end.sh trains the model over 1000 sentences just for demonstration purposes. I believe that the checkpoint obtained from training over just 1000 sentences is of no practical use (in comparison to the best checkpoint). Hence, I did not retain the model trained on just 1000 sentences.
It has no practical use, but it allows evaluating the speed of your model without training on a TPU (which is not widely available).
For evaluating inference time speedups, the pre-trained checkpoint can be utilized.
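As a point of reference, here is a minimal sketch of how per-sentence inference latency could be measured with any checkpoint. The `predict` function below is a hypothetical placeholder, not this repo's API; swap in the actual prediction call.

```python
import time

def predict(sentence):
    # Hypothetical stand-in for the model's inference call;
    # replace with the repo's actual prediction function.
    return sentence.upper()

def time_inference(sentences, warmup=2):
    # Warm-up runs exclude one-time setup cost from the measurement.
    for s in sentences[:warmup]:
        predict(s)
    start = time.perf_counter()
    for s in sentences:
        predict(s)
    elapsed = time.perf_counter() - start
    return elapsed / len(sentences)  # average seconds per sentence

avg = time_inference(["a simple test .", "another sentence ."] * 50)
print(f"avg latency: {avg * 1e3:.3f} ms/sentence")
```

Averaging over many sentences (after warm-up) gives a more stable number than timing a single call.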
Yes, but in that case a lot of time is spent applying the changes the model produces (because they are random).
Predictions from a pretrained checkpoint of the GEC model should not be random.
Hi authors, great work on the paper! I'm interested in whether you're going to release the weights, not of the single best model, but of the one mentioned in
example_scripts/README.md
with an F_{0.5} score close to 26.6. Thank you.