Open sxjscience opened 3 years ago
Description
Add sequence-level distillation to NMT training. That is, we generate translations from the teacher model with beam search and train the student model on the generated samples instead of the original reference targets.
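A minimal sketch of the data-generation step, to make the proposal concrete. This is not an existing API: `build_distilled_corpus`, the `teacher_beam_search` callable, and `toy_teacher` are hypothetical names used only for illustration, and keeping only the top beam hypothesis is an assumption (one could also keep several).

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface: given a source sentence and a beam size, return the
# teacher's beam-search hypotheses, best first.
BeamSearchFn = Callable[[str, int], List[str]]


def build_distilled_corpus(
    sources: Iterable[str],
    teacher_beam_search: BeamSearchFn,
    beam_size: int = 5,
) -> List[Tuple[str, str]]:
    """Pair each source sentence with the teacher's top beam-search output."""
    distilled = []
    for src in sources:
        hypotheses = teacher_beam_search(src, beam_size)
        if not hypotheses:
            # Skip sources for which the teacher produced nothing.
            continue
        # Assumption: only the best hypothesis becomes the student's target.
        distilled.append((src, hypotheses[0]))
    return distilled


def toy_teacher(src: str, beam_size: int) -> List[str]:
    """Stand-in for a real teacher model: 'translates' by upper-casing."""
    best = src.upper()
    return [best] + [best + " ?"] * (beam_size - 1)


corpus = build_distilled_corpus(["a b", "c"], toy_teacher, beam_size=2)
```

The student would then be trained on `corpus` with the usual NMT training loop, treating the teacher outputs as the target side.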
References