0. Paper
Machine Translation Evaluation with BERT Regressor
1. What is it?
They proposed a method to evaluate (MT output, reference) sentence pairs with BERT.
2. What is amazing compared to previous studies?
Their method uses a pre-trained LM, so it does not need to train any sentence embeddings.
The SotA method in WMT17 is Blend. It uses 25 features, but these capture only local information (words, n-grams).
There are some MT evaluation methods that use sentence embeddings (= global information):
ReVal: trains sentence embeddings on WMT data.
RUSE: uses a pre-trained sentence embedding, Quick Thought.
THIS PAPER: uses a pre-trained LM, BERT.
3. What is the key to the technique and method?
Their model uses a pre-trained LM with an MLP regressor on top, so it does not need to train the LM from scratch.
Using BERT, the sentence pair (MT output, reference) is encoded in a single pass.
Therefore, this method can consider the relation between the two sentences of the pair.
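A minimal sketch of this setup (not the authors' code), assuming the Hugging Face transformers library; the model name, head size, and example sentences are placeholders:

```python
# Encode the (MT output, reference) pair jointly with BERT and regress a
# quality score with a small MLP head on the [CLS] representation.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertRegressor(nn.Module):
    def __init__(self, model_name="bert-base-uncased", hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        # MLP regression head on top of the [CLS] vector
        self.head = nn.Sequential(
            nn.Linear(self.bert.config.hidden_size, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        out = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask,
                        token_type_ids=token_type_ids)
        cls = out.last_hidden_state[:, 0]   # [CLS] summarizes the whole pair
        return self.head(cls).squeeze(-1)   # predicted quality score

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertRegressor()

# The two sentences are passed together, so BERT attends across the pair.
enc = tokenizer("the cat sat on a mat",           # MT output (dummy)
                "a cat was sitting on the mat",   # reference (dummy)
                return_tensors="pt")
with torch.no_grad():
    score = model(**enc)
print(score)
```

The regression head (and typically BERT itself) is then trained on human judgment scores, so only fine-tuning is needed, not LM training from scratch.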
4. How did they validate it?
They evaluated it on the WMT15-17 metrics task data.
Their model achieved SotA.
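As a reminder of how metrics are scored in the WMT metrics tasks: the predicted scores are compared with human judgments via Pearson correlation. A tiny sketch with dummy numbers (scipy assumed):

```python
# Dummy values only; the real evaluation uses WMT human direct-assessment scores.
from scipy.stats import pearsonr

predicted = [0.71, 0.42, 0.88, 0.30]   # scores from the metric (hypothetical)
human_da  = [0.65, 0.50, 0.90, 0.25]   # human DA judgments (hypothetical)

r, _ = pearsonr(predicted, human_da)
print(f"Pearson r = {r:.3f}")
```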
5. Is there a discussion?
What is the difference between RUSE and this method?
To answer this question, they ran an additional comparison experiment.
The result shows that two of their design choices are important.
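A hedged sketch of the contrast behind this discussion: RUSE-style separate encoding of the two sentences versus this paper's joint encoding of the pair. RUSE itself uses pre-trained sentence embeddings such as Quick Thought; BERT is used on both sides here only to isolate the separate-vs-joint difference, and the feature combination is an illustrative RUSE-style choice:

```python
import torch
from transformers import BertModel, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()

mt_out = "the cat sat on a mat"           # dummy MT output
ref    = "a cat was sitting on the mat"   # dummy reference

with torch.no_grad():
    # (a) Separate encoding: embed each sentence on its own, then combine the
    # two vectors (concatenation / difference / product), RUSE-style.
    h1 = bert(**tok(mt_out, return_tensors="pt")).last_hidden_state[:, 0]
    h2 = bert(**tok(ref, return_tensors="pt")).last_hidden_state[:, 0]
    separate_features = torch.cat([h1, h2, torch.abs(h1 - h2), h1 * h2], dim=-1)

    # (b) Joint encoding (this paper): one forward pass over the pair, so
    # self-attention can relate MT-output tokens and reference tokens directly.
    joint_cls = bert(**tok(mt_out, ref, return_tensors="pt")).last_hidden_state[:, 0]
```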
6. Which paper should we read next?