Universal Sentence Encoder
Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Céspedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
(all at Google AI)
1. What is it?
This paper presents two sentence-encoding models designed for transfer learning to other NLP tasks.
2. What is amazing compared to previous studies?
They published the pre-trained models on TensorFlow Hub and made them easy to use:
import tensorflow_hub as hub

# Load the pre-trained Universal Sentence Encoder from TensorFlow Hub (TF1-style hub.Module API).
embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/1")
# Calling the module returns a tensor of 512-dimensional sentence embeddings.
embedding = embed(["The quick brown fox jumps over the lazy dog."])
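With the TF1-style hub.Module API, the returned tensor still has to be evaluated inside a TensorFlow 1.x session before the embedding values can be read. A minimal sketch continuing the snippet above (the session handling is standard TF 1.x usage, not something shown in the paper):

import tensorflow as tf

with tf.Session() as session:
    # hub.Module creates variables and lookup tables that must be initialized.
    session.run([tf.global_variables_initializer(), tf.tables_initializer()])
    message_embeddings = session.run(embedding)
    print(message_embeddings.shape)  # one 512-dimensional vector per input sentence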
3. What are the key techniques and methods?
They proposed two models: one based on the Transformer encoder, the other based on the Deep Averaging Network (DAN).
3.1 Transformer Encoder
It uses the encoding sub-graph of the Transformer, which uses attention to compute context-aware representations of the words in a sentence, taking into account both the ordering and the identity of all the other words (see the pooling sketch below).
It achieved the best performance on the transfer tasks.
Its main drawback is cost: compute time and memory usage scale steeply with sentence length (compute time is quadratic in the number of words).
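To turn the per-word context vectors into one fixed-length sentence vector, the representations at each word position are combined element-wise. A minimal NumPy sketch of this pooling step; the division by the square root of the sentence length is assumed here as a length normalization, not a detail quoted from the paper:

import numpy as np

def pool_context_vectors(context_vectors):
    # context_vectors: (sentence_length, 512) context-aware word representations
    # produced by the Transformer encoding sub-graph.
    # Element-wise sum over word positions, scaled by sqrt(length) (assumed normalization).
    return context_vectors.sum(axis=0) / np.sqrt(len(context_vectors))

# Example: 7 words, 512-d vectors -> one 512-d sentence embedding.
print(pool_context_vectors(np.random.randn(7, 512)).shape)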
3.2 DAN
Input embeddings for words and bi-grams are averaged together and then passed through a feedforward deep neural network (DNN) to produce the sentence embedding (see the sketch below).
It gives strong baseline performance.
Its compute time is linear in the length of the input sequence.
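A minimal NumPy sketch of the DAN computation, assuming the word and bi-gram embeddings and the feedforward weights have already been trained; the layer sizes and tanh activation below are illustrative choices, not the paper's exact configuration:

import numpy as np

def dan_encode(word_vecs, bigram_vecs, layers):
    # Average the input embeddings for words and bi-grams ...
    avg = np.vstack([word_vecs, bigram_vecs]).mean(axis=0)
    h = avg
    # ... then pass the average through a feedforward DNN.
    for W, b in layers:
        h = np.tanh(h @ W + b)
    return h  # fixed-length sentence embedding

# Illustrative shapes: 5 words and 4 bi-grams, 512-d embeddings, two hidden layers.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(512, 512)), np.zeros(512)) for _ in range(2)]
print(dan_encode(rng.normal(size=(5, 512)), rng.normal(size=(4, 512)), layers).shape)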
4. How did they validate it?
They evaluated the encoders by transferring the sentence embeddings to several downstream tasks (as sketched below).
Transfer learning that uses the sentence-level embeddings tends to perform better than transfer that only uses word-level embeddings.
They also ran the transfer tasks with varying amounts of training data; even with small amounts of task-specific data, their methods achieved good performance.
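For illustration, sentence-level transfer can be as simple as training a small task-specific classifier on top of the frozen USE embeddings. A minimal sketch with synthetic data and scikit-learn's LogisticRegression; the paper itself trains task-specific DNN classifiers, so the classifier choice here is an assumption for brevity:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for 512-d USE sentence embeddings of a tiny labeled dataset;
# in practice these come from embed([...]) as shown in section 2.
rng = np.random.default_rng(0)
train_x = rng.normal(size=(100, 512))
train_y = rng.integers(0, 2, size=100)

# Task-specific classifier trained on top of the frozen sentence embeddings.
clf = LogisticRegression(max_iter=1000)
clf.fit(train_x, train_y)
print(clf.score(train_x, train_y))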
5. Is there a discussion?
There is a trade-off between transfer accuracy and model complexity and resource consumption: the Transformer encoder is more accurate but expensive, while the DAN encoder is cheap but somewhat less accurate.
6. Which paper should be read next?
A related paper uses this approach to compute cross-lingual sentence similarity: Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model.