Open reaganrewop opened 5 years ago
Will be extending GPT to get paragraph embeddings by using an LSTM-based "head" trained offline on sentence features extracted from GPT.
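A minimal sketch of the sentence-feature extraction step, assuming the HuggingFace `transformers` `OpenAIGPTModel` with mean-pooling over token hidden states (the model name and pooling choice are assumptions, not confirmed details of this setup):

```python
import torch
from transformers import OpenAIGPTModel, OpenAIGPTTokenizer

tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
model = OpenAIGPTModel.from_pretrained("openai-gpt")
model.eval()

def sentence_features(sentence: str) -> torch.Tensor:
    """Mean-pool the final hidden states into a single sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # (768,)
```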
I tested GPT and BERT for possible paragraph-embedding applications. BERT gave a narrow range of similarity scores, 0.7-0.99 across both out-of-domain and in-domain topics, as opposed to 0.2-0.9 for GPT, possibly due to the aggregation of tokens used to get the pooled feature representation of a sentence. Moreover, BERT paragraph embeddings formed by aggregating sentence-level features are sensitive to noise (appending an out-of-domain sentence to the end of an in-domain paragraph reduces the score drastically), whereas GPT is more resilient to the added noise. Adding an LSTM head to BERT did not alleviate these problems.
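One way to reproduce the noise test described above, sketched under the assumptions that paragraph embeddings are plain sums of sentence vectors and scores are cosine similarities; `sentence_features` is from the sketch above, and the example sentences are placeholders:

```python
import torch
import torch.nn.functional as F

def paragraph_embedding(sentences):
    # Assumed aggregation: sum the per-sentence feature vectors.
    return torch.stack([sentence_features(s) for s in sentences]).sum(dim=0)

in_domain = [
    "The model is trained on a large text corpus.",
    "Fine-tuning adapts it to the downstream task.",
]
noisy = in_domain + ["The recipe calls for two cups of flour."]

score = F.cosine_similarity(
    paragraph_embedding(in_domain), paragraph_embedding(noisy), dim=0
)
print(f"similarity after appending noise: {score.item():.3f}")
```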
Conclusion 1: GPT-based paragraph embeddings are more stable than BERT-based ones.
Conclusion 2: GPT paragraph embeddings show good topic separation and can be used for separating segments based on context. To avoid relying on naive aggregation of sentence features, a Bi-LSTM head was used to combine them instead of summing the sentence-level feature vectors. This resulted in better context capture across a paragraph.
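A minimal sketch of such a Bi-LSTM aggregation head, with illustrative sizes (768-d sentence features, 256-d hidden state) that are assumptions rather than the exact configuration used:

```python
import torch
import torch.nn as nn

class BiLSTMHead(nn.Module):
    def __init__(self, feat_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, sent_feats: torch.Tensor) -> torch.Tensor:
        # sent_feats: (batch, num_sentences, feat_dim)
        _, (h_n, _) = self.lstm(sent_feats)
        # Concatenate the final forward and backward hidden states
        # into one paragraph embedding of size 2 * hidden_dim.
        return torch.cat([h_n[0], h_n[1]], dim=-1)

head = BiLSTMHead()
sent_feats = torch.randn(1, 5, 768)  # 5 sentence vectors for one paragraph
paragraph_vec = head(sent_feats)     # shape (1, 512)
```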
Test GPT-2 feature-representation linearity and its scalability for paragraph vectors.
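One possible reading of that linearity test, sketched purely as an assumption: compare a whole-paragraph embedding against the mean of its sentence embeddings. `sentence_features` is from the first sketch; GPT-2 would swap in `GPT2Model`/`GPT2Tokenizer`.

```python
import torch
import torch.nn.functional as F

sentences = [
    "GPT-2 is an autoregressive language model.",
    "It is trained to predict the next token.",
]
# If features were roughly linear, the embedding of the joined paragraph
# should stay close to the mean of the per-sentence embeddings.
whole = sentence_features(" ".join(sentences))
parts = torch.stack([sentence_features(s) for s in sentences]).mean(dim=0)
print(f"linearity score: {F.cosine_similarity(whole, parts, dim=0).item():.3f}")
```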