TheRensselaerIDEA / twitter-nlp

Data Analytics on Twitter with Natural Language Processing
MIT License

[Research / Analysis] Fine-tune embedding model on tweet dataset #2

Open · AbrahamSanders opened this issue 4 years ago

AbrahamSanders commented 4 years ago

Currently we are using the pre-trained Universal Sentence Encoder (large) from TensorFlow Hub.

Open area for investigation: the model's parameters are marked trainable, so it should be possible to fine-tune it on our own COVID tweet dataset.
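
A minimal sketch of loading the encoder with trainable weights, assuming the standard TF Hub v5 handle for USE-large; the Keras wrapper is illustrative, and the fine-tuning head/loss would depend on the downstream task:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load USE-large with trainable=True so the optimizer can update its weights.
use_layer = hub.KerasLayer(
    "https://tfhub.dev/google/universal-sentence-encoder-large/5",
    trainable=True,
)

# Wrap the encoder in a Keras model; inputs are raw tweet strings.
inputs = tf.keras.Input(shape=(), dtype=tf.string)
embeddings = use_layer(inputs)  # shape: (batch, 512) sentence embeddings
model = tf.keras.Model(inputs, embeddings)
```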

Alternatively, explore fine-tuning other models such as BERT on a semantic similarity task, as done in Sentence-BERT.
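
A sketch of that approach using the sentence-transformers library; the base checkpoint, the tweet pairs, and the similarity labels here are placeholders, and a real run would need a similarity-labeled pair dataset:

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Placeholder base checkpoint; any BERT-family model could be substituted.
model = SentenceTransformer("bert-base-nli-mean-tokens")

# Hypothetical similarity-labeled tweet pairs (label in [0, 1]).
train_examples = [
    InputExample(texts=["tweet a", "tweet b"], label=0.9),
    InputExample(texts=["tweet c", "tweet d"], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# Sentence-BERT style objective: regress cosine similarity of the pair
# embeddings toward the label.
train_loss = losses.CosineSimilarityLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```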

Comparison of the base pre-trained vs. fine-tuned Universal Sentence Encoder (USE) can be done quantitatively or qualitatively; see #1. The same goes for comparing USE vs. BERT or any other model.
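
One concrete way to compare qualitatively: embed the same probe tweets with each model and inspect how each ranks a shared corpus by cosine similarity. In this sketch the embedding functions are hypothetical stand-ins for whichever two models are being compared:

```python
import numpy as np

def compare_models(embed_fns, probes, corpus):
    """Rank corpus tweets against each probe under each embedding model.

    embed_fns: dict mapping a model name to a (hypothetical) function that
    maps a list of strings to an (n, d) array of embeddings.
    """
    for name, embed in embed_fns.items():
        a = np.asarray(embed(probes))
        b = np.asarray(embed(corpus))
        # Normalize rows so the dot product is cosine similarity.
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        sims = a @ b.T
        # Highest-similarity corpus tweets first, per probe.
        print(name, sims.argsort(axis=1)[:, ::-1])
```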

AbrahamSanders commented 4 years ago

A good candidate for a pre-trained BERT model is covid-twitter-bert.
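
For reference, a minimal sketch of loading that checkpoint with Hugging Face transformers, assuming the digitalepidemiologylab/covid-twitter-bert hub id; the [CLS] pooling shown is just one simple way to get a tweet embedding:

```python
from transformers import AutoModel, AutoTokenizer

# Assumed Hugging Face hub id for COVID-Twitter-BERT.
model_id = "digitalepidemiologylab/covid-twitter-bert"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Example: encode a tweet and take the [CLS] hidden state as an embedding.
inputs = tokenizer("example covid tweet", return_tensors="pt")
outputs = model(**inputs)
cls_embedding = outputs.last_hidden_state[:, 0]  # (1, hidden_size)
```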