TheRensselaerIDEA / twitter-nlp

Data Analytics on Twitter with Natural Language Processing
MIT License

[Research / Analysis] Fine-tune embedding model on tweet dataset #2

Open · AbrahamSanders opened this issue 4 years ago

AbrahamSanders commented 4 years ago

Currently we are using the pre-trained Universal Sentence Encoder (large) from TensorFlow Hub.

Open area for investigation: the model's parameters are marked trainable, so it should be possible to fine-tune it on our own COVID tweet dataset.
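
A minimal sketch of loading the encoder with trainable weights, assuming the standard TF Hub v5 handle for USE-large; the Keras wrapper is illustrative, and the fine-tuning head/loss would depend on the downstream task:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load USE-large with trainable=True so the optimizer can update its weights.
use_layer = hub.KerasLayer(
    "https://tfhub.dev/google/universal-sentence-encoder-large/5",
    trainable=True,
)

# Wrap the encoder in a Keras model; inputs are raw tweet strings.
inputs = tf.keras.Input(shape=(), dtype=tf.string)
embeddings = use_layer(inputs)  # shape: (batch, 512) sentence embeddings
model = tf.keras.Model(inputs, embeddings)
```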

Alternatively, explore fine-tuning other models such as BERT on a semantic similarity task, as done in Sentence-BERT.
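
A sketch of that approach using the sentence-transformers library; the base checkpoint, the tweet pairs, and the similarity labels here are placeholders, and a real run would need a similarity-labeled pair dataset:

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Placeholder base checkpoint; any BERT-family model could be substituted.
model = SentenceTransformer("bert-base-nli-mean-tokens")

# Hypothetical similarity-labeled tweet pairs (label in [0, 1]).
train_examples = [
    InputExample(texts=["tweet a", "tweet b"], label=0.9),
    InputExample(texts=["tweet c", "tweet d"], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# Sentence-BERT style objective: regress cosine similarity of the pair
# embeddings toward the label.
train_loss = losses.CosineSimilarityLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```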

Comparison of the base pre-trained vs. fine-tuned Universal Sentence Encoder (USE) can be done quantitatively or qualitatively; see #1. The same goes for comparing USE vs. BERT or any other model.
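
One concrete way to compare qualitatively: embed the same probe tweets with each model and inspect how each ranks a shared corpus by cosine similarity. In this sketch the embedding functions are hypothetical stand-ins for whichever two models are being compared:

```python
import numpy as np

def compare_models(embed_fns, probes, corpus):
    """Rank corpus tweets against each probe under each embedding model.

    embed_fns: dict mapping a model name to a (hypothetical) function that
    maps a list of strings to an (n, d) array of embeddings.
    """
    for name, embed in embed_fns.items():
        a = np.asarray(embed(probes))
        b = np.asarray(embed(corpus))
        # Normalize rows so the dot product is cosine similarity.
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        sims = a @ b.T
        # Highest-similarity corpus tweets first, per probe.
        print(name, sims.argsort(axis=1)[:, ::-1])
```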

AbrahamSanders commented 4 years ago

A good candidate for a pre-trained BERT model is covid-twitter-bert.
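
For reference, a minimal sketch of loading that checkpoint with Hugging Face transformers, assuming the digitalepidemiologylab/covid-twitter-bert hub id; the [CLS] pooling shown is just one simple way to get a tweet embedding:

```python
from transformers import AutoModel, AutoTokenizer

# Assumed Hugging Face hub id for COVID-Twitter-BERT.
model_id = "digitalepidemiologylab/covid-twitter-bert"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Example: encode a tweet and take the [CLS] hidden state as an embedding.
inputs = tokenizer("example covid tweet", return_tensors="pt")
outputs = model(**inputs)
cls_embedding = outputs.last_hidden_state[:, 0]  # (1, hidden_size)
```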