Open GiovanniPioDelvecchio opened 1 year ago
The embeddings have been changed from BERT to GloVe. The whole dataset has not been generated yet; some experiments were performed with 2400 samples for training and 600 for validation. The new embeddings can be found here: https://nlp.stanford.edu/projects/glove/ (extract glove.6B.50d.txt from glove.6B.zip).
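For reference, a minimal sketch of how the extracted file could be read, assuming the usual GloVe text layout (one token per line followed by its space-separated vector components). The names `load_glove` and `embed` are hypothetical helpers, not part of the current code, and the lowercasing and zero-vector fallback for unknown words are assumptions rather than what the experiments above actually did.

```python
import io


def load_glove(lines, dim=50):
    """Parse GloVe text lines into a {token: vector} dict (hypothetical helper)."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def embed(tokens, vectors, dim=50):
    """Map tokens to vectors; unknown tokens get a zero vector (assumption)."""
    zero = [0.0] * dim
    # glove.6B is uncased, so tokens are lowercased before lookup.
    return [vectors.get(tok.lower(), zero) for tok in tokens]


# Toy 3-dimensional data standing in for the real file, so the sketch runs as-is.
toy_file = io.StringIO("the 0.1 0.2 0.3\ncat -0.5 0.0 0.5\n")
toy_vectors = load_glove(toy_file, dim=3)
sentence = embed(["The", "dog"], toy_vectors, dim=3)

# With the real embeddings the call would look like this:
# with open("glove.6B.50d.txt", encoding="utf-8") as f:
#     glove = load_glove(f, dim=50)
```

The loader takes any iterable of lines, so it works on an open file handle without reading the whole file into a single string first. The vectors are kept as plain lists here to avoid extra dependencies; converting them to tensors or arrays would be a natural next step.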
The Dataset_from_sentences class needs to be modified so that it can handle the whole dataset (which comprises more than 41k samples). Some possible implementation paths could be: