Modifying the Dataset_from_sentences class

GiovanniPioDelvecchio / GCNs_on_text

This repository aims to be a comprehensive guide to designing Graph Neural Networks for text classification

Modifying the Dataset_from_sentences class #2

Open GiovanniPioDelvecchio opened 1 year ago

GiovanniPioDelvecchio commented 1 year ago

The Dataset_from_sentences class needs to be modified so that it can handle the whole dataset (which comprises more than 41k samples). Some possible implementation paths:

serialize the object and load it before running the notebook;
allow the download of the dataset from some source, as described in the PyG documentation (the download method must be extended): https://pytorch-geometric.readthedocs.io/en/latest/tutorial/create_dataset.html
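
The first option could be sketched as a small cache helper, assuming the built dataset object is picklable. The names `load_or_build`, `build_fn`, and the cache path are hypothetical and not part of the repository; `build_fn` stands in for whatever code constructs the Dataset_from_sentences instance.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_build(cache_path: str, build_fn: Callable[[], Any]) -> Any:
    """Return the cached object if it exists, otherwise build and cache it.

    `build_fn` is a hypothetical zero-argument callable that performs the
    expensive dataset construction (assumed to return a picklable object).
    """
    path = Path(cache_path)
    if path.exists():
        # Cache hit: skip the expensive construction entirely.
        with path.open("rb") as f:
            return pickle.load(f)

    # Cache miss: build once, then serialize for later runs.
    obj = build_fn()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return obj
```

With a helper like this, the notebook would only pay the construction cost on the first run. Pickle files should only be loaded from trusted sources. The PyG tutorial linked above describes a comparable built-in mechanism, where `process` writes its result to the dataset's processed directory, which may be the better fit if the class already subclasses a PyG dataset.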

GiovanniPioDelvecchio commented 1 year ago

The embeddings have been changed from BERT to GloVe. Generating the whole dataset is yet to be done; some experiments were performed with 2400 samples for training and 600 for validation. The new embeddings can be found here: https://nlp.stanford.edu/projects/glove/ (extract glove.6B.50d.txt from glove.6B.zip).
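
For reference, a minimal sketch of reading the extracted file, assuming the usual GloVe text layout of one token per line followed by its space-separated vector components. `load_glove` is a hypothetical helper, not a function from the repository.

```python
from typing import Dict, List


def load_glove(path: str, dim: int = 50) -> Dict[str, List[float]]:
    """Parse a GloVe text file into a word -> vector mapping.

    Assumes each line is a token followed by `dim` space-separated floats.
    """
    embeddings: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            word, values = parts[0], parts[1:]
            if len(values) != dim:
                # Skip blank or malformed lines instead of failing.
                continue
            embeddings[word] = [float(v) for v in values]
    return embeddings
```

Calling `load_glove("glove.6B.50d.txt")` would then give a dictionary that can be queried per token; how out-of-vocabulary words are handled (for example, with a zero vector) is left open here.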