Cynwell / Text-Level-GNN

Text Level Graph Neural Network for Text Classification
Apache License 2.0

Text-Level-GNN

An implementation of the paper Text Level Graph Neural Network for Text Classification (https://arxiv.org/pdf/1910.02356.pdf).
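To make the model more concrete, here is a minimal pure-Python sketch of one text-level message-passing step and the readout. It is based on my reading of the paper, not on train.py. Each token is a node connected to the tokens within distance p. Neighbour messages e_an * r_a are combined by element-wise max pooling into M_n. Each node is then updated as r'_n = (1 - eta_n) * M_n + eta_n * r_n, and the class distribution is softmax(ReLU(W * sum_n r'_n + b)). The names neighbours, message_passing and readout are hypothetical. So are the defaults for unseen word pairs (edge weight 1.0) and unseen words (eta = 0.5). A real implementation would learn these parameters, for example in PyTorch.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def neighbours(length: int, p: int) -> List[List[int]]:
    """For each token position n, the positions within distance p (n itself excluded)."""
    return [
        [a for a in range(max(0, n - p), min(length, n + p + 1)) if a != n]
        for n in range(length)
    ]


def message_passing(
    tokens: Sequence[str],
    emb: Dict[str, Vector],                # node representations r_n, one per word
    edge_w: Dict[Tuple[str, str], float],  # globally shared edge weights e_an
    eta: Dict[str, float],                 # globally shared per-word gate eta_n
    p: int,
) -> List[Vector]:
    reps = [emb[t] for t in tokens]
    updated = []
    for n, nbrs in enumerate(neighbours(len(tokens), p)):
        r_n = reps[n]
        if not nbrs:  # one-token text: nothing to aggregate, keep r_n
            updated.append(list(r_n))
            continue
        # M_n = element-wise max over neighbours a of e_an * r_a
        # (unseen word pairs default to 1.0: an arbitrary choice for this sketch)
        msgs = [
            [edge_w.get((tokens[a], tokens[n]), 1.0) * x for x in reps[a]]
            for a in nbrs
        ]
        m_n = [max(col) for col in zip(*msgs)]
        # r'_n = (1 - eta_n) * M_n + eta_n * r_n  (unseen words default to eta = 0.5)
        k = eta.get(tokens[n], 0.5)
        updated.append([(1 - k) * m + k * r for m, r in zip(m_n, r_n)])
    return updated


def readout(reps: List[Vector], weight: List[Vector], bias: Vector) -> Vector:
    """y = softmax(ReLU(W * sum_n r'_n + b))."""
    pooled = [sum(col) for col in zip(*reps)]
    logits = [max(0.0, sum(w * x for w, x in zip(row, pooled)) + b)
              for row, b in zip(weight, bias)]
    top = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


reps = message_passing(["a", "b", "c"], {"a": [1.0], "b": [2.0], "c": [3.0]}, {}, {}, p=1)
print(reps)  # [[1.5], [2.5], [2.5]]
```

With p=3, as in the run reported under Result, each node aggregates up to six neighbours.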

Features:

File structure:

+---embeddings\
|             +---glove.6B.50d.txt
|             +---glove.6B.100d.txt
|             +---glove.6B.200d.txt
|             +---glove.6B.300d.txt
+---train.py
+---r52-test-all-terms.txt
+---r52-train-all-terms.txt
+---r8-test-all-terms.txt
+---r8-train-all-terms.txt
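The embeddings folder holds the pretrained GloVe 6B vectors, one file per dimension. Each file is plain text, with one word per line followed by its vector components, separated by single spaces. Below is a minimal loader sketch. It assumes the embedding_size setting picks the matching glove.6B.<dim>d.txt file. The function names parse_glove and load_glove are hypothetical and not taken from train.py.

```python
from typing import Dict, Iterable, List


def parse_glove(lines: Iterable[str], dim: int) -> Dict[str, List[float]]:
    """Parse 'word v1 v2 ... v_dim' lines into a word -> vector dict."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:  # skip blank or malformed lines
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def load_glove(embedding_size: int, folder: str = "embeddings") -> Dict[str, List[float]]:
    """Load embeddings/glove.6B.<embedding_size>d.txt (50, 100, 200 or 300)."""
    path = f"{folder}/glove.6B.{embedding_size}d.txt"
    with open(path, encoding="utf-8") as f:
        return parse_glove(f, embedding_size)
```

Splitting parsing from file access keeps the parser easy to test on a few in-memory lines.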

Since the original download links no longer work, I list them below and provide the corresponding dataset files in this repository for anyone else looking for the r8 and r52 datasets.

https://www.cs.umb.edu/~smimarog/textmining/datasets/r8-train-all-terms.txt => r8-train-all-terms.txt
https://www.cs.umb.edu/~smimarog/textmining/datasets/r8-test-all-terms.txt => r8-test-all-terms.txt
https://www.cs.umb.edu/~smimarog/textmining/datasets/r52-train-all-terms.txt => r52-train-all-terms.txt
https://www.cs.umb.edu/~smimarog/textmining/datasets/r52-test-all-terms.txt => r52-test-all-terms.txt
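These files appear to contain one document per line: the class label, a tab, then the space-separated, already preprocessed terms. Assuming that layout, here is a minimal reader sketch. It also truncates each document to max_length terms, mirroring the MAX_LENGTH setting. The default of 150 is only an example. The names parse_reuters, load_reuters and label_index are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Doc = Tuple[str, List[str]]  # (label, tokens)


def parse_reuters(lines: Iterable[str], max_length: int = 150) -> List[Doc]:
    """Split 'label<TAB>term term ...' lines, keeping at most max_length terms."""
    docs: List[Doc] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        label, _, text = line.partition("\t")
        docs.append((label, text.split()[:max_length]))
    return docs


def load_reuters(path: str, max_length: int = 150) -> List[Doc]:
    with open(path, encoding="utf-8") as f:
        return parse_reuters(f, max_length)


def label_index(docs: List[Doc]) -> Dict[str, int]:
    """Map each class name to a stable integer id (sorted alphabetically)."""
    return {label: i for i, label in enumerate(sorted({label for label, _ in docs}))}
```

For example, build the mapping with `label_index(load_reuters("r8-train-all-terms.txt"))` and reuse it for the test file, so both splits share the same class ids.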

Environment:

Successfully run on an RTX 2070, RTX 2080 Ti and RTX 3090. However, memory consumption is high enough that the RTX 2070 requires a smaller batch size, a shorter MAX_LENGTH or a smaller embedding_size.
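One way to see why those three settings help: suppose a layer stores every neighbour message in one dense, padded float32 tensor of shape (batch_size, MAX_LENGTH, 2p, embedding_size). That is an assumption about the implementation, not a description of train.py. Under it, memory grows linearly in each setting. Here is a back-of-envelope sketch. message_tensor_bytes is a hypothetical helper, and batch size 32 is only an example value.

```python
def message_tensor_bytes(batch_size: int, max_length: int, p: int,
                         embedding_size: int, bytes_per_value: int = 4) -> int:
    """Size of one dense (batch, max_length, 2p, embedding_size) float32 tensor."""
    return batch_size * max_length * 2 * p * embedding_size * bytes_per_value


full = message_tensor_bytes(32, 150, 3, 300)
print(round(full / 2**20, 1), "MiB")  # 33.0 MiB for one such tensor
```

Halving any one of batch size, MAX_LENGTH or embedding_size halves this estimate. The backward pass keeps additional intermediates, so actual usage is several times larger.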

Usage:

Result:

I only tested the model on the r8 dataset and was unable to reproduce the accuracy reported in the paper, despite some hyperparameter tuning. The closest run I could get is:

Train Accuracy | Validation Accuracy | Test Accuracy
99.91%         | 95.7%               | 96.2%

This run used embedding_size=300, p=3, 70<=max_length<=150 and dropout=0. As the experimental settings are not clearly stated in the paper, I assumed that the authors also used a learning rate decay mechanism. I also added a warm-up mechanism to pretrain the model, but in practice the model converged quite fast and did not need warm-up at all.
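The exact schedule is not specified here, so the following is only a sketch of one common way to combine warm-up with learning rate decay. It interprets "warm-up" as a learning-rate ramp, which is an assumption. The rate ramps linearly over the first warmup_epochs epochs, then decays exponentially by gamma per epoch. The function name and both default values are hypothetical.

```python
def warmup_then_decay(epoch: int, warmup_epochs: int = 2, gamma: float = 0.9) -> float:
    """Multiplier applied to the base learning rate at a 0-based epoch."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs   # linear ramp up to the base rate
    return gamma ** (epoch - warmup_epochs)  # then exponential decay


print([round(warmup_then_decay(e), 3) for e in range(5)])  # [0.5, 1.0, 1.0, 0.9, 0.81]
```

In PyTorch, this multiplier can be passed to `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_then_decay)`, with `scheduler.step()` called once per epoch.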