SOOJEONGKIMM / Paper_log

papers to-read list + issue

BertGCN: Transductive Text Classification by Combining GCN and BERT #10

Closed SOOJEONGKIMM closed 2 years ago

SOOJEONGKIMM commented 2 years ago

https://aclanthology.org/2021.findings-acl.126.pdf ACL-IJCNLP 2021

SOOJEONGKIMM commented 2 years ago

Introduction: transductive learning for the text classification task, with a GNN approach.

  1. With GNNs and transductive learning, a node's decision does not depend merely on the node itself, but also on its neighbors.
  2. At training time, the influence of supervised labels propagates through graph edges, so unlabeled data also contributes to training, which yields higher performance.
SOOJEONGKIMM commented 2 years ago

The BertGCN model successfully combines the strengths of large-scale pretraining and graph networks on a wide range of text classification datasets.

SOOJEONGKIMM commented 2 years ago

Method

  1. BertGCN graph construction (word nodes and document nodes): following TextGCN (2019), word-document edges are weighted by TF-IDF and word-word edges by PPMI.

*Initial node features (on the TextGCN graph): X = (X_doc; 0) ∈ R^((n_doc + n_word) × d), where n_doc is the number of document nodes, n_word the number of word nodes, and d the embedding dimension; document node features X_doc come from BERT, and word node features are initialized to zero.

The i-th GCN layer's output is L^(i) = ρ(Ã L^(i−1) W_i), where ρ is the activation function, Ã is the normalized adjacency matrix, W_i is the layer's weight matrix, and L^(0) is the model's input feature matrix X.

The GCN output is fed to a softmax layer for classification, Z_GCN = softmax(g(X, A)); parameters are optimized with cross-entropy loss over the labeled document nodes.
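Putting the layer equation and the loss together, a two-layer GCN forward pass with masked cross-entropy over labeled nodes can be sketched in NumPy (Ã, X, and the label mask here are illustrative, not the paper's data):

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization: A~ = D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_norm, X, W1, W2):
    """Two layers: L1 = rho(A~ X W1), logits = A~ L1 W2, rho = ReLU."""
    H = np.maximum(A_norm @ X @ W1, 0.0)
    return A_norm @ H @ W2

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def masked_cross_entropy(probs, labels, labeled_mask):
    """Cross-entropy averaged over labeled document nodes only."""
    idx = np.where(labeled_mask)[0]
    return -np.mean(np.log(probs[idx, labels[idx]] + 1e-12))
```

The mask is what makes this transductive: unlabeled nodes never enter the loss, but their features still flow through Ã during propagation.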

SOOJEONGKIMM commented 2 years ago

Interpolating BERT and GCN Predictions: Z = λ Z_GCN + (1 − λ) Z_BERT, where Z_BERT is the softmax output of the BERT classifier. The parameter λ controls the trade-off between the two objectives; interpolating the two predictions helps overcome drawbacks such as vanishing gradients and over-smoothing.
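The interpolation itself is one line; a minimal sketch, assuming z_gcn and z_bert are the two softmax outputs:

```python
import numpy as np

def interpolate(z_gcn, z_bert, lam=0.7):
    """Final prediction Z = lam * Z_GCN + (1 - lam) * Z_BERT.
    lam = 1 uses only the GCN branch, lam = 0 only BERT;
    the paper reports lam = 0.7 working best."""
    return lam * z_gcn + (1.0 - lam) * z_bert
```

Because both inputs are probability distributions, the convex combination is again a valid distribution, so no renormalization is needed.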

SOOJEONGKIMM commented 2 years ago

Optimization using Memory Bank
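As I understand the paper, the memory bank caches an embedding for every document node so the GCN can propagate over the full graph, while BERT re-encodes only the current mini-batch each step. A minimal sketch of that idea (class and method names are my own, not the authors' code):

```python
import numpy as np

class EmbeddingMemoryBank:
    """Caches one embedding per document node; each training step
    refreshes only the entries for the current mini-batch, so the
    full-graph features stay approximately up to date without
    re-encoding every document with BERT."""

    def __init__(self, n_docs, dim):
        self.bank = np.zeros((n_docs, dim))

    def update(self, doc_ids, new_embeddings):
        # Overwrite cached embeddings for the documents in this batch.
        self.bank[doc_ids] = new_embeddings

    def features(self):
        # Current full-graph document features, used as GCN input.
        return self.bank
```

Entries outside the current batch go stale between updates; they are treated as constants, so no gradient flows through the cached copies.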

SOOJEONGKIMM commented 2 years ago

Experiments: experimental setup

Main Results

SOOJEONGKIMM commented 2 years ago

The Effect of lambda: lambda controls the trade-off between training the GCN branch and BERT; the model performs best when lambda = 0.7.

SOOJEONGKIMM commented 2 years ago

The Effect of Strategies in Joint Training: results on the 20ng dataset.

BertGCN strategies

  1. additionally using a fine-tuned model for initialization (= RoBERTaGCN)
  2. using a smaller learning rate for the BERT module