SOOJEONGKIMM / Paper_log

papers to-read list + issue

BertGCN: Transductive Text Classification by Combining GCN and BERT #10

Closed SOOJEONGKIMM closed 2 years ago

SOOJEONGKIMM commented 2 years ago

https://aclanthology.org/2021.findings-acl.126.pdf ACL-IJCNLP 2021

SOOJEONGKIMM commented 2 years ago

Introduction: transductive learning for the text classification task, with a GNN approach.

  1. With GNNs and transductive learning, a node's decision does not depend merely on the node itself, but also on its neighbors.
  2. At training time, the influence of supervised labels propagates through graph edges, so unlabeled data also contributes to training, which yields higher performance.
SOOJEONGKIMM commented 2 years ago

The BertGCN model successfully combines the strengths of large-scale pretraining and graph networks on a wide range of text classification datasets.

SOOJEONGKIMM commented 2 years ago

Method

  1. BertGCN graph construction (word nodes and document nodes): following TextGCN (2019), word-document edges are weighted by TF-IDF and word-word edges by PPMI.

*Initial node features (on the TextGCN graph): X = (X_doc; 0) ∈ R^((n_doc + n_word) × d), where n_doc is the number of document nodes, n_word the number of word nodes, and d the embedding dimension; document node features X_doc come from BERT, and word node features are initialized to zero.

The i-th GCN layer's output is L^(i) = ρ(Ã L^(i−1) W_i), where ρ is the activation function, Ã is the normalized adjacency matrix, W_i is the layer's weight matrix, and L^(0) is the model's input feature matrix X.

The GCN output is fed to a softmax layer for classification, Z_GCN = softmax(g(X, A)); parameters are optimized with cross-entropy loss over the labeled document nodes.
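Putting the layer equation and the loss together, a two-layer GCN forward pass with masked cross-entropy over labeled nodes can be sketched in NumPy (Ã, X, and the label mask here are illustrative, not the paper's data):

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization: A~ = D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_norm, X, W1, W2):
    """Two layers: L1 = rho(A~ X W1), logits = A~ L1 W2, rho = ReLU."""
    H = np.maximum(A_norm @ X @ W1, 0.0)
    return A_norm @ H @ W2

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def masked_cross_entropy(probs, labels, labeled_mask):
    """Cross-entropy averaged over labeled document nodes only."""
    idx = np.where(labeled_mask)[0]
    return -np.mean(np.log(probs[idx, labels[idx]] + 1e-12))
```

The mask is what makes this transductive: unlabeled nodes never enter the loss, but their features still flow through Ã during propagation.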

SOOJEONGKIMM commented 2 years ago

Interpolating BERT and GCN Predictions: Z = λ Z_GCN + (1 − λ) Z_BERT, where Z_BERT is the softmax output of the BERT classifier. The parameter λ controls the trade-off between the two objectives; interpolating the two predictions helps overcome drawbacks such as vanishing gradients and over-smoothing.
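The interpolation itself is one line; a minimal sketch, assuming z_gcn and z_bert are the two softmax outputs:

```python
import numpy as np

def interpolate(z_gcn, z_bert, lam=0.7):
    """Final prediction Z = lam * Z_GCN + (1 - lam) * Z_BERT.
    lam = 1 uses only the GCN branch, lam = 0 only BERT;
    the paper reports lam = 0.7 working best."""
    return lam * z_gcn + (1.0 - lam) * z_bert
```

Because both inputs are probability distributions, the convex combination is again a valid distribution, so no renormalization is needed.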

SOOJEONGKIMM commented 2 years ago

Optimization using Memory Bank
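As I understand the paper, the memory bank caches an embedding for every document node so the GCN can propagate over the full graph, while BERT re-encodes only the current mini-batch each step. A minimal sketch of that idea (class and method names are my own, not the authors' code):

```python
import numpy as np

class EmbeddingMemoryBank:
    """Caches one embedding per document node; each training step
    refreshes only the entries for the current mini-batch, so the
    full-graph features stay approximately up to date without
    re-encoding every document with BERT."""

    def __init__(self, n_docs, dim):
        self.bank = np.zeros((n_docs, dim))

    def update(self, doc_ids, new_embeddings):
        # Overwrite cached embeddings for the documents in this batch.
        self.bank[doc_ids] = new_embeddings

    def features(self):
        # Current full-graph document features, used as GCN input.
        return self.bank
```

Entries outside the current batch go stale between updates; they are treated as constants, so no gradient flows through the cached copies.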

SOOJEONGKIMM commented 2 years ago

Experiments: experimental setup

Main Results

SOOJEONGKIMM commented 2 years ago

The Effect of lambda: lambda controls the trade-off between training the GCN branch and BERT; the model performs best when lambda = 0.7.

SOOJEONGKIMM commented 2 years ago

The Effect of Strategies in Joint Training: results on the 20ng dataset.

BertGCN strategies

  1. additionally using a fine-tuned model for initialization (= RoBERTaGCN)
  2. using a smaller learning rate for the BERT module