
Adjacent-Encoder

This is the tensorflow implementation of the AAAI-2020 paper "Topic Modeling on Document Networks with Adjacent-Encoder" by Ce Zhang and Hady W. Lauw.

Adjacent-Encoder is a neural topic model that extracts topics for networked documents for document classification, clustering, link prediction, etc.

Implementation Environment

Run

python main.py

Parameter Setting

Data

We extracted four independent citation networks (DS, HA, ML, and PL) from the Cora source (http://people.cs.umass.edu/~mccallum/data/cora-classify.tar.gz). Note that the well-known Cora benchmark dataset used in GAT is actually the ML subset; in addition to ML, we created three new citation networks.

We release these datasets in the ./cora folder; each dataset contains an adjacency matrix, document content, labels, label names, and a vocabulary.
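As a sketch, one such citation network can be loaded with NumPy. The filenames below are assumptions for illustration, not necessarily the actual names in ./cora:

```python
import numpy as np

def load_cora_subset(path):
    """Load one citation network. Filenames here are hypothetical."""
    # adjacency matrix: one row/column per document, 1 = citation link
    adjacency = np.loadtxt(f"{path}/adjacency.txt")
    # bag-of-words content: one row per document, one column per vocab term
    content = np.loadtxt(f"{path}/content.txt")
    # one integer class label per document
    labels = np.loadtxt(f"{path}/label.txt").astype(int)
    return adjacency, content, labels
```

Adapt the filenames and parsing to the actual release format before use.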

Output

The document embeddings are written to the ./results folder. Each row is one document embedding, and each column is one dimension of the embedding, i.e., one topic.

In transductive learning, the training embeddings are identical to the test embeddings. In inductive learning, the training embeddings are those of the training documents (excluding validation documents), and test embeddings are inferred for unseen test documents.
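For instance, the saved embeddings can feed the downstream tasks mentioned above. A minimal sketch (assuming a whitespace-separated embedding matrix, one document per row) that takes the dominant dimension as a document's topic and cosine similarity as a simple link-prediction score:

```python
import numpy as np

def top_topics(embeddings):
    # dominant topic per document = column with the largest value
    return np.argmax(embeddings, axis=1)

def link_scores(embeddings):
    # cosine similarity between document pairs as a link-prediction score
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T
```

Usage: `emb = np.loadtxt("./results/<file>")`, then `top_topics(emb)` gives each document's dominant topic and `link_scores(emb)[i, j]` scores a candidate link between documents i and j.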

Reference

If you use our paper, code, or data, please cite:

@inproceedings{adjenc,
    title={Topic Modeling on Document Networks with Adjacent-Encoder},
    author={Zhang, Ce and Lauw, Hady W},
    booktitle={Thirty-Fourth AAAI Conference on Artificial Intelligence},
    year={2020}
}