
Language Models as Knowledge Embeddings

Source code for the paper Language Models as Knowledge Embeddings

Notice

[June 2023] We recently identified a data leakage issue in our code: during prediction, degree information about the entities to be predicted was inadvertently leaked, which gave the model an unintended shortcut and affected the experimental results to some extent. We have fixed the issue, re-run our experiments, and updated the paper accordingly. The revised results do not affect the majority of the paper's conclusions and contributions: the method still achieves state-of-the-art (SOTA) performance on the WN18RR, FB13, and WN11 datasets compared with previous works. However, on FB15k-237, performance declines and falls behind state-of-the-art structure-based methods. We sincerely apologize for this error.
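For illustration only (this is not the repository's actual code): this kind of shortcut arises when a statistic such as an entity's degree, computed with test triples included, is visible to the model while it ranks candidate entities. A minimal, hypothetical sketch of the leaky versus safe pattern:

```python
# Purely illustrative sketch of the leakage pattern described above;
# variable names and data are hypothetical, not the repository's code.
from collections import Counter

def entity_degrees(triples):
    """Count how often each entity appears as head or tail of a triple."""
    deg = Counter()
    for h, _, t in triples:
        deg[h] += 1
        deg[t] += 1
    return deg

train_triples = [("a", "r1", "b"), ("a", "r2", "c")]
test_triples = [("a", "r1", "d")]

# Leaky: statistics computed over train + test expose information about the
# entities to be predicted, giving the model a degree-based shortcut.
leaky_degrees = entity_degrees(train_triples + test_triples)

# Safe: features used at prediction time should come from the training split only.
safe_degrees = entity_degrees(train_triples)
```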

Updated Results:

WN18RR

| Methods | MR | MRR | Hits@1 | Hits@3 | Hits@10 |
| --- | --- | --- | --- | --- | --- |
| TransE | 2300 | 0.243 | 0.043 | 0.441 | 0.532 |
| DistMult | 5110 | 0.430 | 0.390 | 0.440 | 0.490 |
| ComplEx | 5261 | 0.440 | 0.410 | 0.460 | 0.510 |
| RotatE | 3340 | 0.476 | 0.428 | 0.492 | 0.571 |
| TuckER | - | 0.470 | 0.443 | 0.482 | 0.526 |
| HAKE | - | 0.497 | 0.452 | 0.516 | 0.582 |
| CoKE | - | 0.484 | 0.450 | 0.496 | 0.553 |
| Pretrain-KGE (TransE) | 1747 | 0.235 | - | - | 0.557 |
| KG-BERT | 97 | 0.216 | 0.041 | 0.302 | 0.524 |
| StAR (BERT-base) | 99 | 0.364 | 0.222 | 0.436 | 0.647 |
| MEM-KGC (BERT-base, w/o EP) | - | 0.533 | 0.473 | 0.570 | 0.636 |
| MEM-KGC (BERT-base, w/ EP) | - | 0.557 | 0.475 | 0.604 | 0.704 |
| C-LMKE (BERT-base) | 79 | 0.619 | 0.523 | 0.671 | 0.789 |

FB15k-237

| Methods | MR | MRR | Hits@1 | Hits@3 | Hits@10 |
| --- | --- | --- | --- | --- | --- |
| TransE | 323 | 0.279 | 0.198 | 0.376 | 0.441 |
| DistMult | 254 | 0.241 | 0.155 | 0.263 | 0.419 |
| ComplEx | 339 | 0.247 | 0.158 | 0.275 | 0.428 |
| RotatE | 177 | 0.338 | 0.241 | 0.375 | 0.533 |
| TuckER | - | 0.358 | 0.266 | 0.394 | 0.544 |
| HAKE | - | 0.346 | 0.250 | 0.381 | 0.542 |
| CoKE | - | 0.364 | 0.272 | 0.400 | 0.549 |
| Pretrain-KGE (TransE) | 162 | 0.332 | - | - | 0.529 |
| KG-BERT | 153 | - | - | - | 0.420 |
| StAR (BERT-base) | 136 | 0.263 | 0.171 | 0.287 | 0.452 |
| MEM-KGC (BERT-base, w/o EP) | - | 0.339 | 0.249 | 0.372 | 0.522 |
| MEM-KGC (BERT-base, w/ EP) | - | 0.346 | 0.253 | 0.381 | 0.531 |
| C-LMKE (BERT-base) | 141 | 0.306 | 0.218 | 0.331 | 0.484 |
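For reference, the metrics in the tables above follow the usual link prediction definitions: MR is the mean rank of the correct entity, MRR the mean reciprocal rank, and Hits@k the fraction of test queries where the correct entity is ranked within the top k. A minimal sketch of how they are computed from a list of ranks:

```python
def link_prediction_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR, and Hits@k from 1-based ranks of the correct entities."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics

# Example: ranks of the correct entity for five test queries.
print(link_prediction_metrics([1, 3, 2, 10, 50]))
```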

Requirements

Usage

Run main.py to train or test our models.

An example of training for triple classification:

```
python main.py --batch_size 16 --plm bert --data wn18rr --task TC
```

An example of training for link prediction:

```
python main.py --batch_size 16 --plm bert --contrastive --self_adversarial --data wn18rr --task LP
```
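The `--self_adversarial` and `--contrastive` flags suggest that negative samples are re-weighted by the model's own scores, in the style of self-adversarial negative sampling introduced with RotatE. Below is a minimal sketch of that weighting only (margin term omitted), not the exact loss implemented in this repository:

```python
# Minimal sketch of self-adversarial negative weighting, assuming `neg_scores`
# holds the model's scores for negative samples; not the repository's exact loss.
import torch
import torch.nn.functional as F

def self_adversarial_loss(pos_score, neg_scores, alpha=1.0):
    """Weight each negative by softmax(alpha * score), as in RotatE-style training."""
    # Harder (higher-scored) negatives receive larger weights; detach so the
    # weights themselves are not differentiated through.
    weights = F.softmax(alpha * neg_scores, dim=-1).detach()
    pos_term = -F.logsigmoid(pos_score)
    neg_term = -(weights * F.logsigmoid(-neg_scores)).sum(dim=-1)
    return (pos_term + neg_term).mean()

# Toy usage: one positive and four negative scores per example.
pos = torch.tensor([4.0, 3.0])
neg = torch.tensor([[1.0, 2.0, 0.5, 3.5], [0.2, 1.5, 2.0, 0.1]])
print(self_adversarial_loss(pos, neg))
```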

The arguments are as follows:

Datasets

The datasets are located in the `data` folder, including fb15k-237, WN18RR, FB13, and umls.
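These benchmarks are conventionally distributed as tab-separated files with one head–relation–tail triple per line (e.g. `train.txt`); the exact file names and layout in this repository's data folder are an assumption here. A minimal loading sketch under that assumption:

```python
# Minimal sketch for reading a tab-separated triple file such as
# data/WN18RR/train.txt; the exact file layout is an assumption, so check
# the files in the data folder before relying on this.
def load_triples(path):
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split("\t")
            if len(parts) == 3:
                head, relation, tail = parts
                triples.append((head, relation, tail))
    return triples
```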