JinheonBaek / GEN

Official Code Repository for the paper "Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Prediction" (NeurIPS 2020)
https://arxiv.org/abs/2006.06648

How to get pre-trained embeddings for my own dataset with entities and relations #5

Closed. AndDoIt closed this issue 1 year ago.

AndDoIt commented 2 years ago

Thanks for your excellent work on OOG link prediction. Could you please tell me how to get pre-trained embeddings for my own dataset with entities and relations?

JinheonBaek commented 2 years ago

Thank you for your interest, and sorry for replying late.

We used the DGL-KE library (https://github.com/awslabs/dgl-ke) to pre-train entity and relation embeddings for the knowledge graph. As far as I know, you only need to provide this library with your own dataset of triples, and it will give you the trained embeddings for entities and relations.
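
In case it helps, here is a minimal sketch of that workflow. The `dglke_train` flags and the output file names are assumptions based on the DGL-KE documentation (not something defined in this repo), so please double-check them against your installed version:

```python
# DGL-KE is driven from the command line. A typical run on a user-defined
# dataset of raw (head, relation, tail) triples might look like this
# (flags assumed from the DGL-KE docs):
#
#   dglke_train --model_name TransE_l2 --dataset my_kg --data_path ./my_kg \
#       --format raw_udd_hrt --data_files train.tsv valid.tsv test.tsv \
#       --hidden_dim 200 --gamma 12.0 --lr 0.1 --max_step 100000 \
#       --batch_size 1024 --neg_sample_size 256 --save_path ./ckpts
#
# DGL-KE then saves the learned embeddings as .npy files under the save path.
import numpy as np

# File names follow DGL-KE's "<dataset>_<model>_{entity,relation}.npy"
# convention (assumed; check what is actually written under ./ckpts).
entity_emb = np.load("./ckpts/TransE_l2_my_kg_0/my_kg_TransE_l2_entity.npy")
relation_emb = np.load("./ckpts/TransE_l2_my_kg_0/my_kg_TransE_l2_relation.npy")

print(entity_emb.shape)    # (num_entities, hidden_dim)
print(relation_emb.shape)  # (num_relations, hidden_dim)
```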

AndDoIt commented 2 years ago

Thanks very much for your reply! Sorry to bother you with two more questions.

  1. The supplementary material says you randomly sample unseen entities that have a relatively small number of associated triplets, and then divide the sampled unseen entities with their triplets into meta-training/validation/test sets. Does this mean the meta-training set is not entirely drawn from the raw training set, and likewise that the meta-validation/test sets are not entirely drawn from the raw validation/test sets?
  2. Since my own KG is smaller and sparser, could you share more details on the WN18RR setup and any related training tips?

JinheonBaek commented 1 year ago

Thanks for your question, and sorry for getting back late.

  1. That means we first sample entities that appear fewer than x times, and then divide them into (meta-)training, validation, and test sets. You can simply think of the training set in meta-learning as being the same as the meta-training set.
  2. From my experience with KGs, it was fairly important to tune the margin hyperparameter in the triplet loss (see the sketch below).
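
To make that margin tip concrete, here is a generic margin-based ranking loss of the kind used with distance scorers such as TransE. This is a minimal PyTorch sketch, not code from this repo, and the default margin value is only a placeholder:

```python
import torch

def margin_ranking_loss(pos_dist, neg_dist, margin=12.0):
    # Distance-based scoring (e.g. TransE): smaller distance = more plausible.
    # The loss pushes each positive triplet's distance below its negatives'
    # by at least `margin`, the hyperparameter worth tuning on sparse KGs.
    return torch.relu(pos_dist - neg_dist + margin).mean()
```
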
AndDoIt commented 1 year ago

Thanks very much for your patient reply; I got it! Could you also describe the steps to adapt the baseline models, including GMatching, FSRL, and MetaR, so that they can be compared with GEN? These baselines split the dataset by sparse relations rather than by entities. How should they be adjusted so that all four models take the same input and are comparable with each other?

JinheonBaek commented 1 year ago

Thanks for your follow-up questions!

We simply used the baselines' model architectures with our own data splits. For example, in the seen-to-unseen category, GMatching predicts the unseen relation as implemented in its paper; meanwhile, in the Ours category, we meta-train it in our framework to predict the unseen entity.
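
For anyone adapting those baselines, the key change is the split itself: instead of holding out low-frequency relations (as GMatching, FSRL, and MetaR do), you hold out low-frequency entities and route their triples into the meta-sets, so all four models see the same input. A rough, hypothetical sketch of such an entity-based split follows; the names, threshold, and ratios are illustrative, not the repo's actual preprocessing:

```python
import random
from collections import Counter

def entity_based_split(triples, max_freq=100, ratios=(0.8, 0.1, 0.1), seed=0):
    # Count how often each entity appears as head or tail.
    freq = Counter()
    for h, r, t in triples:
        freq[h] += 1
        freq[t] += 1

    # Unseen entities: those appearing fewer than `max_freq` times
    # (illustrative threshold; the paper samples low-frequency entities).
    unseen = [e for e, c in freq.items() if c < max_freq]
    random.Random(seed).shuffle(unseen)

    # Divide the unseen entities into meta-train / meta-valid / meta-test.
    n = len(unseen)
    n_tr, n_va = int(ratios[0] * n), int(ratios[1] * n)
    groups = {
        "meta_train": set(unseen[:n_tr]),
        "meta_valid": set(unseen[n_tr:n_tr + n_va]),
        "meta_test":  set(unseen[n_tr + n_va:]),
    }

    # Assign each triple touching an unseen entity to that entity's meta-set
    # (first match wins if a triple touches two groups); triples among only
    # seen entities form the background (in-graph) set.
    split = {name: [] for name in groups}
    split["background"] = []
    for h, r, t in triples:
        for name, ents in groups.items():
            if h in ents or t in ents:
                split[name].append((h, r, t))
                break
        else:
            split["background"].append((h, r, t))
    return split
```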

AndDoIt commented 1 year ago

Thanks for your great work and your reply!!!