Sujit-O / pykg2vec

Python library for knowledge graph embedding and representation learning.
MIT License
602 stars, 109 forks

Load pretrained model without training, val, test data #206

Closed ck471 closed 3 years ago

ck471 commented 3 years ago

I trained a model (TransM) on a custom dataset and am now trying to export it. AFAIU, I only need the -ld parameter, pointing at the model config and vec.pt. When I try to run inference, I cannot build or load the model: it complains that no training data is available, because it is trying to build the KnowledgeGraph. Naturally, I do not want to export the model together with all the training data. The behaviour is the same for the inference.py example: if you load the pretrained model, it still downloads the data.

python inference.py -mn TransE -ld examples/pretrained/TransE
pykg2vec.data.datasets - INFO - Downloading the dataset FB15k

How can I accomplish loading the model independent of the train, test, val data?

To Reproduce

1. Try to load according to the documentation

    from pykg2vec.common import Importer, KGEArgParser

    args = ["-mn", "TransM", "-ds", "custom_dataset",
            "-ld", "dataset/custom_dataset/intermediate/transm",
            "-dsp", "./dataset/custom_dataset"]
    args = KGEArgParser().get_args(args)

    # Look up the config and model classes for the chosen model name
    config_def, model_def = Importer().import_model_config(args.model_name.lower())
    config = config_def(args)


This needs the training data:

    NotImplementedError: /home/kgc/dataset/custom_dataset-train.txt training file not found!

2. Try to specify only the pretrained path

    args = ["-mn", "TransM", "-ld", "dataset/custom_dataset/intermediate/transm"]
    args = KGEArgParser().get_args(args)

    config_def, model_def = Importer().import_model_config(args.model_name.lower())
    config = config_def(args)



This falls back to the default behaviour of assuming that I am working with the Freebase15k dataset, so the model metadata (ent, rel) is wrong.

Expected behavior

Based on the files in the intermediate folder, I want to instantiate a model and infer heads and tails.

Thanks for the great repo!

baxtree commented 3 years ago

Hi @ck471, my understanding is that you are trying to run inference with a model pre-trained on your own dataset. Custom datasets need to follow this naming convention: if you named your dataset "mydataset", your ./dataset/custom_dataset folder needs to contain mydataset-train.txt, mydataset-test.txt and mydataset-valid.txt. If you don't use your own hyperparameter YAML, you need to pass in the dataset name for both training and inference. So try the following and see whether it works:

pykg2vec-train -mn TransM -ds mydataset -dsp ./dataset/custom_dataset
pykg2vec-infer -mn TransM -ds mydataset -ld dataset/custom_dataset/intermediate/transm -dsp ./dataset/custom_dataset
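
The naming convention described above can be checked before running either command. This is only a minimal sketch, assuming the three split files sit directly inside the dataset folder; `missing_split_files` is a hypothetical helper and not part of pykg2vec.

```python
from pathlib import Path

# Split suffixes from the naming convention: <name>-train.txt, <name>-test.txt, <name>-valid.txt
SPLITS = ("train", "test", "valid")


def missing_split_files(dataset_dir, dataset_name):
    """Return the names of expected split files that are absent from dataset_dir.

    Hypothetical helper for illustration only (not part of pykg2vec).
    """
    folder = Path(dataset_dir)
    expected = [folder / f"{dataset_name}-{split}.txt" for split in SPLITS]
    return [path.name for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_split_files("./dataset/custom_dataset", "mydataset")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All split files found.")
```

An empty list means the folder matches the convention for the dataset name passed via -ds.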

ck471 commented 3 years ago

Hello @baxtree. Thank you for the quick response.

The inference works if I provide train/test/valid. I was wondering whether I can infer entities and relations without the train, test and valid files once my model is trained. Looking at the code, the trainer class relies on the KnowledgeGraph, which needs train/test/valid. The same goes if I try to create a model object, for example a TransM(PairwiseModel) object.

Your help is much appreciated.

baxtree commented 3 years ago

Glad it works! The inference still needs the index-to-entity and index-to-relation mappings so that it can return the original labels of entities and relations rather than their index numbers. But you are right: inference should not require users to pass in the path to the training dataset, because those mappings are already stored in the cache after training. See idx2entity.pkl and idx2relation.pkl under ./dataset/custom_dataset.
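
To illustrate why the cached mappings are enough to turn index numbers back into labels, here is a minimal sketch. It assumes that idx2entity.pkl and idx2relation.pkl are ordinary pickle files holding an index-to-label mapping (a dict or a list), which this thread does not confirm; `load_mapping` and `to_labels` are hypothetical helpers, not pykg2vec API.

```python
import pickle
from pathlib import Path


def load_mapping(path):
    """Load a pickled index-to-label mapping (assumed to be a dict or list)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def to_labels(indices, mapping):
    """Translate predicted index numbers into their original labels."""
    return [mapping[i] for i in indices]


if __name__ == "__main__":
    # Cache location taken from the comment above; adjust to your setup.
    entity_file = Path("./dataset/custom_dataset") / "idx2entity.pkl"
    if entity_file.is_file():
        idx2entity = load_mapping(entity_file)
        print(len(idx2entity), "entities loaded")
    else:
        print("No cached mapping found at", entity_file)
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.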

I can smell a new improvement of removing the requirement on datasets when performing certain tasks. Any thoughts @louisccc ?

baxtree commented 3 years ago

Hi @ck471. Just to let you know, PR https://github.com/Sujit-O/pykg2vec/pull/209 has made the other params redundant for inference, so you will be able to run pykg2vec-infer -ld dataset/custom_dataset/intermediate/transm.