Sujit-O / pykg2vec

Python library for knowledge graph embedding and representation learning.
MIT License

Train only once and test it multiple times #208

Closed mscsedu closed 3 years ago

mscsedu commented 3 years ago

Is it possible to train the model only once and then test it on test data multiple times to get the hit and MR scores? I want to get test results from the model on different testing datasets. When I load the model using -ld dataset/custom_dataset/intermediate/transe, it starts training from the beginning. I just want to load the model, test it on the test data, and get the hit score.

Thanks

baxtree commented 3 years ago

Hi @mscsedu. I just found that it is possible to "train the model only once and test the model" with the same dataset:

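  from pykg2vec.common import Importer, KGEArgParser
  from pykg2vec.utils.trainer import Trainer

  # Point -ld at the directory that holds the saved model.
  args = KGEArgParser().get_args(["-ld", "dataset/custom_dataset/intermediate/transe"])
  # Look up the config and model classes for the selected model name.
  config_def, model_def = Importer().import_model_config(args.model_name.lower())
  config = config_def(args)
  model = model_def(**config.__dict__)
  trainer = Trainer(model, config)
  # build_model() loads the saved weights from the -ld directory.
  trainer.build_model()
  # Switch to evaluation mode and run the full test without training.
  trainer.model.eval()
  trainer.evaluator.full_test(1)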

I am not sure you can get "test results from the model on different testing datasets". During testing, the embeddings are looked up by entity and relation IDs. Two datasets may have completely different sets of entities and relations, or they may share entities and relations but assign them different IDs, so a model trained on one dataset may not have embeddings for the other. Nonetheless, it works for cases such as fb15k (for training) and fb15k-237 (for testing).
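To make the ID issue concrete, here is a minimal sketch in plain Python. It does not use pykg2vec: the names `split_by_coverage`, `entity2idx` and `relation2idx` are hypothetical, and the tiny vocabulary is made up for illustration. It maps test triples through the training vocabulary and sets aside any triple that mentions an entity or relation the trained model has never seen.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def split_by_coverage(
    test_triples: List[Triple],
    entity2idx: Dict[str, int],
    relation2idx: Dict[str, int],
) -> Tuple[List[IdTriple], List[Triple]]:
    """Translate test triples to training IDs; collect the ones that cannot be translated."""
    scorable: List[IdTriple] = []
    unseen: List[Triple] = []
    for head, relation, tail in test_triples:
        if head in entity2idx and tail in entity2idx and relation in relation2idx:
            # Every symbol has an embedding row in the trained model.
            scorable.append((entity2idx[head], relation2idx[relation], entity2idx[tail]))
        else:
            # At least one symbol has no embedding, so the model cannot score this triple.
            unseen.append((head, relation, tail))
    return scorable, unseen


# Hypothetical training vocabulary (illustrative data only).
entity2idx = {"paris": 0, "france": 1, "berlin": 2}
relation2idx = {"capital_of": 0}

# Hypothetical triples from a different test set.
test_triples = [
    ("paris", "capital_of", "france"),    # fully covered
    ("berlin", "capital_of", "germany"),  # "germany" was never seen in training
    ("paris", "located_in", "france"),    # "located_in" was never seen in training
]

scorable, unseen = split_by_coverage(test_triples, entity2idx, relation2idx)
print("scorable:", scorable)
print("unseen:", unseen)
```

Only the triples in `scorable` could be ranked meaningfully, and only if the test set is translated with the training vocabulary rather than with IDs assigned independently.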

mscsedu commented 3 years ago

Hi @baxtree, I really appreciate your response. I have trained a Rescal model on a custom dataset, and the model is saved in the intermediate directory. When I run your script, it gives an error about a size mismatch for ent_embeddings.weight. Do you have any idea how to deal with it?

  /content/drive/MyDrive/CIPL/pykg2vec-master/examples
  Traceback (most recent call last):
    File "test.py", line 9, in <module>
      trainer.build_model()
    File "/usr/local/lib/python3.6/dist-packages/pykg2vec-0.0.52-py3.6.egg/pykg2vec/utils/trainer.py", line 106, in build_model
      self.load_model(self.config.load_from_data)
    File "/usr/local/lib/python3.6/dist-packages/pykg2vec-0.0.52-py3.6.egg/pykg2vec/utils/trainer.py", line 427, in load_model
      self.model.load_state_dict(torch.load(str(model_path_file)))
    File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
      self.__class__.__name__, "\n\t".join(error_msgs)))
  RuntimeError: Error(s) in loading state_dict for TransE:
      Missing key(s) in state_dict: "rel_embeddings.weight".
      Unexpected key(s) in state_dict: "rel_matrices.weight".
      size mismatch for ent_embeddings.weight: copying a param with shape torch.Size([23659, 50]) from checkpoint, the shape in current model is torch.Size([14951, 50]).

baxtree commented 3 years ago

...RuntimeError: Error(s) in loading state_dict for TransE...

Somehow it was hooking up to TransE. Thus, passing in the model name should get rid of that error:

  args = KGEArgParser().get_args(["-mn", "Rescal", "-ld", "dataset/custom_dataset/intermediate/rescal"])
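For readers puzzling over the three complaints in that error, the following is a rough sketch of the kind of comparison that produces them. It uses plain dicts of shapes rather than real PyTorch state dicts, and `diff_state_shapes` is a hypothetical helper, not part of pykg2vec or PyTorch. The ent_embeddings shapes come from the traceback; the other two shapes are not given in the thread and are illustrative placeholders.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_state_shapes(
    checkpoint: Dict[str, Shape], model: Dict[str, Shape]
) -> Tuple[List[str], List[str], Dict[str, Tuple[Shape, Shape]]]:
    """Report keys the model expects but the checkpoint lacks, extra keys, and shape clashes."""
    missing = sorted(set(model) - set(checkpoint))
    unexpected = sorted(set(checkpoint) - set(model))
    mismatched = {
        key: (checkpoint[key], model[key])
        for key in sorted(set(checkpoint) & set(model))
        if checkpoint[key] != model[key]
    }
    return missing, unexpected, mismatched


# What was saved: a Rescal model trained on the custom dataset.
checkpoint_shapes = {
    "ent_embeddings.weight": (23659, 50),  # from the traceback
    "rel_matrices.weight": (10, 2500),     # placeholder shape, not from the thread
}

# What was built: the default TransE model.
model_shapes = {
    "ent_embeddings.weight": (14951, 50),  # from the traceback
    "rel_embeddings.weight": (1345, 50),   # placeholder shape, not from the thread
}

missing, unexpected, mismatched = diff_state_shapes(checkpoint_shapes, model_shapes)
print("missing:", missing)
print("unexpected:", unexpected)
print("mismatched:", mismatched)
```

All three findings point the same way: the weights on disk belong to a different model class and a different dataset than the one that was constructed.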

The user interface for adopting a pre-trained model is still far from perfect, tbh. The model name could be inferred from the config dump or the folder name, and it should not default to TransE.
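As a rough illustration of the folder-name idea, a sketch like the following could work. This is not how pykg2vec does it: `infer_model_name` and `KNOWN_MODELS` are hypothetical, and the lookup table only lists the two models mentioned in this thread.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical lookup from folder name to the value passed to -mn (illustrative subset).
KNOWN_MODELS = {"transe": "TransE", "rescal": "Rescal"}


def infer_model_name(load_dir: str, default: Optional[str] = None) -> Optional[str]:
    """Guess the model name from the last component of the -ld directory."""
    folder = PurePosixPath(load_dir).name.lower()
    return KNOWN_MODELS.get(folder, default)


load_dir = "dataset/custom_dataset/intermediate/rescal"
model_name = infer_model_name(load_dir)

# Build the argument list explicitly instead of relying on a default model.
cli_args = ["-ld", load_dir] if model_name is None else ["-mn", model_name, "-ld", load_dir]
print(cli_args)
```

Returning `None` for an unrecognized folder, rather than silently picking a default, lets the caller stop with a clear message before any weights are loaded.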

mscsedu commented 3 years ago

Thanks, @baxtree. It is working now. I appreciate your efforts on this great repository. Do you have any plans to incorporate path queries into your pykg2vec library? Guu, Kelvin, John Miller, and Percy Liang. "Traversing knowledge graphs in vector space." arXiv preprint arXiv:1506.01094 (2015).

baxtree commented 3 years ago

Oh, interesting... Will have a look. Attached their implementation here for future reference.

Btw, https://github.com/Sujit-O/pykg2vec/pull/209 has improved the interface. E.g., you can run pykg2vec-test -ld dataset/custom_dataset/intermediate/rescal to do a full test without using the earlier code snippet.

mscsedu commented 3 years ago

Great, thank you.