Closed mscsedu closed 3 years ago
Hi @mscsedu. I just found that it is possible to "train the model only once and test the model" with the same dataset:
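from pykg2vec.common import Importer, KGEArgParser
from pykg2vec.utils.trainer import Trainer
# -ld points at the directory holding the previously saved model
args = KGEArgParser().get_args(["-ld", "dataset/custom_dataset/intermediate/transe"])
# Look up the config and model classes for the selected model name
config_def, model_def = Importer().import_model_config(args.model_name.lower())
config = config_def(args)
model = model_def(**config.__dict__)
trainer = Trainer(model, config)
# build_model() loads the saved weights when a load directory is given
trainer.build_model()
# Switch to evaluation mode and run the full test instead of training
trainer.model.eval()
trainer.evaluator.full_test(1)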
I am not sure you can "test results from the model on different testing datasets". During testing, embeddings are looked up by entity and relation ID. Two datasets may have completely different sets of entities and relations, or share entities and relations but assign them different IDs, so a model trained on one dataset may not have embeddings for the other. Nonetheless, it works for cases such as fb15k (for training) and fb15k-237 (for testing).
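To make the ID concern concrete, here is a minimal sketch in plain Python. It does not use pykg2vec; the function name `check_id_compatibility` and the toy mappings are made up for illustration. It reports which test entities the trained model has never seen and which ones exist in both datasets but under different IDs.

```python
def check_id_compatibility(train_ids, test_ids):
    """Compare two name -> ID mappings (hypothetical helper, not part of pykg2vec).

    Returns (missing, remapped):
      missing  - names in the test mapping that the training mapping lacks
      remapped - names present in both mappings but with different IDs
    """
    missing = sorted(name for name in test_ids if name not in train_ids)
    remapped = sorted(
        name
        for name in test_ids
        if name in train_ids and train_ids[name] != test_ids[name]
    )
    return missing, remapped


# Toy mappings, invented for this example.
train_entity_ids = {"paris": 0, "france": 1, "berlin": 2}
test_entity_ids = {"paris": 0, "berlin": 1, "rome": 2}

missing, remapped = check_id_compatibility(train_entity_ids, test_entity_ids)
print("no embedding available for:", missing)
print("same entity, different ID:", remapped)
```

If both lists come back empty, the test set can be scored with the trained embeddings; otherwise the lookups would either fail or silently fetch the wrong vectors.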
Hi @baxtree, I really appreciate your response. I have trained a Rescal model on a custom dataset, and the model is saved in the intermediate directory. When I run your script, it gives an error about a size mismatch for ent_embeddings.weight. Do you have any idea how to deal with it?
/content/drive/MyDrive/CIPL/pykg2vec-master/examples
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    trainer.build_model()
  File "/usr/local/lib/python3.6/dist-packages/pykg2vec-0.0.52-py3.6.egg/pykg2vec/utils/trainer.py", line 106, in build_model
    self.load_model(self.config.load_from_data)
  File "/usr/local/lib/python3.6/dist-packages/pykg2vec-0.0.52-py3.6.egg/pykg2vec/utils/trainer.py", line 427, in load_model
    self.model.load_state_dict(torch.load(str(model_path_file)))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TransE:
    Missing key(s) in state_dict: "rel_embeddings.weight".
    Unexpected key(s) in state_dict: "rel_matrices.weight".
    size mismatch for ent_embeddings.weight: copying a param with shape torch.Size([23659, 50]) from checkpoint, the shape in current model is torch.Size([14951, 50]).
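The traceback above reports three kinds of disagreement between the model being built and the saved checkpoint: missing keys, unexpected keys, and shape mismatches. The sketch below reproduces that comparison in plain Python on name -> shape dictionaries. The helper `diff_shapes` is hypothetical, and shapes that the traceback does not show are left as `None` rather than guessed. With real PyTorch objects, such a dictionary can be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`.

```python
def diff_shapes(model_shapes, checkpoint_shapes):
    """Compare parameter name -> shape mappings (hypothetical helper).

    Returns (missing, unexpected, mismatched) in the same sense as the
    error message: keys the model expects but the checkpoint lacks, keys
    only the checkpoint has, and shared keys whose shapes differ.
    """
    missing = sorted(set(model_shapes) - set(checkpoint_shapes))
    unexpected = sorted(set(checkpoint_shapes) - set(model_shapes))
    mismatched = {
        key: (checkpoint_shapes[key], model_shapes[key])
        for key in sorted(set(model_shapes) & set(checkpoint_shapes))
        if model_shapes[key] != checkpoint_shapes[key]
    }
    return missing, unexpected, mismatched


# Values taken from the traceback; None marks shapes it does not show.
model_shapes = {
    "ent_embeddings.weight": (14951, 50),
    "rel_embeddings.weight": None,
}
checkpoint_shapes = {
    "ent_embeddings.weight": (23659, 50),
    "rel_matrices.weight": None,
}

missing, unexpected, mismatched = diff_shapes(model_shapes, checkpoint_shapes)
print("missing:", missing)
print("unexpected:", unexpected)
print("mismatched (checkpoint, model):", mismatched)
```

Different parameter names on the two sides, as here, usually mean the checkpoint was saved by a different model class than the one being constructed.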
... RuntimeError: Error(s) in loading state_dict for TransE ...
Somehow it was picking up TransE as the model. Passing in the model name should get rid of that error:
args = KGEArgParser().get_args(["-mn", "Rescal", "-ld", "dataset/custom_dataset/intermediate/rescal"])
The user interface for adopting a pre-trained model is still far from perfect, to be honest. The model name could be inferred from the config dump or the folder name and should not default to TransE.
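As a rough illustration of the folder-name idea, the sketch below derives the model name from the last component of the load path. This is not how pykg2vec behaves. The helper `infer_model_name` and the `KNOWN_MODELS` table are hypothetical, and the table lists only the two models mentioned in this thread.

```python
from pathlib import PurePosixPath

# Hypothetical lookup table: folder name -> value to pass to -mn.
KNOWN_MODELS = {"transe": "TransE", "rescal": "Rescal"}


def infer_model_name(load_dir, default=None):
    """Guess the model name from the last folder of the load path.

    Returns `default` when the folder name is not recognised, so the
    caller can decide whether to fail or ask the user.
    """
    folder = PurePosixPath(load_dir).name.lower()
    return KNOWN_MODELS.get(folder, default)


load_dir = "dataset/custom_dataset/intermediate/rescal"
name = infer_model_name(load_dir)

# Build the argument list that would be handed to the argument parser.
cli_args = ["-ld", load_dir] if name is None else ["-mn", name, "-ld", load_dir]
print(cli_args)
```

Returning `None` for an unknown folder rather than falling back to a fixed model keeps a wrong guess from surfacing later as a confusing state_dict error.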
Thanks @baxtree, it is working now. I appreciate your efforts on this great repository. Do you have any plans to incorporate path queries into pykg2vec? See Guu, Kelvin, John Miller, and Percy Liang. "Traversing knowledge graphs in vector space." arXiv preprint arXiv:1506.01094 (2015).
Oh, interesting... Will have a look. Attaching their implementation here for future reference.
Btw, https://github.com/Sujit-O/pykg2vec/pull/209 has improved the interface. E.g., you can run
pykg2vec-test -ld dataset/custom_dataset/intermediate/rescal
to do a full test without using the earlier code snippet.
Great, thank you.
Is it possible to train the model only once and then test it on test data multiple times to get the hit and MR scores? I want the test results from the model on different testing datasets. When I load the model using -ld dataset/custom_dataset/intermediate/transe, it starts training from the beginning. I just want to load the model, test it on the test data, and get the hit score.
Thanks
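For readers unfamiliar with the hit and MR scores asked about above, here is a minimal sketch of the usual definitions of mean rank and hits@k, computed from a list of ranks. The ranks are invented for the example, and pykg2vec's evaluator may report additional variants (for instance filtered ranks) that this sketch does not cover.

```python
def mean_rank(ranks):
    """Average position of the correct entity (1 = best); lower is better."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose correct entity ranks in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Made-up ranks of the correct entity for five test triples.
ranks = [1, 3, 12, 2, 50]

mr = mean_rank(ranks)
hits10 = hits_at_k(ranks, 10)
hits1 = hits_at_k(ranks, 1)
print(f"MR={mr:.1f} hits@10={hits10:.2f} hits@1={hits1:.2f}")
```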