Sujit-O / pykg2vec

Python library for knowledge graph embedding and representation learning.
MIT License

Issue with Bayesian optimizer for custom dataset. #175

Closed Rodrigo-A-Pereira closed 4 years ago

Rodrigo-A-Pereira commented 4 years ago

When using the Bayesian optimizer (BaysOptimizer) with a custom dataset, I get a "ValueError: Unknown dataset: [ds name]" error. According to the traceback, the error is raised from the "kgcontroller.py" module (line 160). My only hypothesis was that the custom dataset_path variable was None, which I ended up confirming.

This is the snippet I use to call the optimizer:

args = KGETuneArgParser().get_args(['-mn', model_name, '-ds', dataset_name, "-dsp", dataset_path])
bays_opt = BaysOptimizer(args=args)
bays_opt.optimize()
best = bays_opt.return_best()

I have already verified that both dataset_name and dataset_path are valid, since I use them to create the knowledge graph directly and that works just fine:

args = KGEArgParser().get_args(['-mn', model_name, '-ds', dataset_name, "-dsp", dataset_path])
knowledge_graph = KnowledgeGraph(dataset=args.dataset_name, custom_dataset_path=args.dataset_path)

After digging into the code a bit, it seems to me that in the "bayesian_optimizer.py" module, the kge_args object passed to config_obj on line 51 is given only the dataset name, not the dataset path. As a consequence, the path is never passed on to the KnowledgeGraph object, which raises the error.

Note: Setting the dataset path on kge_args in the BaysOptimizer __init__ function seems to solve the issue for me.
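
The failure mode described above can be reproduced in isolation. The following is a minimal sketch that does not import pykg2vec: KnowledgeGraphStub, KNOWN_DATASETS, build_kge_args_buggy, build_kge_args_fixed, and the "/data/got" path are hypothetical stand-ins invented for illustration, and the real library code may differ in its details. It only shows how dropping dataset_path while copying arguments would lead to the same ValueError, and how forwarding the path avoids it.

```python
from argparse import Namespace

# Hypothetical set of built-in dataset names (assumption, for illustration only).
KNOWN_DATASETS = {"freebase15k", "wordnet18"}


class KnowledgeGraphStub:
    """Stand-in loader: built-in names resolve on their own,
    any other name needs an explicit custom path."""

    def __init__(self, dataset, custom_dataset_path=None):
        if dataset.lower() not in KNOWN_DATASETS and custom_dataset_path is None:
            raise ValueError("Unknown dataset: %s" % dataset)
        self.dataset = dataset
        self.path = custom_dataset_path


def build_kge_args_buggy(tune_args):
    # Copies only the dataset name; the custom path is left as None.
    return Namespace(dataset_name=tune_args.dataset_name, dataset_path=None)


def build_kge_args_fixed(tune_args):
    # Forwards the custom path together with the name.
    return Namespace(dataset_name=tune_args.dataset_name,
                     dataset_path=tune_args.dataset_path)


def load(kge_args):
    # Mirrors the call shape seen in the traceback.
    return KnowledgeGraphStub(dataset=kge_args.dataset_name,
                              custom_dataset_path=kge_args.dataset_path)


tune_args = Namespace(dataset_name="GOT", dataset_path="/data/got")

try:
    load(build_kge_args_buggy(tune_args))
    buggy_error = None
except ValueError as exc:
    buggy_error = str(exc)

fixed_graph = load(build_kge_args_fixed(tune_args))

print(buggy_error)       # error message from the buggy path
print(fixed_graph.path)  # custom path preserved by the fixed path
```

With the path forwarded, the stub accepts the custom dataset name; without it, the name is checked only against the built-in list and rejected.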

Full Traceback:

Traceback (most recent call last):
  File "train.py", line 104, in <module>
    main()
  File "train.py", line 72, in main
    bays_opt = BaysOptimizer(args=args)
  File "/usr/local/lib/python3.6/dist-packages/pykg2vec-0.0.51-py3.6.egg/pykg2vec/utils/bayesian_optimizer.py", line 51, in __init__
    self.config_local = self.config_obj(self.kge_args)
  File "/usr/local/lib/python3.6/dist-packages/pykg2vec-0.0.51-py3.6.egg/pykg2vec/config.py", line 65, in __init__
    self.knowledge_graph = KnowledgeGraph(dataset=args.dataset_name, custom_dataset_path=args.dataset_path)
  File "/usr/local/lib/python3.6/dist-packages/pykg2vec-0.0.51-py3.6.egg/pykg2vec/data/kgcontroller.py", line 160, in __init__
    raise ValueError("Unknown dataset: %s" % dataset)
ValueError: Unknown dataset: GOT

louisccc commented 4 years ago

Hey Rodrigo, this is a bug. Thanks for finding it; we will submit a PR to resolve it ASAP.

ArkDu commented 4 years ago

Hi Rodrigo-A-Pereira, we tested and confirmed the issue. We have already submitted a PR that aims to fix it.

One thing I noticed from your report is that you used KGETuneArgParser, which has been removed from the current version of our code. We merged all of its functionality into KGEArgParser, so instead of:

args = KGETuneArgParser().get_args(['-mn', model_name, '-ds', dataset_name, "-dsp", dataset_path])

you can try:

args = KGEArgParser().get_args(['-mn', model_name, '-ds', dataset_name, "-dsp", dataset_path])

which should work just fine. We are currently refactoring and simplifying our function calls to make them more user-friendly, and unfortunately we made some mistakes in the process. Please refer to the newest version of the code for more information.

Please let me know if you have more questions.

Rodrigo-A-Pereira commented 4 years ago

Thank you for the reply!