Sujit-O / pykg2vec

Python library for knowledge graph embedding and representation learning.
MIT License
601 stars 109 forks source link

Better way to manage parameter_list, maybe without NamedEmbedding #211

Open dany-nonstop opened 3 years ago

dany-nonstop commented 3 years ago

First I want to thank you for making this pykg2vec project! Its comprehensive inclusion of many SOTA algorithms along the development history of knowledge graph embedding is really impressive!

I am trying to revise an existing model by adding a new linear layer, and of course I just added the new layer into the parameter_list. All training went through pretty well, but an exception occurred in export_embedding(), where every parameter in the list need to be have a name, or is assumed to be a NamedEmbedding.

https://github.com/Sujit-O/pykg2vec/blob/492807b627574f95b0db9e7cb9f090c3c45a030a/pykg2vec/utils/trainer.py#L462-L471

I read the code and realized that pykg2vec is using NamedEmbedding for every parameter that is even not embedding at all. For example these parameters in SME model have to be NamedEmbeddings, though they are not embeddings at all.

https://github.com/Sujit-O/pykg2vec/blob/492807b627574f95b0db9e7cb9f090c3c45a030a/pykg2vec/models/pairwise.py#L573-L578

And because of this unnecessary detour to NamedEmbedding, they are later used in a little obscured way like this

https://github.com/Sujit-O/pykg2vec/blob/492807b627574f95b0db9e7cb9f090c3c45a030a/pykg2vec/models/pairwise.py#L627-L629

I am sure you must have a reason to devise such a mechanism in pykg2vec, would you please elaborate, or just use pytorch's way to handle parameters with register_parameter() ? I assume it may simplify the code base, and make it easier to understand / extend?

@baxtree @louisccc

baxtree commented 3 years ago

Hi, @dany-nonstop, Thanks for the suggestion and I think you are right in say parameter_list is not the torch way and can be superseded by register_parameter(). The torch version of pykg2ves was "translated" straightly based off its tensorflow version and at then, there was probably no concept of named embedding in torch hence the NamedEmbedding. Do you mind submitting a PR, assuming you may have done some exploratory change and the tests are still passing?

dany-nonstop commented 3 years ago

Hi, @baxtree thanks for shedding light on the origin of the code. while going through the code I noticed that parameter_list is only saved in export_embedding but was never reused elsewhere. Even in during inference in pykg2vec-infer the Trainer is directly loading the saved state_dict instead of using the saved embeddings in tsv files.

I'm not sure how useful those tsv files are, and am wondering if the code related to it should be completely removed instead. I'd love to contribute to pykg2vec, but need to make sure I know what I'm doing before making any changes. 😆

baxtree commented 3 years ago

Hi, @louisccc . Can you pls share some history about the saved embeddings as mentioned above and guide @dany-nonstop through the contributing process?