Hello @chenxshuo, thank you for your interest in our work!!
In a nutshell, the way we implement post-training is as follows. For each model, we have a Kelpie implementation that extends both a generic `KelpieModel` and the original model; for example, the `KelpieConvE` implementation extends both `KelpieModel` and `ConvE`.
At creation, the Kelpie model copies the embedding lists of the original model via PyTorch's `clone().detach()`.
Then it generates the mimic embedding, which is created as a new `Parameter` with `requires_grad` set to `True` and stored in the `self.kelpie_entity_embedding` variable. The mimic embedding is then added to the list of (otherwise frozen) entity embeddings.
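To give a rough idea, here is a minimal sketch of that construction step (sizes and names are illustrative, not the actual Kelpie code):

```python
import torch

# Minimal sketch of the construction step (sizes and names are illustrative,
# not the actual Kelpie code).
num_entities, dim = 1000, 200

# Pretend these are the original model's entity embeddings.
original_entity_embeddings = torch.nn.Parameter(torch.rand(num_entities, dim))

# Frozen copy: clone().detach() returns a tensor outside the autograd graph,
# so no gradient ever flows into these rows.
frozen_entity_embeddings = original_entity_embeddings.clone().detach()

# Mimic embedding: a brand-new trainable Parameter, kept in its own variable
# (self.kelpie_entity_embedding in the explanation above).
kelpie_entity_embedding = torch.nn.Parameter(torch.rand(1, dim), requires_grad=True)

# Append the mimic row to the (otherwise frozen) entity embeddings.
entity_embeddings = torch.cat([frozen_entity_embeddings, kelpie_entity_embedding], dim=0)
```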
The mimic entity is assigned an id equal to the number of original entities in the dataset + 1, so we also create a `KelpieDataset` object that clones the facts of the original entity, replaces the id of the original entity with the id of the mimic, and temporarily adds these cloned facts to the dataset.
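The id replacement boils down to something like the following sketch (the helper and its name are hypothetical, not the actual `KelpieDataset` code):

```python
# Hypothetical helper (not the actual KelpieDataset code): clone the facts that
# mention the original entity and rewrite them to use the mimic id.
def clone_facts_with_mimic(facts, original_entity_id, mimic_entity_id):
    """facts: list of (head, relation, tail) integer triples."""
    mimic_facts = []
    for head, relation, tail in facts:
        if head == original_entity_id or tail == original_entity_id:
            new_head = mimic_entity_id if head == original_entity_id else head
            new_tail = mimic_entity_id if tail == original_entity_id else tail
            mimic_facts.append((new_head, relation, new_tail))
    return mimic_facts

# Example: entity 7 is the entity to explain, 1000 is the mimic id.
facts = [(7, 0, 12), (3, 1, 7), (5, 2, 9)]
print(clone_facts_with_mimic(facts, original_entity_id=7, mimic_entity_id=1000))
# [(1000, 0, 12), (3, 1, 1000)]
```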
Then the KelpieModel copies all the shared parameters in the original model and sets their `requires_grad` flag to `False`.
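Conceptually, the freezing step looks something like this (placeholder names, not the actual Kelpie classes):

```python
import torch

# Sketch with placeholder names (not the actual Kelpie classes): copy the shared
# weights from the original model and disable their gradients, so that
# post-training can only modify the mimic embedding.
def freeze_shared_parameters(kelpie_model: torch.nn.Module,
                             original_model: torch.nn.Module,
                             trainable_name: str = "kelpie_entity_embedding") -> None:
    original_params = dict(original_model.named_parameters())
    for name, parameter in kelpie_model.named_parameters():
        if name == trainable_name:
            continue                                          # the mimic row stays trainable
        if name in original_params:
            parameter.data.copy_(original_params[name].data)  # copy the shared weight
        parameter.requires_grad = False                       # and freeze it
```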
Due to how PyTorch works, this is not enough to run post-training smoothly. If we just trained the model like this, the `self.kelpie_entity_embedding` embedding would be updated after every step, but the version appended to the entity embeddings list would not. This would be a huge problem: the version appended to the entity embeddings list is the one used to compute the scores and thus the gradient! If it remained constant, the computed gradient would remain constant as well, and the updates sent to `self.kelpie_entity_embedding` would not make sense at all.
We solve this problem in a simple way: after each step, we update the version in the list by overwriting it with the content of `self.kelpie_entity_embedding`. This happens in the code line that you highlighted.
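In essence, the overwrite amounts to something like this sketch (illustrative signature, assuming the mimic Parameter has shape (1, dim)):

```python
import torch

# Illustrative signature (not the actual method): copy the freshly optimized
# mimic values into the row of the concatenated tensor that the scoring
# function actually reads. Assumes the mimic Parameter has shape (1, dim).
def update_mimic_row(entity_embeddings: torch.Tensor,
                     kelpie_entity_embedding: torch.nn.Parameter,
                     kelpie_entity_id: int) -> None:
    entity_embeddings.data[kelpie_entity_id] = kelpie_entity_embedding.data.view(-1)
```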
As you observed, the KelpieOptimizers do not really contain any logic for freezing or post-training. In general, the only difference from their original counterparts is that they call the `KelpieModel` `update_embeddings()` method to propagate the step update from `self.kelpie_entity_embedding` to the corresponding embedding in the embeddings list.
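So a post-training step looks roughly like the sketch below (the model and loss interfaces are hypothetical; only `update_embeddings()` is the method discussed above):

```python
# Sketch of a single post-training step (the model and loss interfaces here are
# hypothetical; update_embeddings() is the KelpieModel method mentioned above).
def post_training_step(kelpie_model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    scores = kelpie_model.forward(batch)  # scores computed from the concatenated embeddings
    loss = loss_fn(scores, batch)         # hypothetical loss interface
    loss.backward()                       # the gradient only reaches the mimic Parameter
    optimizer.step()                      # updates kelpie_entity_embedding
    kelpie_model.update_embeddings()      # propagate the update to the embeddings list
    return loss.item()
```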
I hope this clarifies your doubts!
Wow, awesome 😄, thank you @AndRossi for your immediate and informative response!
That helps a lot and now I understand the mechanism better. One question though: you mentioned that

> Then the KelpieModel copies all the shared parameters in the original model and sets their `requires_grad` flag to `False`.
I do find this setting in convE.py for some auxiliary layers, but the embeddings, including in other KGE models, seem to still be updated and require gradients? Or do you set them elsewhere, or is `requires_grad` `False` by default?
Ahah, you're welcome @chenxshuo!
The original embeddings are copied from the original model to the Kelpie model using `.clone().detach()`: by doing this, PyTorch considers them as constants (the detached copy has `requires_grad` set to `False`), so no gradient is computed for them.
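You can check this behaviour directly in PyTorch:

```python
import torch

# Quick check of this behaviour: clone().detach() produces a copy with
# requires_grad=False, so PyTorch treats it as a constant.
original = torch.nn.Parameter(torch.rand(5, 3))
frozen = original.clone().detach()

print(original.requires_grad)  # True
print(frozen.requires_grad)    # False
```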
I see! Now I understand. Thanks again! 😃
Hello, Andrea! Thank you for your excellent work! I am currently studying your paper and code, and I am wondering how or where you freeze all the other parameters during post-training. I checked the code, especially the optimizer classes, and it seems like all embeddings are updated during post-training?
Looking forward to your reply!