Accenture / AmpliGraph

Python library for Representation Learning on Knowledge Graphs https://docs.ampligraph.org
Apache License 2.0
2.12k stars 252 forks

Support Predicting Unseen Entities #82

Open chanlevan opened 5 years ago

chanlevan commented 5 years ago

Background and Context: The GNN in "Knowledge Transfer for Out-of-Knowledge-Base Entities: A Graph Neural Network Approach" supports predicting unseen entities. We should have this model.

mmercierTh commented 5 years ago

Here at Thales, we are currently trying to use this upcoming feature directly from the branch. We were previously using 1.0.2 and are facing some problems. We saw that the loss function multiclass_nll, which we were using in our saved ComplEx model, no longer seems to be supported. Is that correct? This may impact our current work, so it would be great to get clarification on this.

Also, we tried the unseen entity example, which seems to work if you use the model right after calling the fit function. In our workflow we prefer to save the model to disk and then restore it for later use. Unfortunately, the feature branch code fails with the following error: AttributeError: 'ComplEx' object has no attribute 'ent_emb'. Is there any plan to fix this?

The sample code used is attached. unseen_test.zip

Thanks

sumitpai commented 5 years ago

multiclass_nll would be supported in our releases.

The problem is that this feature branch (feature/#82) was branched out from an older version of AmpliGraph (which did not have the multiclass_nll loss) and doesn't contain the later changes that have been made on the master branch. As you rightly pointed out, we will try to integrate the changes from the master branch into this branch (to keep it up to date).

This feature is not complete, so you may notice some issues (like the one you mentioned about fit-predict), but once it is complete it will be merged into develop.

mmercierTh commented 5 years ago

Thanks for the clarification. What would be the ETA for completing this feature? From what I understand, it is expected to be in milestone 1.2 (due August 8th?).

chanlevan commented 5 years ago

Hi @mmercierTh

I have pushed some changes to feature/82. The restored model can predict unseen entities, and the multiclass_nll loss is also supported. Because there are some changes from the multiclass_nll version, your example code should now look like:



```python
import numpy as np

from ampligraph.latent_features import ComplEx
from ampligraph.utils.model_utils import save_model as ampligraph_save_model
from ampligraph.utils.model_utils import restore_model as ampligraph_restore_model

model = ComplEx(batches_count=2, seed=555, epochs=20, k=10, loss="multiclass_nll")

# Toy training graph: entities a-f connected by a single relation "y".
X = np.array([["a", "y", "b"],
              ["b", "y", "a"],
              ["a", "y", "c"],
              ["c", "y", "a"],
              ["a", "y", "d"],
              ["c", "y", "d"],
              ["b", "y", "c"],
              ["f", "y", "e"]])

model.fit(X)

# Save the trained model to disk, then restore it.
saved_name = "./model.pkl"

ampligraph_save_model(model, model_name_path=saved_name)
restored_model = ampligraph_restore_model(model_name_path=saved_name)

# "z" was not seen during training; its embedding is approximated
# from the entities it is connected to in neighbour_triples.
print(model.predict(np.array(["z", "y", "f"]), approximate_unseen={
        "pool": "avg",
        "neighbour_triples": [["z", "y", "c"], ["z", "y", "d"]]
}))

# Same prediction with the model restored from disk.
print(restored_model.predict(np.array(["z", "y", "f"]), approximate_unseen={
        "pool": "avg",
        "neighbour_triples": [["z", "y", "c"], ["z", "y", "d"]]
}))
```

chanlevan commented 5 years ago

What we have implemented so far is the baseline from the paper, in which the vector of an unseen entity is approximated from its neighbors using an average, max, or sum. We expect to have the Hamaguchi model implemented in version 1.2, with fully integrated tests and documentation.

mmercierTh commented 5 years ago

Great, thanks. We tested the update, and it only works for a single prediction, not for two or more consecutive predictions from a model that is loaded once. Is there a method we should call to reset the model before each prediction? The error is below, followed by sample code.

Traceback (most recent call last):
  File "/home/user/src/NER/graph_embeddings/unseen_test.py", line 29, in <module>
    "neighbour_triples": [["z", "y", "c"],["z", "y", "d"]]
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 2079, in predict
    approximate_unseen={**approximate_unseen, "k_size": 2 * self.k})
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 1088, in predict
    X, e, app_embs = self._assign_unseen_idx(approximate_unseen)(to_idx)(X, ent_to_idx=self.ent_to_idx, rel_to_idx=self.rel_to_idx)
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 949, in inner_dec
    k_size=approximate_unseen["k_size"])
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 1015, in _approximate_embeddings
    neighbour_vectors = self.get_embeddings(N_ent, embedding_type='entity')
  File "/home/user/.local/share/virtualenvs/graph_embeddings-RjobSHoH/lib/python3.6/site-packages/ampligraph/latent_features/models.py", line 395, in get_embeddings
    return emb_list[idxs]
IndexError: index 6 is out of bounds for axis 0 with size 6

sample code:


```python
# `model` and `restored_model` are the objects created in the example above.
for i in range(2):
    print(model.predict(np.array(["z", "y", "f"]), approximate_unseen={
            "pool": "avg",
            "neighbour_triples": [["z", "y", "c"], ["z", "y", "d"]]
    }))

for i in range(2):
    print(restored_model.predict(np.array(["z", "y", "f"]), approximate_unseen={
            "pool": "avg",
            "neighbour_triples": [["z", "y", "c"], ["z", "y", "d"]]
    }))
```

chanlevan commented 5 years ago

The bug has been fixed, and the updated code has been pushed to feature/82.

mmercierTh commented 5 years ago

Great, it is working fine now. Thanks for your responsiveness on this issue; it is greatly appreciated. A quick question about the implementation: how exactly do you calculate the embedding of the unseen entity? Thanks

chanlevan commented 5 years ago

We collect all the neighbour entities of the unseen entity, get their corresponding trained vectors, and take the average, max, or sum of those. You can find the details of the implementation at line 997 of ampligraph/latent_features/models.py, in the function _approximate_embeddings. Hope this helps.
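The pooling step described above can be sketched in a few lines. This is only an illustration, not the code in models.py: the function name `pool_neighbour_vectors` and the toy vectors are assumptions made for the example, and plain Python lists are used so that it runs without AmpliGraph or a trained model.

```python
def pool_neighbour_vectors(neighbour_vectors, pool="avg"):
    """Combine neighbour embeddings dimension by dimension into one vector.

    Hypothetical helper for illustration; not part of the AmpliGraph API.
    """
    if not neighbour_vectors:
        raise ValueError("at least one neighbour vector is required")
    # zip(*...) regroups the values so that each tuple holds one embedding dimension.
    columns = list(zip(*neighbour_vectors))
    if pool == "avg":
        return [sum(col) / len(col) for col in columns]
    if pool == "max":
        return [max(col) for col in columns]
    if pool == "sum":
        return [sum(col) for col in columns]
    raise ValueError(f"unknown pool: {pool!r}")


# Made-up 2-dimensional vectors standing in for the trained embeddings of "c" and "d".
toy_vectors = [[1.0, 4.0], [3.0, 2.0]]
approx_z = pool_neighbour_vectors(toy_vectors, pool="avg")
print(approx_z)
```

With real embeddings, the input would be the trained vectors of the entities that appear alongside the unseen entity in `neighbour_triples`.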

plgregoire commented 5 years ago

Hi,

That is what we figured out when we looked at the code. We wanted to be sure that we could use this code to get the approximate embedding of the unknown entity:

neighbour_vectors = model.get_embeddings(["c", "d"])
approximate_embedding_of_z = np.mean(neighbour_vectors, axis=0)

So the embedding of z is calculated from the average of its neighbors c and d.

I find this way of approximating z a little bit odd, since it is an average of all its connected entities (neighbors) instead of an average of all entities that have the same neighbors as z.

Is there something I'm not understanding? Could you enlighten me on that?

chanlevan commented 5 years ago

We were aiming to implement the model from the paper "Knowledge Transfer for Out-of-Knowledge-Base Entities: A Graph Neural Network Approach". At the moment that implementation is not done, so we only support the baseline. Page 6 of the paper describes the baseline for approximating the unseen entity, and we used this approach.

image

plgregoire commented 5 years ago

Thank you for the explanation and your responsiveness. We will continue to follow the development of this feature.

captify-dieter commented 4 years ago

This seems to break when filter_unseen_entities is set to True: no approximate_unseen is given, and the variable unseen is used before it is assigned in this line:

self._add_app_embs(unseen, app_embs)(self._load_model_from_trained_params)()
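The failure described above matches a common Python pattern: a local variable is assigned only inside a conditional branch and then read unconditionally. The sketch below shows that pattern and one possible fix. The function names and bodies are hypothetical stand-ins, not the code on the feature branch, and the actual cause there may differ.

```python
def predict_sketch(triples, approximate_unseen=None):
    """Hypothetical stand-in for a predict method with an optional approximation step."""
    if approximate_unseen is not None:
        unseen = ["z"]  # `unseen` is only bound on this branch
    # Bug pattern: `unseen` is read even when the branch above was skipped,
    # which raises UnboundLocalError.
    return len(triples), unseen


def predict_sketch_fixed(triples, approximate_unseen=None):
    """Same sketch with `unseen` bound on every code path."""
    unseen = []  # default: nothing to approximate
    if approximate_unseen is not None:
        unseen = ["z"]
    return len(triples), unseen


try:
    predict_sketch([["a", "y", "b"]])
except UnboundLocalError as err:
    print("fails without approximate_unseen:", err)

print(predict_sketch_fixed([["a", "y", "b"]]))
```

Giving the variable a default before the branch keeps the call path without approximate_unseen independent of the approximation logic.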

sumitpai commented 4 years ago

Thanks for trying out this feature. This issue is under development, so it may be unstable. As described here, only the baselines have been implemented; the Hamaguchi model is not yet implemented.

We are targeting this feature for the 1.2 release.

captify-dieter commented 4 years ago

I meant that even without using an approximation method, it seems to break because of conflicts with the code structure further down. Before using any method, I wanted to get a baseline with unseen entities removed, using the same feature branch (i.e. make sure all entities are seen, and do not pass anything as approximate_unseen). I will submit a PR if I restructure it and fix this.
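Until that is fixed, a baseline of this kind could be approximated outside the library by dropping triples that mention unknown entities before they reach predict. The following is a minimal sketch under that assumption; `drop_triples_with_unseen_entities` is a hypothetical helper, not an AmpliGraph function, and the triples are toy data.

```python
def drop_triples_with_unseen_entities(triples, known_entities):
    """Keep only triples whose subject and object are both known entities.

    Hypothetical helper for illustration; not part of the AmpliGraph API.
    """
    known = set(known_entities)
    return [t for t in triples if t[0] in known and t[2] in known]


# Toy training triples in (subject, predicate, object) form.
train = [["a", "y", "b"], ["b", "y", "c"], ["f", "y", "e"]]

# Every entity that appears as a subject or an object in the training data.
known_entities = {s for s, _, _ in train} | {o for _, _, o in train}

# "z" never appears in training, so the second triple is dropped.
test = [["a", "y", "c"], ["z", "y", "f"]]
seen_only = drop_triples_with_unseen_entities(test, known_entities)
print(seen_only)
```

The filtered list can then be scored without passing approximate_unseen, which gives a baseline restricted to seen entities.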

lukostaz commented 4 years ago

This will be addressed in long-term release 2.1.