dice-group / dice-embeddings

Hardware-agnostic Framework for Large-scale Knowledge Graph Embeddings
MIT License

Some entity's embedding missing #41

Closed umairq closed 2 years ago

umairq commented 2 years ago

Some entities return null when I call pre_trained_kge.get_entity_embeddings(['https://dbpedia.org/resource/Abraham_Lincoln']). "Abraham Lincoln" is one of the major entities in DBpedia, yet it returns null. It has both a DBpedia and a Wikipedia page. Can you please fix it? Thanks.

Demirrr commented 2 years ago

Hello @umairq,

I assume that the missing entity issue is related to the example provided in https://github.com/dice-group/dice-embeddings#using-pre-trained-conex-on-dbpedia-03-2022. It may well be that some entity embeddings are missing. In that document, we describe each step taken while preprocessing DBpedia 03-2022.

Cheers

umairq commented 2 years ago

Thanks @Demirrr. Yes, I read the documentation. But I think this entity (i.e., https://dbpedia.org/resource/Abraham_Lincoln) does not fall under the preprocessing removal criteria, as it is not a literal and does not match any other criterion mentioned in the document. Can you please point out exactly why it is excluded? Also, as I am new to the embeddings domain and to this project in particular, can you please provide the basic steps to retrain it on a newer version of DBpedia? (I have many examples of such missing entities.)

Demirrr commented 2 years ago

Sorry for the late response.

> But I think this entity (i.e., https://dbpedia.org/resource/Abraham_Lincoln) does not fall under the preprocessing removal criteria, as it is not a literal and does not match any other criterion mentioned in the document. Can you please point out exactly why it is excluded?

Unfortunately, I do not have time at the moment to determine at which preprocessing step the mentioned entity is removed.
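One cheap check that does not require re-running the preprocessing is to see whether the entity is present in the model's vocabulary under a slightly different spelling of the IRI. The following is only a sketch: `known_entities` is a hypothetical stand-in for however the entity names of the pre-trained model are exposed, and both helper functions are illustrative, not part of dice-embeddings. DBpedia dumps generally identify resources with http:// IRIs, so an https:// spelling may simply fail to match; whether that explains this particular case is an assumption and not confirmed in this thread.

```python
def iri_variants(iri):
    """Return plausible spellings of the same IRI (scheme and bracket variants)."""
    # Drop stray quotes or angle brackets that often come from copy-pasting.
    iri = iri.strip().strip("<>'\"")
    if iri.startswith("https://"):
        rest = iri[len("https://"):]
    elif iri.startswith("http://"):
        rest = iri[len("http://"):]
    else:
        rest = iri
    bases = ["http://" + rest, "https://" + rest]
    # Some pipelines keep the N-Triples angle brackets around IRIs.
    return bases + ["<" + b + ">" for b in bases]


def find_in_vocabulary(iri, known_entities):
    """Return every variant of `iri` that occurs in `known_entities`.

    `known_entities` is a hypothetical iterable of entity names taken
    from the pre-trained model's vocabulary.
    """
    known = set(known_entities)
    return [v for v in iri_variants(iri) if v in known]
```

If `find_in_vocabulary` returns a non-empty list, the entity was kept during preprocessing and the lookup only needs the matching spelling; an empty list suggests it was really dropped.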

> Can you please provide the basic steps to retrain it on a newer version of DBpedia? (I have many examples of such missing entities.)

We have a ContinuousExecute class that continues training from a pre-trained model. However, since you need embedding vectors for entities the model has not seen before, this class would not help. I am afraid you would need to construct a training dataset containing all triples of interest and train a new model with your desired hyperparameters.
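Constructing such a dataset could look roughly like the sketch below, which assumes the source data is in N-Triples form (one triple per line). It keeps only triples whose object is an IRI and that mention at least one entity of interest. The function names are hypothetical, the parser handles only simple lines, and the output format expected by the training code is not covered here.

```python
def parse_nt_line(line):
    """Parse a simple N-Triples line into (subject, predicate, object), or None."""
    line = line.strip()
    # Skip blank lines, comments, and lines without the terminating dot.
    if not line or line.startswith("#") or not line.endswith("."):
        return None
    # Split into at most three parts so literals containing spaces stay whole.
    parts = line[:-1].strip().split(None, 2)
    if len(parts) != 3:
        return None
    return tuple(parts)


def select_triples(lines, entities_of_interest):
    """Yield entity-to-entity triples that mention an entity of interest.

    `entities_of_interest` holds IRIs in N-Triples form, e.g. "<http://...>".
    """
    wanted = set(entities_of_interest)
    for line in lines:
        triple = parse_nt_line(line)
        if triple is None:
            continue
        s, p, o = triple
        # Keep only IRI objects; this drops literals and blank nodes.
        if not o.startswith("<"):
            continue
        if s in wanted or o in wanted:
            yield s, p, o
```

Because `select_triples` is a generator, it can be fed an open file object and will stream through a large dump without loading it into memory.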

At the moment, DBpedia 2022-03 is the newest version of DBpedia (see https://www.dbpedia.org/blog/dbpedia-snapshot-2022-03-release/).