Predicting the object

Accenture / AmpliGraph

Python library for Representation Learning on Knowledge Graphs https://docs.ampligraph.org

Apache License 2.0


Predicting the object #198

Closed: lavdim closed this issue 4 years ago

lavdim commented 4 years ago

Hi,

the current documentation shows examples of how to retrieve scores and probabilities for a given set of unseen triples. However, would it be possible, given a set of subjects and predicates, to predict the objects along with their associated scores and probabilities?

NicholasMcCarthy commented 4 years ago

Hi lavdim,

Yes!

The following code snippet should help (NB: I haven't tested it live, but it should work for the most part):


import numpy as np

# The given subjects and predicates - note this assumes they are labels, not converted to the internal indices
subjects = ['a', 'b', 'c']
predicates = ['p1', 'p2']

# Get all entities in the trained model - if the given subjects and predicates are already internal indices, change .keys() to .values()
objects = list(model.ent_to_idx.keys())

# Construct all combinations of the given subjects, predicates and objects (one [s, p, o] row each)
X = np.array(np.meshgrid(subjects, predicates, objects)).T.reshape(-1, 3)

# Score X: predict() returns raw model scores, which are not probabilities
scores = model.predict(X)

# Probabilities require a calibrated model: call model.calibrate(...) first, then
probas = model.predict_proba(X)

# Select triples above a given probability threshold
X_true = X[probas > 0.5]

Hope this helps :)
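For readers unfamiliar with the np.meshgrid(...).T.reshape(-1, 3) idiom: it enumerates every (subject, predicate, object) combination. The sketch below builds the same set of candidate triples (possibly in a different order) with the standard library's itertools.product and then filters them by a threshold. The toy_score function is a hypothetical stand-in for a calibrated model's probability output; it is not part of AmpliGraph.

```python
from itertools import product


def candidate_triples(subjects, predicates, objects):
    """Every (subject, predicate, object) combination, like the meshgrid idiom."""
    return list(product(subjects, predicates, objects))


def toy_score(triple):
    """Hypothetical stand-in for a calibrated model: returns a value in [0, 1]."""
    known = {("a", "p1", "x"), ("b", "p2", "y")}
    return 0.9 if triple in known else 0.1


def select_above(triples, score_fn, threshold=0.5):
    """Keep (triple, score) pairs whose score exceeds the threshold."""
    scored = [(t, score_fn(t)) for t in triples]
    return [(t, s) for t, s in scored if s > threshold]


subjects = ["a", "b", "c"]
predicates = ["p1", "p2"]
objects = ["x", "y", "z"]

X = candidate_triples(subjects, predicates, objects)
print(len(X))  # 3 subjects * 2 predicates * 3 objects = 18
print(select_above(X, toy_score))
```

Note that the number of candidates grows as len(subjects) * len(predicates) * len(objects), so with a large entity vocabulary it can be worth scoring in batches.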

lavdim commented 4 years ago

Hi Nicholas,

thanks a lot for your hints.

I assume that would work for a use case like the one below:

Input dataset (s p o): [[a type b], [a knows c], [c type b], [z knows y], [z type b], [x knows y]], followed by splitting, training, ...

I would like to predict the following (s p): [[x type], [y knows]]
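In this use case each subject comes with its own predicate, (x, type) and (y, knows), rather than every subject being combined with every predicate. A cartesian product over the subjects and predicates would also generate (x, knows, ?) and (y, type, ?). The plain-Python sketch below (an illustration, not AmpliGraph code; expand_pairs is a hypothetical helper) expands only the given (subject, predicate) pairs against every entity in the toy dataset above.

```python
# Toy dataset from the comment above, as (subject, predicate, object) triples
dataset = [
    ("a", "type", "b"), ("a", "knows", "c"), ("c", "type", "b"),
    ("z", "knows", "y"), ("z", "type", "b"), ("x", "knows", "y"),
]

# The partial queries: each subject is paired with one specific predicate
queries = [("x", "type"), ("y", "knows")]

# All entities seen as a subject or object, sorted for a stable order
entities = sorted({s for s, _, _ in dataset} | {o for _, _, o in dataset})


def expand_pairs(queries, entities):
    """Candidate triples for each (subject, predicate) pair, one per entity."""
    return [(s, p, o) for s, p in queries for o in entities]


candidates = expand_pairs(queries, entities)
print(entities)         # ['a', 'b', 'c', 'x', 'y', 'z']
print(len(candidates))  # 2 queries * 6 entities = 12
```

The resulting candidates could then be scored with a trained model in the same way as the X array in the snippet above.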

I will try it out and come back if I have any further questions.

NicholasMcCarthy commented 4 years ago

Lavdim,

I'm afraid I actually forgot part of the API (embarrassing because I wrote it).

The query_topn function in the discovery module is the simplest way to achieve what you asked; see its documentation for details.


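from ampligraph.discovery import query_topn

model.fit(X)  # model: an AmpliGraph model, X: the training triples
# Fix the head and relation ('subject' and 'predicate' are placeholders); returns the top triples and their scores
triples, scores = query_topn(model, top_n=5, head='subject', relation='predicate')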
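Roughly speaking, query_topn takes two elements of a triple, scores all possible completions of the missing one and returns the top_n results ordered by score. The stdlib-only sketch below mimics that ranking step for a fixed head and relation. The names top_n_tails and toy_score are hypothetical, this is not AmpliGraph's implementation, and the toy scores merely stand in for a trained model.

```python
import heapq


def toy_score(triple):
    """Hypothetical stand-in for a trained model's scoring function."""
    scores = {("x", "type", "b"): 2.5, ("x", "type", "c"): 0.7, ("x", "type", "y"): -1.0}
    return scores.get(triple, -3.0)


def top_n_tails(score_fn, head, relation, entities, top_n=5):
    """Score every (head, relation, tail) completion and return the top_n, best first."""
    candidates = [(head, relation, tail) for tail in entities]
    ranked = heapq.nlargest(top_n, candidates, key=score_fn)
    return [(t, score_fn(t)) for t in ranked]


entities = ["a", "b", "c", "x", "y", "z"]
for triple, score in top_n_tails(toy_score, "x", "type", entities, top_n=3):
    print(triple, score)
```

heapq.nlargest avoids sorting the full candidate list when top_n is much smaller than the number of entities; for a handful of entities, sorted(..., reverse=True)[:top_n] would behave the same.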

lavdim commented 4 years ago

Hi Nicholas,

I see, this seems more straightforward. I will try it this way.

Thanks for your support.