IBCNServices / pyRDF2Vec

🐍 Python Implementation and Extension of RDF2Vec
https://pyrdf2vec.readthedocs.io/en/latest/
MIT License

Generate embeddings for all entities in a graph #160

Open HeikoPaulheim opened 1 year ago

HeikoPaulheim commented 1 year ago

Maybe I'm too blind to see it, but is there a straightforward way to create embeddings for all entities in a graph? The entities argument in fit_transform is mandatory, and the _entities field in KG seems not to give me access to URIs of entities that have a label...

GillesVandewiele commented 1 year ago

No, this is not directly supported, but it would be a good addition IMO (although generating embeddings for every entity will typically take a lot of time).

What does the _entities attribute return? I think it should be a list of Vertex objects whose name you can retrieve to get the URI...

HeikoPaulheim commented 1 year ago

The name seems to contain the rdfs:label if there is one, and the URI only if there is no label. An additional uri field in Vertex would already do the trick.
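To make the suggestion concrete, here is a minimal sketch of what a vertex with a separate uri field could look like. The class LabeledVertex and its fields are hypothetical and only illustrate the idea; this is not pyRDF2Vec's actual Vertex class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabeledVertex:
    """Hypothetical stand-in for a vertex; NOT pyRDF2Vec's actual Vertex."""

    uri: str                     # always the entity's URI
    label: Optional[str] = None  # rdfs:label, if the entity has one

    @property
    def name(self) -> str:
        # Mirrors the behaviour described above: label if present, else URI.
        return self.label if self.label is not None else self.uri


vertices = [
    LabeledVertex("http://example.org/Alice", label="Alice"),
    LabeledVertex("http://example.org/Bob"),
]

# With a dedicated field, the URIs are available regardless of labels.
entity_uris = [v.uri for v in vertices]
```

With such a field, the list of URIs could be collected independently of whether an entity has a label, and then be passed as the entities argument.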

GillesVandewiele commented 1 year ago

Sorry for the late response! Agreed that this would be useful. Strange, however, that name does not contain the URI by default.

    for subj, pred, obj in rdflib.Graph().parse(
        self.location, format=self.fmt
    ):
        subj = Vertex(str(subj))
        obj = Vertex(str(obj))

This is what it should do when you create a KG from disk. Not sure if rdflib has changed, but str() of a URIRef should normally return its URI?

Could you perhaps provide a minimal example where this issue occurs? I'll take a closer look at it in the near future.