Closed (maya-25 closed this 2 years ago)
The embeddings returned by fit_transform will be in the same order as the entities you pass in. However, wrapping your entities in a set loses that ordering. Pass list(entities) instead to be guaranteed the same order.
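If the set was only there to drop duplicates, the order can be kept while de-duplicating. A minimal sketch, assuming plain string URIs; the helper name `unique_in_order` is made up for illustration and is not part of pyRDF2Vec:

```python
def unique_in_order(items):
    """Drop duplicates while keeping the first occurrence of each item."""
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7)
    return list(dict.fromkeys(items))


entities = [
    "http://dbpedia.org/resource/Twitter",
    "http://dbpedia.org/resource/India",
    "http://dbpedia.org/resource/Twitter",  # duplicate, will be dropped
    "http://dbpedia.org/resource/Woman",
]
deduped = unique_in_order(entities)
```

The resulting list could then be passed to fit_transform in place of list(set(entities)).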
I tried passing list(entities), and also just entities (it was already a list of filtered entities). In both cases it throws the error below.
It works only when passing list(set(entities)).
IndexError                                Traceback (most recent call last)
3 frames
/usr/local/lib/python3.7/dist-packages/pyrdf2vec/rdf2vec.py in fit_transform(self, kg, entities, is_update)
    141         """
    142         self._is_extract_walks_literals = True
--> 143         self.fit(self.get_walks(kg, entities), is_update)
    144         return self.transform(kg, entities)
    145
/usr/local/lib/python3.7/dist-packages/pyrdf2vec/rdf2vec.py in get_walks(self, kg, entities)
    180
    181         self._update(self._entities, entities)
--> 182         self._update(self._walks, walks)
    183
    184         if self.verbose >= 1:
/usr/local/lib/python3.7/dist-packages/pyrdf2vec/rdf2vec.py in _update(self, attr, values)
    266             tmp = values
    267             for i, pos in enumerate(self._pos_entities):
--> 268                 attr[pos] = tmp.pop(self._pos_walks[i])
    269             attr += tmp
    270
IndexError: list assignment index out of range
I am a bit clueless as to what the error could be. Your triples.ttl seems extremely small to start, and many of the entities in your list are not even present in your turtle file? Could you perhaps share more complete data? Also, do the example scripts run for you?
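For anyone who wants to check which requested entities are missing from the graph without disturbing their order, here is a rough sketch. It assumes you already have the KG entity names as plain strings (for example collected from kg._entities as in the code in this thread); `split_by_presence` is a hypothetical helper, not a pyRDF2Vec function:

```python
def split_by_presence(entities, kg_entity_names):
    """Split `entities` into (found, missing), keeping the input order."""
    known = set(kg_entity_names)  # the set is used only for fast membership tests
    found = [e for e in entities if e in known]
    missing = [e for e in entities if e not in known]
    return found, missing


# Illustrative values only
kg_entity_names = [
    "http://dbpedia.org/resource/Actor",
    "http://dbpedia.org/resource/Espionage",
]
requested = [
    "http://dbpedia.org/resource/Twitter",
    "http://dbpedia.org/resource/Actor",
    "http://dbpedia.org/resource/India",
]
found, missing = split_by_presence(requested, kg_entity_names)
print(f"{len(missing)} entities not found in the KG!")
```

Because the set is never converted back into a list, `found` stays in the same order as `requested`.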
Please find attached: URI_list.csv and rdf_triples.txt
I pass the file path as rdf_triples.ttl and the entities from URI_list.csv. (I attached the triples as .txt because .ttl uploads are not supported here; please convert it to .ttl.)
from pyrdf2vec import RDF2VecTransformer
from pyrdf2vec.embedders import Word2Vec
from pyrdf2vec.graphs import KG
from pyrdf2vec.walkers import RandomWalker


def rdftriplestovec(filepath, entities):
    kg = KG(filepath)
    print(kg._entities)
    # transformer = RDF2VecTransformer()
    transformer = RDF2VecTransformer(walkers=[RandomWalker(3, None)],
                                     embedder=Word2Vec(size=500))
    # Keep only the requested entities that actually occur in the KG
    entities_names = [entity.name for entity in kg._entities]
    filtered_entities = [e for e in entities if e in entities_names]
    print(filtered_entities)
    not_found = set(entities) - set(filtered_entities)
    print(f'{len(not_found)} entities not found in the KG!')
    # Note: set() does not preserve the order of filtered_entities
    entities = list(set(filtered_entities))
    print(entities)
    # Get embeddings
    embeddings = transformer.fit_transform(kg, entities)
    return embeddings
from pyrdf2vec import RDF2VecTransformer
from pyrdf2vec.embedders import Word2Vec
from pyrdf2vec.graphs import KG
from pyrdf2vec.walkers import RandomWalker
import pandas as pd


def rdftriplestovec(filepath, entities):
    kg = KG(filepath, fmt='turtle')
    # Sets will change the order, go see this for yourself in a shell or notebook
    entities = list(set([x.name for x in kg._entities]).intersection(entities))
    transformer = RDF2VecTransformer(walkers=[RandomWalker(3, None)],
                                     embedder=Word2Vec(vector_size=500))
    embeddings = transformer.fit_transform(kg, entities)
    return embeddings


entities = list(pd.read_csv('URI_list.csv')['0'].values)
rdftriplestovec('rdf_triples.ttl', entities)
Works like a charm. But as I warned, if you use set() in Python, the order will change! Try to avoid it (which I am not doing here), or store the result of converting to set() so that you can reconstruct the order.
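One way to read "store the result so that you can reconstruct the order" is sketched below. It assumes the embeddings come back with one row per entity, in the order of the stored list; the stand-in vectors and the helper `reorder_to_original` are invented for illustration and no pyRDF2Vec call is made:

```python
def reorder_to_original(original, unique, unique_embeddings):
    """Return one embedding per item of `original`, in the original order.

    `unique_embeddings[i]` is assumed to belong to `unique[i]`.
    """
    row_of = {entity: i for i, entity in enumerate(unique)}
    return [unique_embeddings[row_of[e]] for e in original]


original = ["b", "a", "c", "a"]

# Store the list built from the set: its order is arbitrary but now fixed
unique = list(set(original))

# Stand-in for the transformer output: one fake vector per unique entity
unique_embeddings = [[float(ord(e))] for e in unique]

ordered = reorder_to_original(original, unique, unique_embeddings)
```

The key point is that `unique` is kept around, so every row can be traced back to its entity no matter how the set happened to order things.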
Thanks a lot, it works! Just one question to clarify: does it always return the embeddings as a tuple instead of an ndarray?
Yes, it returns two things: embeddings and literals. Both will be numpy arrays.
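As a small illustration of that return shape, the tuple can be unpacked into two names. The function below is a hypothetical stand-in that only mimics a two-element return value; it is not the pyRDF2Vec implementation, and the values are dummies:

```python
def fake_fit_transform(entities):
    """Hypothetical stand-in returning (embeddings, literals) as a tuple."""
    embeddings = [[0.0, 0.0] for _ in entities]  # one dummy vector per entity
    literals = [[] for _ in entities]            # no literals in this sketch
    return embeddings, literals


result = fake_fit_transform(["e1", "e2"])

# Unpack the tuple instead of treating it as a single array
embeddings, literals = result
```

With the real transformer the same pattern would presumably be embeddings, literals = transformer.fit_transform(kg, entities); the exact container type of each element may depend on the installed version.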
Question
I have generated embeddings for RDF triple URIs (from DBpedia) using pyRDF2Vec. When I pass list(set(entities)) to transformer.fit_transform(), I am not sure about the order of the embeddings generated by the pyRDF2Vec transformer. Will this order affect the results when I concatenate these RDF embeddings with sentence context embeddings while training the model?
Code:
def rdftriplestovec(filepath,entities):
Sample of the RDF triples in the .ttl file, passed as filepath to the rdftriplestovec function (the predicate is of OWL type):
@prefix owl: <http://www.w3.org/2002/07/owl#> .
<http://dbpedia.org/resource/AT&T> owl:Ontology <http://dbpedia.org/resource/Espionage>, <http://dbpedia.org/resource/Police> .
<http://dbpedia.org/resource/Actor> owl:Ontology <http://dbpedia.org/resource/Major>, <http://dbpedia.org/resource/Plea>, <http://dbpedia.org/resource/United_States> .
<http://dbpedia.org/resource/Actor_model> owl:Ontology <http://dbpedia.org/resource/Visibility> .
<http://dbpedia.org/resource/Advertising> owl:Ontology <http://dbpedia.org/resource/Indian_Americans> .
<http://dbpedia.org/resource/Afghan_National_Army> owl:Ontology <http://dbpedia.org/resource/Enemy> .
<http://dbpedia.org/resource/Ago,_Mie> owl:Ontology <http://dbpedia.org/resource/Haunt_(comics)>, <http://dbpedia.org/resource/Human_back>, <http://dbpedia.org/resource/Jesus>
Sample of the URI list that I get from the DBpedia API for my dataset (passed as entities to the rdftriplestovec function):
['http://dbpedia.org/resource/United_States_House_of_Representatives', 'http://dbpedia.org/resource/Australian_Democrats', 'http://dbpedia.org/resource/Aide-de-camp', 'http://dbpedia.org/resource/United_Kingdom', 'http://dbpedia.org/resource/Even_language', 'http://dbpedia.org/resource/James_Comey', 'http://dbpedia.org/resource/Letter_(message)', 'http://dbpedia.org/resource/Jason_Chaffetz', 'http://dbpedia.org/resource/Twitter', 'http://dbpedia.org/resource/Italian_language', 'http://dbpedia.org/resource/Robb_Flynn', 'http://dbpedia.org/resource/Hillary_Clinton', 'http://dbpedia.org/resource/Breitbart_News', 'http://dbpedia.org/resource/Truth', 'http://dbpedia.org/resource/Get_(divorce_document)', 'http://dbpedia.org/resource/Inactivated_vaccine', 'http://dbpedia.org/resource/India', 'http://dbpedia.org/resource/Single_(music)', 'http://dbpedia.org/resource/November_2017_Somalia_airstrike', 'http://dbpedia.org/resource/Identified', 'http://dbpedia.org/resource/Iranian_peoples', 'http://dbpedia.org/resource/Woman', 'http://dbpedia.org/resource/Fiction', 'http://dbpedia.org/resource/Unpublished_Story', 'http://dbpedia.org/resource/Stoning']