IBCNServices / pyRDF2Vec

🐍 Python Implementation and Extension of RDF2Vec
https://pyrdf2vec.readthedocs.io/en/latest/
MIT License

access to remote sparql endpoint #31

Closed dinani65 closed 3 years ago

dinani65 commented 3 years ago

Could you explain how to use pyRDF2Vec with a remote SPARQL endpoint such as YAGO ("https://yago-knowledge.org/sparql")? I get this error:

    ~/.local/lib/python3.8/site-packages/pyrdf2vec/graphs/kg.py in _get_shops(self, vertex)
        113         results = self.endpoint.query().convert()
        114         neighbors = []
    --> 115         for result in results["results"]["bindings"]:
        116             predicate, obj = result["p"]["value"], result["o"]["value"]
        117             if predicate not in self.label_predicates:

    TypeError: byte indices must be integers or slices, not str

Furthermore, querying remote endpoints is very slow, while loading the dataset files locally needs a lot of memory. How can this be handled? Is there any way to access a graph database instead? If so, how?

GillesVandewiele commented 3 years ago

It seems that results["results"] does not contain what is expected. Could you print it out?
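The message "byte indices must be integers or slices, not str" suggests that the value being indexed is a raw bytes payload rather than a parsed JSON dictionary. A minimal sketch of how one might check this, using only the standard library: `extract_bindings` is a hypothetical helper, not part of pyRDF2Vec, and the HTML payload is made up for illustration.

```python
import json


def extract_bindings(results):
    """Return the SPARQL JSON bindings from a query result.

    Hypothetical helper: accepts either an already parsed dict or a raw
    bytes payload, and raises a readable error if the bytes are not JSON.
    """
    if isinstance(results, (bytes, bytearray)):
        try:
            results = json.loads(results.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError) as exc:
            # Decoding failed, so `results` still holds the raw payload.
            preview = bytes(results[:80])
            raise ValueError(
                f"endpoint did not return SPARQL JSON; payload starts with {preview!r}"
            ) from exc
    return results["results"]["bindings"]


# Made-up payload standing in for a non-JSON response (for example a web page).
html_page = b"<html><body>SPARQL query form</body></html>"

try:
    html_page["results"]  # indexing bytes with a str key
except TypeError as err:
    print("TypeError:", err)

try:
    extract_bindings(html_page)
except ValueError as err:
    print("ValueError:", err)
```

Printing the first bytes of the payload this way usually shows whether the URL points at a web page or at an actual query endpoint.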

dinani65 commented 3 years ago

    import pandas as pd
    import rdflib
    from pyrdf2vec import RDF2VecTransformer
    from pyrdf2vec.graphs import KG
    from pyrdf2vec.walkers import RandomWalker

    # Entities to embed: the 'actor' column of the CSV file.
    yago_data = pd.read_csv('ds.csv')
    entities = yago_data['actor']
    label_predicates = ['http://schema.org/actor']

    # Remote knowledge graph backed by the YAGO SPARQL endpoint.
    kg = KG("https://yago-knowledge.org/sparql", is_remote=True, label_predicates=[rdflib.URIRef(x) for x in label_predicates])

    transformer = RDF2VecTransformer(walkers=[RandomWalker(2, 5)])
    walk_embeddings = transformer.fit(kg, entities, verbose=True).transform(entities)

Error:

    TypeError                                 Traceback (most recent call last)
    in
          5 transformer = RDF2VecTransformer(walkers=[RandomWalker(2, 5)])
          6
    ----> 7 walk_embeddings = transformer.fit(kg, entities, verbose=True).transform(
          8     entities
          9 )

    ~/.local/lib/python3.8/site-packages/pyrdf2vec/rdf2vec.py in fit(self, kg, entities, verbose)
         69
         70         for walker in self.walkers:
    ---> 71             self.walks_ += list(walker.extract(kg, entities))
         72         corpus = [list(map(str, x)) for x in self.walks_]
         73

    ~/.local/lib/python3.8/site-packages/pyrdf2vec/walkers/walker.py in extract(self, kg, instances)
         52         """
         53         self.sampler.fit(kg)
    ---> 54         return self._extract(kg, instances)
         55
         56     @abc.abstractmethod

    ~/.local/lib/python3.8/site-packages/pyrdf2vec/walkers/random.py in _extract(self, kg, instances)
         98         canonical_walks = set()
         99         for i, instance in enumerate(instances):
    --> 100             walks = self.extract_random_walks(kg, instance)
        101             for walk in walks:
        102                 canonical_walk = []

    ~/.local/lib/python3.8/site-packages/pyrdf2vec/walkers/random.py in extract_random_walks(self, kg, root)
         78         if self.walks_per_graph is None:
         79             return self.extract_random_walks_bfs(kg, root)
    ---> 80         return self.extract_random_walks_dfs(kg, root)
         81
         82     def _extract(

    ~/.local/lib/python3.8/site-packages/pyrdf2vec/walkers/random.py in extract_random_walks_dfs(self, graph, root)
         54             while d // 2 < self.depth:
         55                 last = d // 2 == self.depth - 1
    ---> 56                 hop = self.sampler.sample_neighbor(graph, new, last)
         57                 if hop is None:
         58                     break

    ~/.local/lib/python3.8/site-packages/pyrdf2vec/samplers/sampler.py in sample_neighbor(self, kg, walk, last)
         51         not_tag_neighbors = [
         52             x
    ---> 53             for x in kg.get_hops(walk[-1])
         54             if (x, len(walk)) not in self.visited
         55         ]

    ~/.local/lib/python3.8/site-packages/pyrdf2vec/graphs/kg.py in get_hops(self, vertex)
        143     def get_hops(self, vertex: str) -> List[Tuple[str, str]]:
        144         if self.is_remote:
    --> 145             return self._get_shops(vertex)
        146         return self._get_rhops(vertex)
        147

    ~/.local/lib/python3.8/site-packages/pyrdf2vec/graphs/kg.py in _get_shops(self, vertex)
        113         results = self.endpoint.query().convert()
        114         neighbors = []
    --> 115         for result in results["results"]["bindings"]:
        116             predicate, obj = result["p"]["value"], result["o"]["value"]
        117             if predicate not in self.label_predicates:

    TypeError: byte indices must be integers or slices, not str

GillesVandewiele commented 3 years ago

From "https://yago-knowledge.org/sparql":

YAGO can be queried by a SPARQL API. The SPARQL endpoint URI is https://yago-knowledge.org/sparql/query. There is a 1 minute timeout to ensure a responsive SPARQL endpoint for everyone. You can also fire a SPARQL query directly in the field below. If you plan to launch several queries, please download YAGO instead.

Use https://yago-knowledge.org/sparql/query as the endpoint URL instead.
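To check the query endpoint by hand before passing it to KG, one option is to build a plain SPARQL protocol request and ask explicitly for JSON results. This is a rough sketch using only the standard library. `neighbor_query` and `build_sparql_request` are hypothetical helpers, and the query shape (selecting `?p` and `?o` for one vertex) is an assumption inferred from the `result["p"]` and `result["o"]` lookups in the traceback, not the library's confirmed query.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL quoted from the YAGO page above.
YAGO_ENDPOINT = "https://yago-knowledge.org/sparql/query"


def neighbor_query(vertex):
    """Hypothetical query: outgoing predicates and objects of one vertex."""
    return f"SELECT ?p ?o WHERE {{ <{vertex}> ?p ?o }}"


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# The vertex URI is only an illustrative value.
request = build_sparql_request(YAGO_ENDPOINT, neighbor_query("http://schema.org/Thing"))
print(request.full_url)

# To actually send it (network access needed; the endpoint has a 1 minute timeout):
#   from urllib.request import urlopen
#   with urlopen(request, timeout=60) as response:
#       payload = response.read()
```

Given the note about launching several queries, a check like this is only meant for a single test request, not for extracting walks at scale.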

dinani65 commented 3 years ago

When I use the downloaded file, the kernel dies and restarts ("The kernel appears to have died. It will restart automatically"). I think this is caused by the size of the file.