epoz / shmarql

SPARQL endpoint explorer
The Unlicense

More RDF parsing #16

Open ch-sander opened 2 months ago

ch-sander commented 2 months ago

pyoxigraph is limited in the RDF formats it can parse, so rdflib could help to load RDF from other sources, e.g. JSON-LD.

My implementation

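```python
import json

from rdflib import Graph, Namespace

# RDF, BASE_PATHS and remove_base_path_if_matches come from the surrounding
# application's configuration and are not shown in this issue.
def parse_rdf(urls=None, input_format=RDF['input_format'], output_format=RDF['output_format'],
              output_path=RDF['output_path'], namespaces=RDF['namespaces']):
    output_path = remove_base_path_if_matches(output_path, BASE_PATHS['results'])
    urls = urls or []  # avoid iterating over None
    combined_graph = Graph()

    # Namespaces may be a dict or a JSON string of prefix -> URI pairs.
    if namespaces:
        try:
            if isinstance(namespaces, str):
                namespaces = json.loads(namespaces)  # only this call can raise JSONDecodeError
            for prefix, uri in namespaces.items():
                combined_graph.namespace_manager.bind(prefix, Namespace(uri), override=True)
        except json.JSONDecodeError:
            print("Error parsing the namespaces as JSON")

    parsed_triples = 0  # total over all sources, not just the last one
    for url in urls:
        g = Graph()
        try:
            # rdflib accepts URLs as well as local file paths here.
            g.parse(url, format=input_format)
        except Exception as e:
            print(f"Failed to parse RDF from {url}: {e}")
            continue
        parsed_triples += len(g)
        combined_graph += g  # merge only sources that parsed successfully

    if len(combined_graph) > 0:
        try:
            combined_graph.serialize(destination=output_path, format=output_format)
            return {"url": output_path, "len_parse": parsed_triples, "len_serialize": len(combined_graph)}
        except Exception as e:
            return {"error": f"Failed to serialize RDF to {output_path}: {e}"}
    else:
        return {"error": "No data to serialize after parsing."}
```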

It takes a list of URLs (local file paths should also work) and a mapping of namespaces to bind, given as prefix/URI key-value pairs similar to a JSON-LD context.
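
Since the namespaces are described as key-value pairs similar to a JSON-LD context, one possible normalization step is sketched below. The hypothetical helper `normalize_namespaces` (not part of the code above) accepts a dict, a JSON string, or a JSON-LD-style object with an inline `"@context"`, and keeps only string-valued entries that are not JSON-LD keywords. Expanded term definitions are dropped, and a remote context URL is rejected rather than fetched.

```python
import json
from collections.abc import Mapping
from typing import Optional, Union


def normalize_namespaces(namespaces: Optional[Union[str, Mapping]]) -> dict:
    """Turn a dict, JSON string, or {"@context": {...}} object into prefix -> URI pairs."""
    if not namespaces:
        return {}
    if isinstance(namespaces, str):
        namespaces = json.loads(namespaces)  # may raise json.JSONDecodeError
    if not isinstance(namespaces, Mapping):
        raise ValueError("namespaces must be a JSON object")
    # Unwrap a JSON-LD document or context wrapper if present.
    context = namespaces.get("@context", namespaces)
    if not isinstance(context, Mapping):
        raise ValueError("only an inline @context object is supported")
    return {
        prefix: uri
        for prefix, uri in context.items()
        # skip keywords like "@vocab" and expanded term definitions (dict values)
        if not prefix.startswith("@") and isinstance(uri, str)
    }
```

Plain string term mappings such as `"name": "http://xmlns.com/foaf/0.1/name"` are kept too, so they would be bound as prefixes; whether to filter them further is left open.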