Closed nleguillarme closed 2 years ago
I think you understand it correctly. I think this is related to issue #892. rdflib uses the blank node identifiers as they are.
Changing this behavior now would break some things, and as we are in the feature freeze for 5.x, I moved it to the 6.0.0 milestone.
Actually I have a use case where I need to parse multiple files within the same context of blank identifiers. When executing SPARQL queries I need to have individual contexts per query. Maybe it would be a good idea to introduce some blank context object which can be handed over to the parse method and the query method. We have to put this on the roadmap for 6.0.0.
Thank you for your reply. However, I don't really understand... does that mean that there is no graph merging mechanism currently implemented in rdflib? This would contradict what the docs say:
In RDFLib, blank nodes are given unique IDs when parsing, so graph merging can be done by simply reading several files into the same graph
But both graphs have a common subject and predicate, and only the object is different.
We solved this issue as follows: we use a new map to assign new IDs to the blank nodes of each graph. If two blank nodes come from the same graph, we assign them the same ID. You can download the updated code from #1101.
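To make the idea of a per-graph map concrete, here is a minimal standalone sketch. It is not the code from #1101 and not rdflib's implementation; the function name relabel_bnodes, the tuple representation of triples, and the optional context argument are all assumptions made for illustration. The URIs are the ones from the output posted in this issue.

```python
import uuid


def relabel_bnodes(triples, context=None):
    """Return triples with each "_:label" term replaced by a fresh identifier.

    `context` (hypothetical parameter) maps document-local labels to fresh
    identifiers. Passing the same dict to several calls shares blank nodes
    between documents; omitting it gives every call its own scope.
    """
    if context is None:
        context = {}  # fresh scope for this document

    def fresh(term):
        if not term.startswith("_:"):
            return term  # IRIs and literals pass through unchanged
        if term not in context:
            context[term] = "_:N" + uuid.uuid4().hex
        return context[term]

    return [tuple(fresh(t) for t in triple) for triple in triples]


PRED = "http://purl.obolibrary.org/obo/RO_0002350"
doc1 = [("_:a", PRED, "http://www.gbif.org/species/0000001")]
doc2 = [("_:a", PRED, "http://www.gbif.org/species/0000002")]

# Separate scopes: the two "_:a" labels become different nodes.
merged = relabel_bnodes(doc1) + relabel_bnodes(doc2)
distinct_subjects = {s for s, _, _ in merged}

# Shared scope: both "_:a" labels resolve to the same node.
shared = {}
joined = relabel_bnodes(doc1, shared) + relabel_bnodes(doc2, shared)
shared_subjects = {s for s, _, _ in joined}
```

With separate scopes the merged result has two subjects; with a shared dict it has one. Making the scope an explicit argument is one way to cover both the "merge" case and the "same context across files" case mentioned above.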
@vikash18086 thank you for contributing to RDFLib. I don't think this would actually solve the issue. As I mentioned earlier:
Actually I have a use case where I need to parse multiple files within the same context of blank identifiers. When executing SPARQL queries I need to have individual contexts per query. Maybe it would be a good idea to introduce some blank context object which can be handed over to the parse method and the query method. We have to put this on the roadmap for 6.0.0.
So we need some way to:
Cool, thank you @mwatts15 for #1107, this is the interface as I proposed it. I like it. We have to make sure that it also works across different serialization formats. I think it should not be a problem with Turtle; for RDF/XML, the value of rdf:nodeID is the same as the bnodeLabel following _: in Turtle and N-Triples, and JSON-LD also uses the _: syntax.
We also need a similar solution for #892.
I'm currently not able to test #1107 and #1108. But as far as I can see, the tests in #1108 do not yet cover using the same context across different serialization formats. We also need this for the other formats.
@white-gecko I'm only really interested in the N-Triples and N-Quads formats.
As far as other parsers, you already get distinct blank nodes between different documents for some. I don't know if sharing them across documents makes as much sense for other formats. Turtle/N3 has more complicated handling of blank nodes: formulas define their own nested blank node contexts. What's the use-case for something like the bnode_context idea? The RDF/XML parser gives you distinct IDs for each parse unless you use preserve_node_ids - it just means "use the node ID as the BNode identifier". TriX also has preserve_node_ids although the TriX parser still creates BNodes like BNode(label) even when it's not "preserving" identifiers -- seems pretty useless.
JSON-LD looks like it would be more annoying in general, but also for this. I have less than zero interest in that.
That is fine. I'm actually also just interested in this feature for NTriples. But for the sake of consistency of the parsing interface I think it would be good to have the blank node/blank id support handled in the same way for all parsers. Maybe there will be somebody who needs it at some time … ;-)
Looks like #1108 fixes this issue (“Address remainder #980. Also add similar behavior for N-Quads.”) and so it can be closed?
Closing this Issue since PR #1495 includes a test that shows that this particular Issue is solved (due to PR #1108). Thanks @gjhiggins!
Hi.
If I understand the graph merging process explained here correctly, the following piece of code should create a graph with two distinct blank nodes:
However, when executing the code, I get the following output:
(rdflib.term.BNode('Ne3fd8261b37741fca22d502483d88964'), rdflib.term.URIRef('http://purl.obolibrary.org/obo/RO_0002350'), rdflib.term.URIRef('http://www.gbif.org/species/0000002'))
(rdflib.term.BNode('Ne3fd8261b37741fca22d502483d88964'), rdflib.term.URIRef('http://purl.obolibrary.org/obo/RO_0002350'), rdflib.term.URIRef('http://www.gbif.org/species/0000001'))
Am I missing something? (versions: rdflib 4.2.2, Python 3.7.5)