Mayil-AI-Sandbox / kuzudb_jan15

MIT License

RDF Blank Nodes (hashtag2762) #37

Open vikramsubramanian opened 4 months ago

vikramsubramanian commented 4 months ago

When importing blank nodes from Turtle files, we should follow these rules:

  1. When importing from multiple files within the same COPY statement, e.g., COPY UniKG FROM "/path/*.ttl", any blank node with the label _:foo will be recognized as the same blank node and get a single generated IRI.
  2. When we support multiple COPY statements, if the same label (e.g., _:foo) is used again in a different COPY statement, it will be recognized as a different blank node.
  3. The convention for generated blank node IRIs is _:ibj, where i and j are two integers. So "_:ibj" is our prefix for blank node IRIs.
  4. If a user uses an _:ibj IRI for some node in a CREATE statement, we do nothing special, i.e., if _:ibj is the IRI of some node, we apply our default behavior (either raise an error saying there is a duplicate IRI, or merge if a new relationship is being added, etc.).
  5. When we support exporting into different RDF formats, we should export blank nodes as blank nodes rather than as regular nodes with IRIs. That is, if we export to Turtle, we export with _:ibj. If we export to RDF/XML, we omit the IRI or use whatever blank node convention the exported file's format specifies.
  6. We should not allow a blank node to appear as a predicate (so ignore a triple of the form "ex:foo _:xyz ex:bar"). Blank nodes should only be allowed as subjects and objects. We already avoid adding blank nodes as predicates when loading from Turtle files. Let's disallow them in CREATE statements as well.
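
To make rules 1, 2, 3 and 6 concrete, here is a rough Python sketch of the intended behavior. It is not the actual implementation: the names BlankNodeScope, iri_for and accept_triple are made up for illustration, and reading i as a per-COPY-statement id and j as a counter within that statement is only an assumption, since the rules above only say that i and j are two integers.

```python
from itertools import count


class BlankNodeScope:
    """Maps blank node labels to generated IRIs within one COPY statement.

    Hypothetical helper: one scope is created per COPY statement and is
    shared by every file that the statement reads.
    """

    def __init__(self, copy_id):
        self.copy_id = copy_id  # assumed meaning of "i" in _:ibj
        self._next = count()    # assumed meaning of "j" in _:ibj
        self._iris = {}         # label -> generated IRI

    def iri_for(self, label):
        # Rule 1: a label seen again in the same scope (even in another
        # file) maps to the IRI generated the first time.
        if label not in self._iris:
            self._iris[label] = f"_:{self.copy_id}b{next(self._next)}"
        return self._iris[label]


def is_blank(term):
    """Return True if the term is written as a blank node (_:label)."""
    return term.startswith("_:")


def accept_triple(subject, predicate, obj):
    """Rule 6: ignore triples whose predicate is a blank node."""
    return not is_blank(predicate)


# Rule 2: each COPY statement gets its own scope.
first_copy = BlankNodeScope(copy_id=0)
second_copy = BlankNodeScope(copy_id=1)

a = first_copy.iri_for("_:foo")   # seen in the first file
b = first_copy.iri_for("_:foo")   # seen in a second file, same COPY
c = second_copy.iri_for("_:foo")  # seen in a later COPY statement
```

With this sketch, a and b are the same IRI while c differs, and accept_triple("ex:foo", "_:xyz", "ex:bar") returns False so the triple would be skipped.
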
mayil-ai[bot] commented 4 months ago

Summary: Rules for importing and exporting RDF blank nodes and their behavior in CREATE statements.

Possible Solution

Based on the provided information and code snippets, the following solutions can be applied to address the issues:

Code snippets to check