Open vikramsubramanian opened 7 months ago
Summary: Deleting Resource nodes causes inconsistent query behavior; the issue proposes two options to address it.
Deleting a Resource node in the Kùzu graph database can leave dangling predicates, which makes queries behave inconsistently. The proposed solutions are either to disable deletion of Resource nodes or to delete all referring triples as well.
Given the preference for disabling deletion due to implementation and performance concerns, the solution is to update the deleteResource function so that it refuses to delete Resource nodes. The documentation should also be updated to state that Resource nodes cannot be deleted.
Here is the solution: update the deleteResource function to raise an exception or return an error indicating that deletion of Resource nodes is not supported. Here is the updated deleteResource function:
// In the relevant C++ file where the deleteResource function is defined.
// Disable deletion of Resource nodes.
Status deleteResource(ResourceID resource_id) {
    // Return an error status indicating that deletion is not allowed.
    return Status::Error("Deletion of Resource nodes is not supported.");
}
Make sure to replace ResourceID with the actual type used for resource identifiers in the codebase, and Status with the appropriate status or error reporting type used in the project.
src/binder/bind/ddl/bind_create_rdf_graph.cpp
This snippet is relevant because it contains functions that handle the naming of RDF-related tables, which could be involved in the process of disabling deletion or handling the deletion of RDF resources.
src/processor/operator/persistent/reader/rdf/rdf_reader.cpp
This snippet is relevant as it contains the logic for handling RDF triples, which is directly related to the issue of dangling predicates when a resource is deleted.
src/binder/bind/bind_graph_pattern.cpp
This snippet is relevant because it deals with binding RDF predicates in graph patterns, which could be affected by the deletion of RDF resources and the handling of dangling edges.
If I delete a Resource r that is being used as a predicate, it leaves the predicate "dangling". That results in inconsistent behavior. For example, if you issue a query that accesses the iri of predicates that refer to r, then those triples are excluded from the results. If you modify the same query to access the pid instead, those triples show up in the results.
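The inconsistency can be reproduced in a toy in-memory model. This is only a sketch under stated assumptions: ToyRdfGraph, countByPid, countByIri, deleteResourceOnly and makeExampleGraph are hypothetical names invented for illustration, and the layout is not Kùzu's actual storage. The point it shows is that a query by pid reads the stored ID directly, while a query by iri has to resolve the ID through the Resource table and so silently drops triples whose predicate row is gone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy model, not Kùzu's storage layout.
using ResourceID = std::uint64_t;

struct Triple {
    ResourceID subject;
    ResourceID predicate;  // the "pid": refers to a Resource row by ID
    ResourceID object;
};

struct ToyRdfGraph {
    std::unordered_map<ResourceID, std::string> resources;  // id -> iri
    std::vector<Triple> triples;

    // Deletes only the Resource row; referring triples are left in place.
    void deleteResourceOnly(ResourceID id) { resources.erase(id); }

    // Query by pid: compares the stored ID, no Resource lookup needed.
    std::size_t countByPid(ResourceID pid) const {
        std::size_t n = 0;
        for (const Triple& t : triples) {
            if (t.predicate == pid) ++n;
        }
        return n;
    }

    // Query by iri: resolves each predicate ID through the Resource table,
    // so triples whose predicate row was deleted are skipped.
    std::size_t countByIri(const std::string& iri) const {
        std::size_t n = 0;
        for (const Triple& t : triples) {
            auto it = resources.find(t.predicate);
            if (it != resources.end() && it->second == iri) ++n;
        }
        return n;
    }
};

// Builds a graph in which resource 2 ("ex:knows") is the predicate of two triples.
ToyRdfGraph makeExampleGraph() {
    ToyRdfGraph g;
    g.resources[1] = "ex:alice";
    g.resources[2] = "ex:knows";
    g.resources[3] = "ex:bob";
    g.resources[4] = "ex:carol";
    g.triples.push_back(Triple{1, 2, 3});
    g.triples.push_back(Triple{1, 2, 4});
    return g;
}
```

In this model, after calling deleteResourceOnly(2) on the example graph, countByIri("ex:knows") no longer finds the two triples while countByPid(2) still does, which mirrors the behavior described above.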
We should either:
Option 1: Disable deletion of Resource nodes, so the Resource node table is append-only.
Option 2: Delete all of the triples that refer to r as well.
I suggest we do Option 1 for now. We can switch to Option 2 later if we find a good way to do it, though it looks difficult to implement and support. Option 2 also seems like it would be very slow for simple deletions, because it appears to require scanning all of the rt and lt triples to find and delete those that refer to the deleted resources. So if you delete 1 resource from the database, you end up scanning the entire database.
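To make the cost concern concrete, here is a minimal sketch of what Option 2 amounts to when there is no index from a resource to the triples that mention it. The names deleteReferringTriples and CascadeResult are hypothetical, and a single std::vector stands in for the rt and lt triples; it is an illustration of the scan, not a proposal for the actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy types, not Kùzu's.
using ResourceID = std::uint64_t;

struct Triple {
    ResourceID subject;
    ResourceID predicate;
    ResourceID object;
};

struct CascadeResult {
    std::size_t scanned;  // number of triples examined
    std::size_t removed;  // number of triples deleted
};

// Option 2 sketch: remove every triple that mentions `id` in any position.
// Every stored triple is examined, regardless of how few refer to `id`.
CascadeResult deleteReferringTriples(std::vector<Triple>& triples, ResourceID id) {
    const std::size_t before = triples.size();
    auto refers = [id](const Triple& t) {
        return t.subject == id || t.predicate == id || t.object == id;
    };
    triples.erase(std::remove_if(triples.begin(), triples.end(), refers),
                  triples.end());
    return CascadeResult{before, before - triples.size()};
}
```

Here scanned always equals the total number of stored triples, even when removed is small, which is the "delete 1 resource, scan the entire database" cost in miniature.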
Tentatively, I'm writing the documentation saying Resource nodes cannot be deleted.