Open vikramsubramanian opened 8 months ago
Summary: The pid property in the R-R and R-L relationship tables in RDFGraphs should be hidden from users and not be queryable or modifiable.
Based on the provided information and code snippets, the issue seems to be related to the handling of the `pid` and `iri` properties in the context of the Kùzu graph database. The error message indicates a type mismatch: an `INT64` type is being used where an `INTERNAL_ID` type is expected, and implicit casting is not supported.
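To make the mismatch concrete, here is a minimal sketch of a strict type check with no implicit casts. It is a toy model, not Kùzu's binder: `BinderError`, `check_comparable`, `try_compare`, and the wording of the error string are all hypothetical and only mirror the type names mentioned above.

```python
# Hypothetical type tags; the names mirror the types in the error message.
INT64 = "INT64"
INTERNAL_ID = "INTERNAL_ID"
STRING = "STRING"


class BinderError(Exception):
    """Stand-in for an error raised while a query is being bound."""


def check_comparable(expected_type: str, actual_type: str) -> None:
    # No implicit casts in this toy model: both sides must already agree.
    if expected_type != actual_type:
        raise BinderError(
            f"Type mismatch: expected {expected_type} but got {actual_type}; "
            "implicit cast is not supported."
        )


def try_compare(expected_type: str, actual_type: str) -> str:
    # Return "ok" when the types agree, otherwise the error text.
    try:
        check_comparable(expected_type, actual_type)
        return "ok"
    except BinderError as err:
        return str(err)
```

Under these assumptions, comparing a `pid` (an `INTERNAL_ID`) against an integer literal (an `INT64`) fails at bind time, while comparing two `STRING` values passes.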
To resolve the issue:
- Ensure that the `pid` property is not exposed to the user. It should be used internally by the system only.
- Modify the `table_info` function to hide the `pid` property and instead expose the `iri` property as a virtual property with a `STRING` type.
- Update `MATCH` queries to use the `iri` property instead of `pid` for user-facing queries.
- In the `bind_graph_pattern.cpp` file, where the `queryRel` object is being constructed, make sure that the `pid` property is not added to the list of properties exposed to the user. Instead, add the `iri` property as a virtual property.
- If the `pid` property is being used in any `MATCH`, `RETURN`, `WHERE`, or `SET` clauses in user-facing queries, replace it with the `iri` property.
- Ensure that using `pid` directly in user-facing queries results in a binding error, as indicated by the error message.
- Update the `RelExpression` and `PropertyExpression` classes to handle the `iri` property correctly as a virtual property.

By making these changes, you should be able to resolve the type mismatch error and ensure that the `pid` property is handled correctly as an internal identifier, while the `iri` property is exposed to users as a virtual, user-friendly identifier.
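The binding behavior described above can be sketched roughly as follows. This is an illustrative model only, not the code in `bind_graph_pattern.cpp`: the schema dictionaries, `BinderError`, and `bind_rel_property` are hypothetical names, and the error messages are invented for the example.

```python
class BinderError(Exception):
    """Stand-in for an error raised while a query is being bound."""


# Hypothetical schema of an R-L relationship table: what is physically stored.
STORED_PROPERTIES = {"pid": "INTERNAL_ID"}
# Stored properties that must never be visible to user queries.
HIDDEN_PROPERTIES = {"pid"}
# Properties that are not stored but are resolved internally on request.
VIRTUAL_PROPERTIES = {"iri": "STRING"}


def bind_rel_property(name: str) -> tuple:
    """Resolve a property reference to (name, type, is_virtual)."""
    if name in HIDDEN_PROPERTIES:
        # An explicit, unambiguous error instead of a confusing type mismatch.
        raise BinderError(
            f"Property {name} is internal and cannot be queried or modified."
        )
    if name in VIRTUAL_PROPERTIES:
        return (name, VIRTUAL_PROPERTIES[name], True)
    if name in STORED_PROPERTIES:
        return (name, STORED_PROPERTIES[name], False)
    raise BinderError(f"Cannot find property {name}.")
```

In this sketch, a reference to `iri` binds to a virtual `STRING` property, while a reference to `pid` in any clause fails immediately with a message that says why.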
src/binder/bind/bind_graph_pattern.cpp
This snippet contains logic for binding properties to relationship patterns, which is relevant to hiding the pid property and ensuring it triggers a binding error when referenced.
This snippet demonstrates how to drop and rename properties in a table, which could be relevant for removing the pid property from the user's view.
The `pid` property of the R-R and R-L relationship tables, which has the internal ID type, is a system-level optimization. When users try to query it, they either hit confusing binding errors or get confusing outputs. For example:
Above, 18446744073709551615 (2^64 - 1, the largest unsigned 64-bit value) looks like a NULL placeholder value. Or:
Instead, we should explicitly say that `pid` cannot be queried or modified.
We do, however, have a feature that lets users reference an `iri` property on these relationships. In reality, these relationship tables do not have `iri` properties, but internally we do not trigger binding errors because we would like users to be able to query and obtain string IRIs. So we can do this:
Or:
But when a user looks at these tables, they see this:
This behavior is confusing. I think we should just hide `pid` completely from the user. That is, the following should be our behavior:
- Do not show `pid` in `table_info('UniKG_lt')` function calls. Instead, show an `iri` property with type `STRING`. You can also say "(virtual)" next to `STRING`.
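As a rough illustration of the proposed output, the sketch below filters a property listing the way the request describes. It is not the implementation of `table_info`: the function `user_facing_table_info` and the row layout of `(name, type)` pairs are assumptions made for this example.

```python
def user_facing_table_info(stored, hidden, virtual):
    """Build the property rows a user should see.

    stored:  list of (name, type) pairs physically present in the table
    hidden:  set of property names that must not be shown
    virtual: list of (name, type) pairs resolved internally, not stored
    """
    # Drop internal properties such as pid.
    rows = [(name, dtype) for name, dtype in stored if name not in hidden]
    # Append virtual properties, marking them so users know they are not stored.
    rows += [(name, f"{dtype} (virtual)") for name, dtype in virtual]
    return rows


# Hypothetical listing for a relationship table such as UniKG_lt.
rows = user_facing_table_info(
    stored=[("pid", "INTERNAL_ID")],
    hidden={"pid"},
    virtual=[("iri", "STRING")],
)
```

With these inputs, `rows` contains a single entry for `iri` with type `STRING (virtual)`, and `pid` does not appear.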