fluree / db

Fluree database library
https://fluree.github.io/db/

allow use of id-maps in :values pattern #809

Open dpetran opened 2 weeks ago

dpetran commented 2 weeks ago

The JSON-LD standard does not actually support IRI expansion in a value map. Also, IRIs are denoted with id maps in every other bit of FQL syntax. This commit allows JSON-LD-compliant IRI declaration and makes our syntax more consistent.

Here's an example of our current value-map syntax not expanding iris

I believe we should deprecate the {"@value": <iri>, "@type": "xsd:anyURI"} syntax for IRI values and stop documenting it for public use.

bplatz commented 2 weeks ago

I love this feature and struggled with this myself, super glad you tackled it.

The one thing I think we should support slightly differently from how you implemented it is that the values variable should work like a pure substitution.

e.g., this makes tons of sense:

"where": {"@id": "?s", "ex:friend": "?friend"}
"values": ["values", [["?friend"], [{"@id": "ex:brian"}]]]

as you can think of it resolving to this:

"where": {"@id": "?s", "ex:friend": {"@id": "ex:brian"}}

But this doesn't make sense to me:

"where": {"@id": "?s", "ex:friend": "?friend"}
"values": ["values", [["?s"], [{"@id": "ex:brian"}]]]

As you'd imagine it would resolve to this, which isn't how you'd query:

"where": {"@id": {"@id": "ex:brian"}, "ex:friend": "?friend"}

Instead, I think if you were trying to do that same query you'd want to define it like this and sub in the variable:

"where": {"@id": "?s", "ex:friend": "?friend"}
"values": ["values", [["?s"], ["ex:brian"]]]

Likewise in the original example above, if you used @id for ex:friend you'd think this would logically work:

"where": {"@id": "?s", "ex:friend": {"@id": "?friend"}}
"values": ["values", [["?friend"], ["ex:brian"]]]

Is this possible without making it substantially more complex a problem?
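To make the "pure substitution" reading concrete, here is a minimal Python sketch that treats a where clause as a plain dict and swaps each bound variable for its value verbatim. The substitute helper is hypothetical and is not Fluree's implementation; it only shows why binding ?s to an id-map produces the nested "@id" value described above, while binding ?friend to one reads naturally.

```python
import copy


def substitute(pattern, bindings):
    """Replace any string naming a bound variable with its bound value, verbatim."""
    if isinstance(pattern, dict):
        # Only values are substituted here; keys such as "@id" stay fixed.
        return {k: substitute(v, bindings) for k, v in pattern.items()}
    if isinstance(pattern, list):
        return [substitute(v, bindings) for v in pattern]
    if isinstance(pattern, str) and pattern in bindings:
        return copy.deepcopy(bindings[pattern])
    return pattern


where = {"@id": "?s", "ex:friend": "?friend"}

# ?friend in object position: the id-map lands where a node reference belongs.
print(substitute(where, {"?friend": {"@id": "ex:brian"}}))
# prints {'@id': '?s', 'ex:friend': {'@id': 'ex:brian'}}

# ?s in the "@id" position: the id-map ends up nested inside "@id".
print(substitute(where, {"?s": {"@id": "ex:brian"}}))
# prints {'@id': {'@id': 'ex:brian'}, 'ex:friend': '?friend'}
```

Binding ?s to the bare string "ex:brian" instead gives {"@id": "ex:brian", ...}, which is the shape a hand-written query would have.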

dpetran commented 2 weeks ago

Is this possible without making it substantially more complex a problem?

Unfortunately, it wouldn't be possible to achieve this without adding some complex analysis of where the variable is used. I think it may even be inconsistent, because nothing stops you from using the same :values-bound variable in all three positions: subject, predicate, and object.

However, I think you can frame it another way where the semantic isn't "literal substitution" and more "expansion, then substitution".

If we imagine the query/txn as a JSON-LD document (it's not, but we try to pretend it is as far as we can), then we can think of the id-maps as annotations on the f:values key, something like this:

{"@context": {"ex": "http://example.com/", "?": "http://flur.ee/var#"},
 "f:values": [{"?:var": [{"@id": "ex:bar"}, "not-an-iri", {"@id": "ex:foo"}]}]}

Which expands to:

[
  {
    "f:values": [
      {
        "http://flur.ee/var#var": [
          {
            "@id": "http://example.com/bar"
          },
          {
            "@value": "not-an-iri"
          },
          {
            "@id": "http://example.com/foo"
          }
        ]
      }
    ]
  }
]

Now, this isn't how we actually do expansion, but users don't need to know that. They can correctly mentally model the values of the :values key as things that will be expanded before substitution, and then the whole id-map vs iri distinction goes away.
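The "expansion, then substitution" model can be sketched in a few lines of Python. This assumes a context that contains only simple prefixes, and the Iri, Literal and expand_value names are hypothetical; as the comment says, Fluree's real expansion works differently. The point is that each :values entry is classified once, up front, so the pattern later receives either an IRI or scalar data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Iri:
    value: str


@dataclass(frozen=True)
class Literal:
    value: object


def expand_iri(term, context):
    """Expand a compact IRI such as "ex:bar" using the prefixes in the context."""
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return term  # no matching prefix: assume it is already a full IRI


def expand_value(v, context):
    """Expand one :values entry before it is substituted into the pattern."""
    if isinstance(v, dict) and "@id" in v:
        return Iri(expand_iri(v["@id"], context))
    return Literal(v)  # everything else stays scalar data


context = {"ex": "http://example.com/"}
row = [{"@id": "ex:bar"}, "not-an-iri", {"@id": "ex:foo"}]
print([expand_value(v, context) for v in row])
# prints [Iri(value='http://example.com/bar'), Literal(value='not-an-iri'), Iri(value='http://example.com/foo')]
```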

bplatz commented 2 weeks ago

Ok, either way this is helpful over what we had. We can think about addressing the next step if it becomes an issue for our users.

I want to confirm one thing, even though these are different queries, I assume they will both behave as the same query?

"where": {"@id": "?s", "ex:friend": "?friend"}
"values": ["values", [["?friend"], [{"@id": "ex:brian"}]]]

"where": {"@id": "?s", "ex:friend": {"@id": "?friend"}}
"values": ["values", [["?friend"], [{"@id": "ex:brian"}]]]

dpetran commented 2 weeks ago

I want to confirm one thing, even though these are different queries, I assume they will both behave as the same query?

Yes, and I've added a test to ensure that semantic persists.
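Why the two queries converge can be illustrated with a toy matcher in Python. Everything here (Iri, TRIPLES, match_subjects) is hypothetical and is not Fluree's query engine, and compact IRIs are left unexpanded for brevity. Once ?friend holds an already-expanded IRI, unwrapping {"@id": "?friend"} and reading "?friend" directly resolve to the same term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Iri:
    value: str


# Hypothetical toy graph of (subject, predicate, object) triples.
TRIPLES = [
    (Iri("ex:alice"), Iri("ex:friend"), Iri("ex:brian")),
    (Iri("ex:carol"), Iri("ex:friend"), Iri("ex:dave")),
]


def resolve_object(obj, bindings):
    """Resolve the object of a pattern to a term, unwrapping {"@id": ...}."""
    if isinstance(obj, dict) and "@id" in obj:
        inner = obj["@id"]
        return bindings.get(inner, Iri(inner))  # a variable, or an IRI given directly
    return bindings.get(obj, obj)  # a bare variable, or scalar data


def match_subjects(where, bindings):
    """Return the subjects whose ex:friend matches the pattern's object."""
    target = resolve_object(where["ex:friend"], bindings)
    return sorted(s.value for s, p, o in TRIPLES if p == Iri("ex:friend") and o == target)


bindings = {"?friend": Iri("ex:brian")}  # the id-map from :values, already expanded
plain = {"@id": "?s", "ex:friend": "?friend"}
wrapped = {"@id": "?s", "ex:friend": {"@id": "?friend"}}
print(match_subjects(plain, bindings), match_subjects(wrapped, bindings))
# prints ['ex:alice'] ['ex:alice']
```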

zonotope commented 2 weeks ago

FQL is not JSON-LD. There are a lot of things that we do in FQL that don't agree with the JSON-LD spec. The whole notion of binding variables to particular values is foreign to JSON-LD, so this scenario would never arise and I'm not surprised that JSON-LD doesn't address it.

Also, iris are denoted with id maps in every other bit of FQL syntax

Do you have any examples of this? Every usage of iris in FQL that I can think of uses id maps to represent the node. For example, in a where clause,

{"@id": "ex:foo"}

represents the node with "@id" "ex:foo", just like

{"ex:bar": "ex:baz"}

represents a node with the value of the "ex:bar" property being "ex:baz". Also

{"ex:bestFriend": {"@id": "ex:charlie"}}

represents a node with the value of the "ex:bestFriend" property being the node with "@id" "ex:charlie".

Binding an id map to a variable and then using that variable as the value of an id later is inconsistent in my mind. You are saying that a node's id is the node itself, not the iri that represents that node.

dpetran commented 2 weeks ago

In practice I think our users will mainly be depending on the heuristic that identifiers that need expansion need to be wrapped in an id-map. I know I tried to use id-maps at first and when they didn't work I was confused.

And FWIW, we do pretend in our official documentation that FQL is JSON-LD. That's not strictly wrong; we just used "@type": "@json" to allow our own syntax within it, which is a distinction that I doubt users will understand until they've grokked JSON-LD.

I still think this is useful, do you think we shouldn't allow id-maps in :values?

zonotope commented 2 weeks ago

The semantic is not "wrap this in an id map if you want to expand it"; the semantic is "an id map represents a node, everything else is scalar data". We automatically expand any value of the "@id" attribute because we know that must unambiguously be an iri, just like we also expand "ex:foo" in {"@id": "?s", "ex:foo": "bar"} because we know a property identifier must also be an iri. That's it.

FQL is a pattern matching system and we substitute the value bound to a variable directly in the places that variable appears. This patch breaks that, but only sometimes, in certain specific situations. That inconsistency will lead to much more confusion. I don't think we should allow id maps in values because of all of that inconsistency. @bplatz listed some of these inconsistencies involving weirdly recursive "@id" values:

But this doesn't make sense to me:

"where": {"@id": "?s", "ex:friend": "?friend"}
"values": ["values", [["?s"], [{"@id": "ex:brian"}]]]

As you'd imagine it would resolve to this, which isn't how you'd query:

"where": {"@id": {"@id": "ex:brian"}, "ex:friend": "?friend"}

I agree that this doesn't make sense. This is my point.

Instead, I think if you were trying to do that same query you'd want to define it like this and sub in the variable:

"where": {"@id": "?s", "ex:friend": "?friend"}
"values": ["values", [["?s"], ["ex:brian"]]]

This is the most consistent way to express this query given the rest of the syntax of FQL, and it's what we currently do without this patch.

The only missing piece is that if we rely on automatic inference, the parser will think that "ex:brian" is a string, so we now need some way to tell the parser that "ex:brian" is supposed to be an iri and not a string.

In every other scenario where automatic inference fails and the user has to specify how to interpret scalar data, we use an "@value" map to provide that extra information, so the most consistent thing to do here is to use an "@value" map as well. "@id" maps represent nodes; "@value" maps represent scalar data along with extra information for how to interpret that data.
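A minimal Python sketch of the rule described here: a bare string falls back to automatic inference and stays a string, while an "@value" map whose "@type" expands to xsd:anyURI marks the value as an IRI. The function names and tuple shapes are hypothetical, and the sketch models the intended semantics, not Fluree's actual parser.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def expand(term, context):
    """Expand "prefix:suffix" when the prefix is defined in the context."""
    prefix, sep, suffix = term.partition(":")
    return context[prefix] + suffix if sep and prefix in context else term


def parse_values_entry(v, context):
    """Classify one :values entry; an "@value" map can override automatic inference."""
    if isinstance(v, dict) and "@value" in v:
        datatype = expand(v.get("@type", ""), context) or None
        if datatype == XSD + "anyURI":
            return ("iri", expand(v["@value"], context))
        return ("literal", v["@value"], datatype)
    # Automatic inference: a bare string is scalar data, even if it looks like an IRI.
    return ("literal", v, None)


ctx = {"ex": "http://example.com/", "xsd": XSD}
print(parse_values_entry("ex:brian", ctx))
# prints ('literal', 'ex:brian', None)
print(parse_values_entry({"@value": "ex:brian", "@type": "xsd:anyURI"}, ctx))
# prints ('iri', 'http://example.com/brian')
```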

zonotope commented 2 weeks ago

I also want to make clear that I'm not wedded to using value maps for this per se if folks find that construct confusing in this situation. I think value maps are fine, but I could get on board with a new syntax (depending on what the syntax is, of course).

For example, maybe we could try to do something with the (iri "ex:foo") function. I haven't been in that part of the code in a while, so I don't know how easy or hard it would be to add. I would also want to make sure we could do it in a consistent way that's in line with our other planned usage of that function, but it's a thought.

I just think it would be a bad idea to reuse an existing syntax that already means something else.

bplatz commented 2 weeks ago

"@id" maps represent nodes

I understand your point but don't fully agree that there is a distinction between an IRI and a 'node'. I think an IRI is always a node. It may not have properties assigned to it in the local db (yet), but you must assume it has properties in a different db somewhere.

The fact that we don't require specifying the IRI data type in a few circumstances is probably the main culprit that makes this confusing. Those circumstances are: 1) when used as the value of @id 2) when used as the value of @type 3) when used as a property
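The three circumstances can be sketched as a small Python helper that walks one node map and reports which strings are read as IRIs without any datatype. The implicit_iri_strings name is hypothetical. Anything it does not report, such as the property value "ex:alice" below, would need an explicit marker to be read as an IRI.

```python
def implicit_iri_strings(node):
    """List the strings in a node map that are read as IRIs without any datatype."""
    found = []
    for key, value in node.items():
        if key == "@id" and isinstance(value, str):
            found.append(("@id value", value))  # 1) value of @id
        elif key == "@type":
            types = value if isinstance(value, list) else [value]
            found.extend(("@type value", t) for t in types)  # 2) value of @type
        elif not key.startswith("@"):
            found.append(("property", key))  # 3) property key
    return found


node = {"@id": "ex:brian", "@type": "ex:Person", "ex:friend": "ex:alice"}
print(implicit_iri_strings(node))
# prints [('@id value', 'ex:brian'), ('@type value', 'ex:Person'), ('property', 'ex:friend')]
```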

The JSON-LD spec allows you to specify the datatype of an IRI like this:

{"@value": "ex:brian",
 "@type": "@id"}

I think this should be the base case of what we support. Here @id is a shortcut for xsd:anyURI (I presume, although I'm not sure that is ever made explicit in the spec). This is similar to how @type is a shortcut for rdf:type.

I consider this a further shortcut for the above, which I am supportive of using but there is arguably some debate about that:

{"@id": "ex:brian"}

Then the question becomes if we can do anything to handle the 3 circumstances above such that the behavior of using :values is identical to the behavior without it.

I understand the complexity of this, so I'd recommend punting that decision, as we have some larger issues that currently have no workarounds and deserve our focus.

I do think we should at minimum support the "@type": "@id" as that is what the spec uses - so I'll suggest we update this PR to limit it to that.

zonotope commented 2 weeks ago

I understand your point but don't fully agree that there is a distinction between an IRI and a 'node'. I think an IRI is always a node. It may not have properties assigned to it in the local db (yet), but you must assume it has properties in a different db somewhere.

This has the potential to get real philosophical real fast, but I just want to assert that there is a concrete distinction between IRI and subject node. We need to keep that distinction straight in order to build a consistent system.

It's the same as the distinction between "Benjamin Lamothe" and myself. I am not the sequence of characters "Benjamin Lamothe", but people use that character sequence as a symbol to refer to me in certain contexts.

The JSON-LD Spec describes a node identifier as an IRI used to refer to a subject node. It isn't the node itself. The node itself is an entity that has a set of characteristics, or properties. In order to talk about that specific entity in the context of an RDF graph, and to differentiate it from other entities in the graph, we give it a specific name. That name is an IRI. We indicate that a specific IRI is the identifier of a subject by using the "@id" key of the map we use to describe that subject.

I have a height, weight, age, and favorite food, but my name does not. My name is a thing that people use to refer to me.

The goal of this exercise is to allow users to include subject node identifiers in values clauses, and we later substitute what they provide us explicitly as a subject node identifier inside of a subject node map.

Everywhere in FQL, we use {"@id": "ex:foo"} to represent the subject node whose identifier is "ex:foo". That is consistent with the usage in the JSON-LD spec. There is never a situation in FQL when we use an IRI string alone to represent a subject node.

For example, when a subject node is the object of an RDF triple, we don't use the raw iri string; we use a map, as in {"@id": "ex:john", "ex:bestFriend": {"@id": "ex:steve"}}. This is also consistent with the JSON-LD spec.

This proposal here is to change that semantic, and instead use a subject node map to serve as a node identifier, but only in a certain specific situation. That would already introduce an internal inconsistency, but it would also have weird consequences that would require that we introduce cascading inconsistencies to resolve.

You mentioned this about the JSON-LD spec:

The JSON-LD spec allows you to specify the datatype of an IRI like this:

{"@value": "ex:brian",
 "@type": "@id"}

This is not the case. Note the paragraph describing the value of the "@type" key within an "@value" object:

The value associated with the @type key MUST be a term, an IRI, a compact IRI, a string which can be turned into an IRI using the vocabulary mapping, @json, or null.

The spec excludes the "@id" keyword from that list. The spec does allow you to define "@type": "@id" for an alias within a context, but it's only to tell the processor to expand the string iri value associated with that alias into a subject node map whose "@id" is that string value. Doing that here would lead to the same problems involved with trying to use a subject node as a subject node identifier.
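The spec rule quoted above can be expressed as a simple check. This is a rough Python approximation, not a conforming JSON-LD processor: the value_object_type_ok name is hypothetical, and it only rejects keyword-like strings (those starting with "@") other than "@json", which is enough to show why {"@value": "ex:brian", "@type": "@id"} is not allowed while the xsd:anyURI form is.

```python
def value_object_type_ok(value_object):
    """Rough check of the JSON-LD 1.1 constraint on "@type" inside a value object.

    Allowed: a term, an IRI, a compact IRI, a vocabulary-relative string, "@json",
    or null. This sketch approximates that by rejecting every other keyword-like
    string (one starting with "@"), which excludes "@id".
    """
    t = value_object.get("@type")
    if t is None or t == "@json":
        return True
    return isinstance(t, str) and not t.startswith("@")


print(value_object_type_ok({"@value": "ex:brian", "@type": "@id"}))         # False
print(value_object_type_ok({"@value": "ex:brian", "@type": "xsd:anyURI"}))  # True
```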

You went on to say

I think this should be the base case of what we support. Here @id is a shortcut for xsd:anyURI (I presume, although I'm not sure that is ever made explicit in the spec). This is similar to how @type is a shortcut for rdf:type.

This is what we support today, without this patch. The only caveat is that it uses xsd:anyURI instead of @id because of that explicit omission in the spec for "@value" maps.

I consider this a further shortcut for the above, which I am supportive of using but there is arguably some debate about that:

{"@id": "ex:brian"}

As I've mentioned, introducing this as a shortcut for {"@value": "ex:brian", "@type": "xsd:anyURI"} conflicts with the rest of FQL in the usage of maps with an "@id" key. It might make putting iris in values clauses easier (though that's arguable), but it will lead to much more confusion in the long run as a result of all of the inconsistencies it introduces with how the rest of FQL treats that syntax.

I thought about this a lot when I developed this syntax for specifying IRIs in values. The first option I considered was using {"@id": "ex:foo"} but it soon became clear to me why this was a bad choice.

What we have now using an "@value" map with "xsd:anyURI" as the data type is the most clear and consistent alternative that I could think of but, like I said, I'm not wedded to it and am open to the possibility that there is a better syntax I haven't considered.

I just don't think introducing a new, contradicting meaning to a syntax we already use is the right call.

dpetran commented 5 days ago

That would already introduce an internal inconsistency, but it would also have weird consequences that would require that we introduce cascading inconsistencies to resolve.

Do you have an example of this in mind? I was able to get this working by changing about two lines of code, and I don't think it affects anything downstream; we'd just have to document the syntax.

We've long since departed from the path of True JSON-LD Semantics, so I don't believe that anybody is looking at the spec and complaining about inconsistencies between value node and node identifier representations in our syntax. It's also not clear to me what practical consequences would result from this inconsistency.

The downsides of using the value-map syntax are that you need the xsd prefix in your context for xsd:anyURI, which requires the user to go find the XSD namespace IRI somewhere and paste it in, making IRI values much more inconvenient to use. Also, if we're modeling the value as a scalar, {"@type": "xsd:anyURI", "@value": "ex:some-iri"} doesn't actually work as a scalar value in Fluree. It blows up because the value never gets expanded and encoded as a SID.
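Both points can be seen with a minimal Python sketch of prefix expansion, assuming a context that holds only simple prefixes (expand_compact is a hypothetical helper; Fluree's SID encoding is internal and not modeled here). Without an "xsd" entry, "xsd:anyURI" never becomes the full datatype IRI, and the "@value" string is itself a compact IRI that also needs expanding.

```python
XSD_ANYURI = "http://www.w3.org/2001/XMLSchema#anyURI"


def expand_compact(term, context):
    """Expand "prefix:suffix" only when the prefix is defined in the context."""
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return term  # undefined prefix: returned unchanged


value_map = {"@type": "xsd:anyURI", "@value": "ex:some-iri"}
without_xsd = {"ex": "http://example.com/"}
with_xsd = {**without_xsd, "xsd": "http://www.w3.org/2001/XMLSchema#"}

# The datatype is only recognized once the user has added the xsd prefix.
print(expand_compact(value_map["@type"], without_xsd) == XSD_ANYURI)  # False
print(expand_compact(value_map["@type"], with_xsd) == XSD_ANYURI)     # True
# The "@value" is an IRI too, so it also needs expanding before it can be stored.
print(expand_compact(value_map["@value"], with_xsd))  # http://example.com/some-iri
```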

I do feel that id-maps are a lot easier to explain in this vein:

If you're in a position where you need to distinguish between a string and an IRI, wrap the IRI in an id-map.

I don't see any case in our syntax where this heuristic would cause a problem, plus it's easier to type and easier to remember.