LinkedDataFragments / Server.Java

A Triple Pattern Fragments server for Java
MIT License

JSON-LD issues with P_ keys #41

Closed RubenVerborgh closed 7 years ago

RubenVerborgh commented 7 years ago

Reported by Lucas Werkmeister on the Wikidata mailing list

I tried playing with it a bit and noticed an oddity in the JSON format: if the predicate and object are both left unspecified, "P_" keys will sometimes refer to full statement nodes and sometimes to truthy values. An example item with not too many statements where you can witness this is Q26536085:

curl -s -H 'Accept: application/ld+json' \
  'https://query.wikidata.org/bigdata/ldf?subject=http%3A//www.wikidata.org/entity/Q26536085&predicate=&object=' \
  | jq '.["@graph"] | .[] | select(.["@id"]=="wd:Q26536085")'

Right now, I get the following results:

  "P1216": "wds:Q26536085-FCED904F-7F06-444A-84CE-0AFCE089C92C",
  "p:P131": {
    "@id": "wds:Q26536085-44302E43-6F33-4F4A-9783-BD631171BF43"
  },
  "p:P1435": {
    "@id": "wds:Q26536085-01720F88-7A41-47C6-84FA-74F5E7538CDC"
  },
  "P17": "wds:Q26536085-8D04E875-1CD8-4A53-BCC9-1B8591A4AE78",
  "p:P31": {
    "@id": "wds:Q26536085-6CBDDC3D-632A-41C3-8E3B-D9E1D0C103F7"
  },
  "P625": "wds:Q26536085-49F66E15-7AE7-44D6-95CE-A4955734EA07",
  "wdt:P1216": "1243406",
  "P131": "wd:Q635457",
  "P1435": "wd:Q15700834",
  "wdt:P17": {
    "@id": "wd:Q145"
  },
  "P31": "wd:Q3947",
  "wdt:P625": {
    "@type": "geo:wktLiteral",
    "@value": "Point(-2.027844 51.36333)"
  }

As you can see, the "P_" key sometimes refers to the full statement node ("wds:") and sometimes to the direct, truthy value ("wd:Q_"). Where "P_" points to the statement node, there’s also a "wdt:P_" entry (sometimes pointing directly at a string containing the ID, sometimes pointing to an { "@id": _ } object); conversely, where "P_" points to the truthy value, there’s a "p:P_" entry to an { "@id": _ } object.
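To make the two value shapes concrete, here is a minimal Python sketch that sorts the keys of the node above by whether their value refers to a statement node. The helper names `raw_ref` and `is_statement_node` are made up for illustration, and the check relies only on the "wds:" prefix visible in the output above; it does not consult @context, so it cannot tell a plain string literal from a compacted IRI.

```python
def raw_ref(value):
    """Return the string a value points at, whether it is given as a bare
    string or wrapped in an {"@id": ...} object; None for typed literals."""
    if isinstance(value, dict):
        return value.get("@id")
    return value if isinstance(value, str) else None


def is_statement_node(value):
    """True if the value refers to a statement node (compact IRI "wds:...")."""
    ref = raw_ref(value)
    return ref is not None and ref.startswith("wds:")


# A subset of the node shown above.
node = {
    "P1216": "wds:Q26536085-FCED904F-7F06-444A-84CE-0AFCE089C92C",
    "p:P131": {"@id": "wds:Q26536085-44302E43-6F33-4F4A-9783-BD631171BF43"},
    "wdt:P1216": "1243406",
    "P131": "wd:Q635457",
    "wdt:P17": {"@id": "wd:Q145"},
    "wdt:P625": {"@type": "geo:wktLiteral", "@value": "Point(-2.027844 51.36333)"},
}

statement_keys = sorted(k for k, v in node.items() if is_statement_node(v))
truthy_keys = sorted(k for k, v in node.items() if not is_statement_node(v))
print(statement_keys)
print(truthy_keys)
```

Both a bare key ("P1216") and a prefixed key ("p:P131") end up in the statement group, which is exactly the inconsistency described above.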

Is there any reason why different representations are chosen? Is this predictable? Is this a bug? Or is this just something you have to work around using the @context information if you want to use the JSON format? (The other data formats don’t seem to have this problem, since they don’t use unprefixed "P_" keys.)

smalyshev commented 7 years ago

The confusing part is the inconsistency: there are only two types of URIs in play here, http://www.wikidata.org/prop/direct/P and http://www.wikidata.org/prop/P, and they are resolved properly with the @context clause. However, how they are treated in the main body looks strange: sometimes the key is just P, sometimes it is a prefixed P.

RubenVerborgh commented 7 years ago

@smalyshev replied:

After thinking a bit about it, I think I've found the logic behind it: when it encounters the first P-statement, it uses only the suffix and puts the full URL in @context. However, when it encounters a second URI with the same suffix, it uses the prefix, and then there's no need for @context. Since the statements can be encountered in any order in the DB, it's not consistent which of the two gets @context and which gets the prefix.
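The order dependence in this hypothesis can be illustrated with a small Python model. This is only a sketch of the behaviour guessed at above, not the serializer's actual code; the function name `compact_in_order` and the two-entry prefix table are assumptions made for the example.

```python
PROP = "http://www.wikidata.org/prop/"
DIRECT = "http://www.wikidata.org/prop/direct/"


def compact_in_order(predicate_iris, prefixes):
    """Model of the hypothesis: the first IRI seen with a given local name
    claims the bare term in @context; later IRIs with the same local name
    fall back to a prefixed form."""
    context = {}  # bare term -> full IRI
    keys = []
    for iri in predicate_iris:
        local = iri.rsplit("/", 1)[-1]
        if context.setdefault(local, iri) == iri:
            keys.append(local)
            continue
        # The bare term is taken: use the longest matching namespace prefix.
        matches = [(p, ns) for p, ns in prefixes.items() if iri.startswith(ns)]
        if matches:
            prefix, ns = max(matches, key=lambda m: len(m[1]))
            keys.append(f"{prefix}:{iri[len(ns):]}")
        else:
            keys.append(iri)  # no prefix applies, keep the full IRI
    return keys, context


prefixes = {"p": PROP, "wdt": DIRECT}
keys_a, ctx_a = compact_in_order([PROP + "P131", DIRECT + "P131"], prefixes)
keys_b, ctx_b = compact_in_order([DIRECT + "P131", PROP + "P131"], prefixes)
print(keys_a, ctx_a)
print(keys_b, ctx_b)
```

Feeding the same two predicates in opposite orders yields different keys for the same data, which would explain why "P_" sometimes denotes the statement property and sometimes the direct one.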

The algorithm for parsing it would be to check whether the key contains ":": if yes, resolve against the prefix; if not, resolve against @context.
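A minimal Python sketch of that parsing rule follows. The function name `expand_key` and the sample context are hypothetical; the sketch assumes the @context maps prefixes to namespace strings and bare terms either to a full IRI string or to an object carrying "@id". A full JSON-LD processor handles many more cases, so this is an illustration of the rule rather than a replacement for one.

```python
def expand_key(key, context):
    """Expand a JSON-LD key to a full IRI using the rule above:
    keys with ':' resolve against a prefix, other keys against @context."""
    def iri_of(entry):
        # A context entry may be a plain string or an object with "@id".
        return entry.get("@id") if isinstance(entry, dict) else entry

    if ":" in key:
        prefix, suffix = key.split(":", 1)
        namespace = iri_of(context.get(prefix))
        # Unknown prefix (for example an absolute IRI): leave the key alone.
        return namespace + suffix if namespace is not None else key
    term = iri_of(context.get(key))
    return term if term is not None else key


# Hypothetical context, shaped to match the output quoted in this issue.
context = {
    "p": "http://www.wikidata.org/prop/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "P1216": {"@id": "http://www.wikidata.org/prop/P1216", "@type": "@id"},
    "P131": {"@id": "http://www.wikidata.org/prop/direct/P131", "@type": "@id"},
}

for key in ["P1216", "wdt:P1216", "P131", "p:P131"]:
    print(key, "->", expand_key(key, context))
```

After expansion, the statement property and the direct property of the same P-number always map to distinct full IRIs, regardless of which one was given the bare key.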

mielvds commented 7 years ago

This is a Java server issue?

RubenVerborgh commented 7 years ago

Yes.

mielvds commented 7 years ago

Will investigate.

mielvds commented 7 years ago

The JSON-LD is generated by Jena (https://github.com/LinkedDataFragments/Server.Java/blob/master/src/main/java/org/linkeddatafragments/views/RdfWriterImpl.java#L51), so we depend on whatever serialization technique they use. I suggest filing an issue there. However, if we encountered such issues in other serializations, it might be a datasource problem (but that doesn't seem to be the case).