eBay / akutan

A distributed knowledge graph store
Apache License 2.0

Adopt RDF/Sparql's entity format #6

Open superfell opened 5 years ago

superfell commented 5 years ago

Entities are currently specified using <entity> or prefix:entity. When a prefix is used, no URI is associated with it, and the prefix string itself is used for sorting and similar operations. This should be updated to match SPARQL, so that URIs can be associated with prefixes and sorting is handled correctly. How this ends up encoded in the eventual KV store key needs some thought as well.

darrengarvey commented 5 years ago

It seems to be standard for SPARQL engines to expand prefix:entity to a full URI for storage. Users can define prefixes however they like in an input .ttl file, so in general an engine would have to expand to URIs at insert time.
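To make the insert-time expansion concrete, here is a minimal sketch in Python. It is not Akutan's code: the `expand` helper, the `PREFIXES` table and the `ex:` namespace are all made up for illustration, and real Turtle parsing has more cases (escapes, base IRIs, blank nodes) than this handles.

```python
# Hypothetical prefix table; in a .ttl file these would come from @prefix lines.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/",  # assumed example namespace
}


def expand(term, prefixes):
    """Expand '<iri>' or 'prefix:local' to a full IRI string."""
    # Already a full IRI in angle brackets: just strip the brackets.
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    # Split on the first colon only, so local names may contain colons.
    prefix, sep, local = term.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name or IRI: {term!r}")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local


print(expand("ex:alice", PREFIXES))
print(expand("<http://example.org/alice>", PREFIXES))
```

Both calls yield the same stored form, which is the point of expanding: sorting and equality then operate on the URI rather than on whatever prefix the input file happened to use.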

Upsides to expanding:

  a. Common implementation approach / the standard assumes (but does not require) QNames are stored as expanded URIs.
  b. Versioning can be handled by just changing a prefix definition in a query or .ttl file. Not something you see people do very often AFAICT.
  c. Easier to support N-Quads files and the like where prefixes are already expanded - or .ttl files that don't consistently use prefixes.

Some downsides:

  1. Not being able to get short URIs out easily without post-processing.
  2. Having no enforcement of standard prefixes for a given graph.
  3. Every query and input file needs to specify all prefixes.
  4. More space/time cost for the engine and all downstream tasks.
  5. More complexity for downstream clients - they may need to know prefixes to do any URI matching unless they use an RDF helper library.
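For downside 1, the post-processing could look roughly like the sketch below. The `compact` function and the namespaces are hypothetical, and it ignores Turtle's restrictions on which characters are legal in a local name, so treat it as an illustration of the extra step clients would need rather than a complete implementation.

```python
def compact(iri, prefixes):
    """Shorten a full IRI to 'prefix:local' using the longest matching namespace.

    Falls back to '<iri>' when no declared namespace matches.
    """
    best = None  # (prefix, namespace) of the longest match seen so far
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            if best is None or len(namespace) > len(best[1]):
                best = (prefix, namespace)
    if best is None:
        return f"<{iri}>"
    prefix, namespace = best
    return f"{prefix}:{iri[len(namespace):]}"


# Assumed example namespaces, one nested inside the other.
prefixes = {
    "ex": "http://example.org/",
    "exv": "http://example.org/vocab/",
}
print(compact("http://example.org/vocab/name", prefixes))
print(compact("http://other.example/thing", prefixes))
```

Note that the client has to know the prefix table to do this at all, which is the same dependency that downside 5 describes.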

On balance, I wonder whether it's worth deviating from standard practice here.

One useful feature would be to allow standard prefixes to be defined in the store to simplify use. Side note: Blazegraph supports some [default prefixes](https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual#Automatic_prefixes), so you don't need to specify them in queries.
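One way store-level defaults might combine with per-query declarations is sketched below. This is an assumption about how such a feature could behave, not a description of Akutan or Blazegraph: `STORE_DEFAULTS` and `effective_prefixes` are hypothetical names, and letting a query-level declaration shadow a store default is a design choice made here for illustration.

```python
from collections import ChainMap

# Hypothetical defaults configured once in the store.
STORE_DEFAULTS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def effective_prefixes(query_prefixes, store_defaults=STORE_DEFAULTS):
    """Return a lookup where query declarations shadow store defaults."""
    # ChainMap searches its maps left to right, so the query wins on conflict.
    return ChainMap(query_prefixes, store_defaults)


prefixes = effective_prefixes({"ex": "http://example.org/"})
print(prefixes["rdf"])  # resolved from the store defaults
print(prefixes["ex"])   # resolved from the query's own declaration
```

With something like this, a query only needs to declare the prefixes the store does not already know, which addresses downside 3 without giving up expansion to full URIs for storage.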