Explore adding a voiD term for URI prefixes

GoogleCodeExporter commented 9 years ago

This would be a simpler alternative to URI patterns. It seems to be a “sweeter spot” 
than the full-blown regex style, and would be very handy for our use of voiD in 
LATC.

Original issue reported on code.google.com by richard....@gmail.com on 21 Oct 2010 at 4:51

Blocking: #85, #82, #89

GoogleCodeExporter commented 9 years ago

Examples of the use of void:uriRegexPattern from Keith:

http://tinyurl.com/yd82v5b

http://tinyurl.com/ybpwuhj

Original comment by richard....@gmail.com on 29 Oct 2010 at 10:14

Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

Original comment by Michael.Hausenblas on 29 Oct 2010 at 10:15

Added labels: Milestone-Release2.0

GoogleCodeExporter commented 9 years ago

I'm not convinced it is a "sweeter" spot: our void:uriRegexPattern is easy to 
use in a SPARQL query for selecting datasets containing a URI. I am concerned 
that adding an alternative, either in tandem or as a replacement, would 
complicate this.

Original comment by K.J.W.Al...@gmail.com on 29 Oct 2010 at 10:40

GoogleCodeExporter commented 9 years ago

@Keith: Pretending that a regular URI is a regex will actually frequently have 
the desired result.

Original comment by richard....@gmail.com on 29 Oct 2010 at 10:53

GoogleCodeExporter commented 9 years ago

Yeah, but we already discussed and did some work on defining uriRegexPattern 
better for the next release, because of the edge cases where a plain URI used 
as a regex wouldn't have the desired result (e.g., the pattern http://a.c.com 
also matches http://abc.com, since an unescaped "." matches any character).
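
The edge case above can be reproduced with any regex engine. Below is a minimal sketch that uses Python's `re` module as a stand-in for SPARQL's REGEX (an assumption; the two engines differ in details). The helper names and the `/page` path are made up for illustration.

```python
import re


def matches_as_regex(pattern, uri):
    """Treat `pattern` as a regex and search for it anywhere in `uri`."""
    return re.search(pattern, uri) is not None


def matches_as_prefix(prefix, uri):
    """Plain string test: does `uri` start with `prefix`?"""
    return uri.startswith(prefix)


uri_space = "http://a.c.com"   # a plain URI, as in the comment above
other = "http://abc.com/page"  # hypothetical URI from an unrelated site

print(matches_as_regex(uri_space, other))             # True: "." matches "b"
print(matches_as_regex(re.escape(uri_space), other))  # False once escaped
print(matches_as_prefix(uri_space, other))            # False, no escaping needed
```

Escaping the URI before using it as a regex removes the false positive, whereas a prefix comparison never produces it in the first place.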

So I see how just giving a URI prefix is a little simpler to write, but I 
don't see the scenario in which it is simpler to use.

It would be useful if someone could write out the rationale for introducing the 
new property here.

Original comment by K.J.W.Al...@gmail.com on 29 Oct 2010 at 11:11

GoogleCodeExporter commented 9 years ago

SPARQL has regexes but no substring/contains/startsWith. That's a bizarre 
accident of history. If you are in any other environment, a substring match is 
easier and less error-prone than a regex match. In SPARQL, a substring match is 
*also* easier (just use the prefix URI as a regex), but *more* error-prone 
because of the issues we discussed earlier.

Serious SPARQL implementations increasingly tend to come with string functions 
as well:
http://spreadsheets.google.com/pub?key=tl2FDWghDKDc3G70xKkNoNg&output=html

And unlike the REGEX function, which invariably performs poorly, a startsWith 
function can actually be optimized by the triple store using an ordered index.
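
To illustrate why an ordered index helps, here is a rough sketch in Python. A sorted list stands in for the store's index; this is an assumption for illustration, not a description of how any particular triple store works. The function name and the example.org URIs are hypothetical.

```python
from bisect import bisect_left
from typing import List


def uris_with_prefix(sorted_uris: List[str], prefix: str) -> List[str]:
    """Return every URI in a lexicographically sorted list that starts with `prefix`.

    All strings sharing a prefix are contiguous in sorted order, so a binary
    search finds the first candidate and a short forward scan collects the rest.
    """
    start = bisect_left(sorted_uris, prefix)
    end = start
    while end < len(sorted_uris) and sorted_uris[end].startswith(prefix):
        end += 1
    return sorted_uris[start:end]


index = sorted([
    "http://example.org/b/1",
    "http://example.org/a/2",
    "http://example.com/x",
    "http://example.org/a/1",
])
print(uris_with_prefix(index, "http://example.org/a/"))
# ['http://example.org/a/1', 'http://example.org/a/2']
```

The lookup touches only the matching entries plus a logarithmic number of probes, while a general regex usually has to be tested against every candidate string.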

With prefix strings, it is possible to analyze a collection of void:Datasets 
for overlap or containment. This isn't easily possible with regexes.
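
As a sketch of what such an analysis could look like: with prefixes, containment and overlap reduce to string comparisons. The dataset names and prefixes below are invented for illustration.

```python
from itertools import combinations


def contains(outer: str, inner: str) -> bool:
    """Every URI starting with `inner` also starts with `outer`."""
    return inner.startswith(outer)


def overlaps(a: str, b: str) -> bool:
    """Two prefix-defined URI spaces share URIs only if one prefix extends the other."""
    return a.startswith(b) or b.startswith(a)


# Hypothetical datasets, each described by a single URI prefix.
uri_spaces = {
    "all": "http://example.org/",
    "people": "http://example.org/people/",
    "places": "http://example.org/places/",
}

for (name_a, a), (name_b, b) in combinations(uri_spaces.items(), 2):
    if contains(a, b):
        print(f"{name_a} contains {name_b}")
    elif contains(b, a):
        print(f"{name_b} contains {name_a}")
    elif not overlaps(a, b):
        print(f"{name_a} and {name_b} are disjoint")
# all contains people
# all contains places
# people and places are disjoint
```

Answering the same questions for two arbitrary regexes requires comparing the languages they accept, which is considerably more involved.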

Original comment by richard....@gmail.com on 29 Oct 2010 at 5:50

GoogleCodeExporter commented 9 years ago

I emailed some voiD users who have used non-trivial regexes in their voiD data.

From Toby Inkster:
> Regex patterns seem like they would remain useful, especially for
> dealing with subsets of a dataset. e.g. saying that the subset matching
> 
>   http://example\.com/(.+)\.ttl
> 
> is available in Turtle format.
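
A small sketch of why this case needs a regex rather than a prefix, again using Python's `re` as a stand-in. Matching the whole URI with `fullmatch` is an assumption made here, and the sample URIs are hypothetical.

```python
import re

# Pattern quoted above: anything under example.com that ends in ".ttl".
TURTLE_SUBSET = re.compile(r"http://example\.com/(.+)\.ttl")


def in_turtle_subset(uri: str) -> bool:
    """True if the whole URI matches the pattern (anchoring is an assumption)."""
    return TURTLE_SUBSET.fullmatch(uri) is not None


print(in_turtle_subset("http://example.com/data.ttl"))  # True
print(in_turtle_subset("http://example.com/data.rdf"))  # False
```

Both sample URIs share the prefix `http://example.com/`, so a prefix alone cannot tell them apart.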

From Leigh Dodds:
> I'm tending towards using simple prefixes (and void sub-sets)
> to define a URI space.
> 
> The regex patterns have been useful in writing display code as it's
> easy to find whether a particular URI matches a space. This is
> obviously still possible with a prefix approach.
> 
> I think everything I've currently done with regexes could be handled with a
> prefix (or set of prefixes).
> 
> Regexes could be useful if you wanted to define, in more detail, what the
> exact structure of a specific URI space might be, e.g. is the prefix followed
> by only letters, or numbers, or whatever.
>
> An additional feature to consider would be use of URI templates to allow
> URI construction. But there you need more than prefix/regex.

In summary, they are not opposed to a void:uriSpace property, but they still 
see the usefulness of void:uriRegexPattern, or perhaps even of more complex 
approaches that use URI templates.

Original comment by richard....@gmail.com on 8 Dec 2010 at 10:30

GoogleCodeExporter commented 9 years ago

I've gone ahead and added void:uriSpace to Section 4.2, in r169.
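
For readers wondering how a consumer might use the new property, here is a minimal sketch. It assumes each dataset's void:uriSpace value has already been read into a plain dictionary; the dataset names, prefixes, and function name are hypothetical.

```python
from typing import Dict, List


def datasets_for_uri(uri_spaces: Dict[str, str], uri: str) -> List[str]:
    """Return names of datasets whose URI space prefix matches `uri`.

    The most specific (longest) prefix comes first, which is just one
    possible way to rank a dataset against its subsets.
    """
    matches = [name for name, space in uri_spaces.items() if uri.startswith(space)]
    return sorted(matches, key=lambda name: len(uri_spaces[name]), reverse=True)


# Hypothetical mapping from dataset name to its void:uriSpace value.
uri_spaces = {
    "everything": "http://example.org/",
    "people": "http://example.org/people/",
}

print(datasets_for_uri(uri_spaces, "http://example.org/people/alice"))
# ['people', 'everything']
```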

Original comment by richard....@gmail.com on 10 Dec 2010 at 2:01

Changed state: Started

GoogleCodeExporter commented 9 years ago

Resolved to close it in today's teleconference. See Issue 91 for follow-up.

Original comment by richard....@gmail.com on 14 Dec 2010 at 11:54

Changed state: Fixed

cygri / void

Explore adding a voiD term for URI prefixes #75