BarryNorton / D2R-LinkedBrainz-Fork

A fork of D2R Server 0.7 for the LinkedBrainz project
http://linkedbrainz.c4dmpresents.org/
GNU General Public License v2.0

Provenance relations #1

Closed by zazi 13 years ago

zazi commented 13 years ago

I tried to define some provenance information with the help of the Info Service Ontology. See the following example for DBTune-MySpace resources (it straightforwardly follows the link relation mappings):

# MySpace id=189 for artists
map:ArtistMyspaceIS a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    # d2rq:class  ;
    d2rq:classDefinitionLabel "Artist MySpace Resources"@en;
    # not needed, or?
    # d2rq:join "musicbrainz.l_artist_url.entity0 => musicbrainz.artist.id";
    d2rq:join "musicbrainz.l_artist_url.link => musicbrainz.link.id";
    d2rq:join "musicbrainz.link.link_type => musicbrainz.link_type.id";
    d2rq:join "musicbrainz.l_artist_url.entity1 => musicbrainz.url.id";
    d2rq:condition "musicbrainz.link_type.id=189";  
    d2rq:uriColumn "musicbrainz.url.url";
    d2rq:translateWith map:MySpaceTrans;
    .

# http://dbtune.org/myspace/ (IS resource doesn't exist yet at the isi namespace)
map:artist_myspace_is a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:ArtistMyspaceIS;
    d2rq:property is:info_service;
    d2rq:constantValue isi:dbtunemyspace;
    .

The following related SQL query should fetch only artist MySpace URLs:

SELECT musicbrainz.url.url AS wiki_url,
    musicbrainz.link_type.id AS link_type
FROM musicbrainz.l_artist_url
    INNER JOIN musicbrainz.link ON l_artist_url.link = link.id
    INNER JOIN musicbrainz.link_type ON link.link_type = link_type.id
    INNER JOIN musicbrainz.url ON l_artist_url.entity1 = url.id
WHERE
    musicbrainz.link_type.id = 189

=> this is indeed the case

The following SPARQL query should fetch only artist DBTune-MySpace URIs:

PREFIX is: <http://purl.org/ontology/is/core#>
PREFIX isi: <http://purl.org/ontology/is/inst/>

SELECT DISTINCT ?URI WHERE { ?URI is:info_service isi:dbtunemyspace . }

=> however, the result set of this query contains other URIs as well

Where is the bug?

zazi commented 13 years ago

The tests are working so far. However, I ran them on only 5 (+1) randomly selected instances (+1 for proof), so there may be other instances where the tests fail. Maybe the D2R server had some hiccups ;)

BarryNorton commented 13 years ago

It's strange, I get:

musicbrainz_db=> SELECT COUNT(musicbrainz.url.url)
    FROM musicbrainz.l_artist_url
    INNER JOIN musicbrainz.link ON l_artist_url.link = link.id
    INNER JOIN musicbrainz.link_type ON link.link_type = link_type.id
    INNER JOIN musicbrainz.url ON l_artist_url.entity1 = url.id
    WHERE musicbrainz.link_type.id = 189;

 count
-------
 42286
(1 row)

Versus:

PREFIX is: <http://purl.org/ontology/is/core#>
PREFIX isi: <http://purl.org/ontology/is/inst/>

SELECT COUNT(?URI) WHERE { ?URI is:info_service isi:dbtunemyspace . }

SPARQL results: .1 41958

This actually suggests there are (just) too few!
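One thing worth checking here (the thread does not confirm it as the cause) is that the two counts may not measure the same thing: the SQL query counts joined rows, while a SPARQL pattern over an RDF graph in principle sees each distinct triple only once. A minimal Python sketch of that check, using an in-memory SQLite table as a hypothetical stand-in for the joined MusicBrainz tables; the table name `artist_url` and the function name are made up for illustration:

```python
import sqlite3


def count_rows_and_distinct(links):
    """Return (row count, distinct URL count) for (artist_id, url) pairs.

    `links` is a hypothetical stand-in for the rows produced by joining
    l_artist_url, link, link_type and url in the real database.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artist_url (artist INTEGER, url TEXT)")
    conn.executemany("INSERT INTO artist_url VALUES (?, ?)", list(links))
    # COUNT(url) counts every row; COUNT(DISTINCT url) collapses duplicates.
    total, distinct = conn.execute(
        "SELECT COUNT(url), COUNT(DISTINCT url) FROM artist_url"
    ).fetchone()
    conn.close()
    return total, distinct


if __name__ == "__main__":
    # Two artists sharing one URL: three rows, but only two distinct URLs.
    toy = [(1, "http://a.example/"), (2, "http://a.example/"), (3, "http://b.example/")]
    print(count_rows_and_distinct(toy))
```

If the same comparison on the real tables (COUNT(url) against COUNT(DISTINCT url)) showed a similar gap, shared URLs would be a candidate explanation; if not, the difference lies elsewhere.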

zazi commented 13 years ago

Oh, I haven't counted them yet. However, my SPARQL query result contained non-MySpace URIs :\

Did you notice any bug in the queries?

BarryNorton commented 13 years ago

Will check that... in the meantime, please note that, as we learned yesterday, I think the condition should read:

d2rq:condition "musicbrainz.link_type.gid='bac47923-ecde-4b59-822e-d08f0cd10156'";

BarryNorton commented 13 years ago

You're right:

PREFIX is: <http://purl.org/ontology/is/core#>
PREFIX isi: <http://purl.org/ontology/is/inst/>

SELECT ?URI WHERE { ?URI is:info_service isi:dbtunemyspace . FILTER (!regex(str(?URI), "http://dbtune.org/myspace/"))}

Gives:

http://vids.myspace.com/index.cfm?fuseaction=vids.channel&vanity=jackkapanka
http://www.myspace.de/haendewegjohnny
http://pissfork.net/
http://myspace.com/acidburpmusic
http://www.soundcloud.com/joeyseary
http://myspace.com/bprestage
http://blogs.myspace.com/index.cfm?fuseaction=blog.view&friendId=290410970&blogId=333854510
http://myspace.com/castlecruz
http://myspace.com/atlanticconnection
http://blogs.myspace.com/rafevanhoy
http://www.mrfogg.co.uk/
http://www.shakingsensations.com/
http://myspace.com/prototypmusic
http://yniwl.com/
http://www.myspace.cn/bloodywoods
http://myspace.com/kauanmusic
http://www.myspace.cn/yfm
http://www.officialkaya.com/
http://www.myspace.cn/eltanrenaxy
...
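The same check as the FILTER above can be run client-side on any list of URIs exported from the endpoint. This is only a sketch: the function name and the first sample URI are hypothetical, and unlike the SPARQL regex it anchors the pattern at the start of the string and escapes the dots.

```python
import re

# Anchored, escaped variant of the pattern used in the SPARQL FILTER.
DBTUNE_MYSPACE = re.compile(r"^http://dbtune\.org/myspace/")


def untranslated(uris):
    """Return the URIs that do not start with the DBTune MySpace prefix."""
    return [u for u in uris if not DBTUNE_MYSPACE.search(u)]


sample = [
    "http://dbtune.org/myspace/exampleartist",  # hypothetical translated URI
    "http://myspace.com/acidburpmusic",         # taken from the results in this thread
    "http://pissfork.net/",                     # taken from the results in this thread
]

if __name__ == "__main__":
    for uri in untranslated(sample):
        print(uri)
```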

zazi commented 13 years ago

yeah, I'll switch every link type to a GID condition now

BarryNorton commented 13 years ago

But the MB data is not uniform:

musicbrainz_db=> SELECT COUNT(*)
    FROM musicbrainz.l_artist_url
    INNER JOIN musicbrainz.link ON l_artist_url.link = link.id
    INNER JOIN musicbrainz.link_type ON link.link_type = link_type.id
    INNER JOIN musicbrainz.url ON l_artist_url.entity1 = url.id
    WHERE musicbrainz.link_type.id = 189
    AND NOT musicbrainz.url.url LIKE 'http://www.myspace.com%';

 count
-------
    66

E.g.:

musicbrainz_db=> SELECT musicbrainz.url.url
    FROM musicbrainz.l_artist_url
    INNER JOIN musicbrainz.link ON l_artist_url.link = link.id
    INNER JOIN musicbrainz.link_type ON link.link_type = link_type.id
    INNER JOIN musicbrainz.url ON l_artist_url.entity1 = url.id
    WHERE musicbrainz.link_type.id = 189
    AND NOT musicbrainz.url.url LIKE 'http://www.myspace.com%';

Giving:

http://blog.myspace.com/mrbt
http://myspace.com/dramagods
http://blogs.myspace.com/laurenhoffman
http://groups.myspace.com/viciousrumors
http://www.myspace.de/laudanuminfo
http://www.facebook.com/group.php?gid=6420691667&ref=ts
http://blogs.myspace.com/djyass4ever
http://myspace.com/blastorama
http://www.mrfogg.co.uk/
http://myspace.com/envisagemusicni
...

BarryNorton commented 13 years ago

We could add a (D2RQ) condition "musicbrainz.url.url LIKE 'http://www.myspace.com%'"

(In fact, looking at the last row we should account for a missing 'www')
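To see what such a condition would and would not let through, here is a small sketch that runs it against an in-memory SQLite table. Everything about the harness is an assumption for illustration: the one-column `url` table stands in for `musicbrainz.url`, the first sample URL is made up, and the second LIKE pattern is one possible way to cover the missing 'www'. Whether subdomains such as `blogs.myspace.com` should count is not decided in this thread; this version excludes them.

```python
import sqlite3

SAMPLE_URLS = [
    "http://www.myspace.com/exampleartist",    # made-up URL in the common form
    "http://myspace.com/envisagemusicni",      # from the thread: no 'www'
    "http://blogs.myspace.com/laurenhoffman",  # from the thread: subdomain
    "http://www.myspace.de/laudanuminfo",      # from the thread: other TLD
    "http://www.mrfogg.co.uk/",                # from the thread: not MySpace at all
]

# The proposed condition plus a second pattern for URLs without 'www'.
# Note: SQLite's LIKE ignores ASCII case by default; PostgreSQL's does not.
CONDITION = "url LIKE 'http://www.myspace.com%' OR url LIKE 'http://myspace.com%'"


def matching_urls(urls, condition=CONDITION):
    """Return the URLs that satisfy the SQL condition, in insertion order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE url (url TEXT)")
    conn.executemany("INSERT INTO url (url) VALUES (?)", [(u,) for u in urls])
    rows = conn.execute(
        "SELECT url FROM url WHERE " + condition + " ORDER BY rowid"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    for url in matching_urls(SAMPLE_URLS):
        print(url)
```

With these samples only the first two URLs pass. In the mapping itself, the same expression would go into an additional `d2rq:condition` with the column written as `musicbrainz.url.url`.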

zazi commented 13 years ago

Solved: the MB DB itself already delivers URLs that are not "valid" for the specific information service (see issue https://github.com/BarryNorton/D2R-LinkedBrainz-Fork/issues/2).