Closed by zazi 13 years ago
The tests are working so far. However, I ran them on only 5 (+1) randomly selected instances (+1 for proof), so there may be other instances where the tests fail. Maybe the D2R server had some hiccups ;)
It's strange, I get:
musicbrainz_db=> SELECT COUNT(musicbrainz.url.url)
FROM musicbrainz.l_artist_url
INNER JOIN musicbrainz.link ON l_artist_url.link = link.id
INNER JOIN musicbrainz.link_type ON link.link_type = link_type.id
INNER JOIN musicbrainz.url ON l_artist_url.entity1 = url.id
WHERE
musicbrainz.link_type.id = 189;
count
42286
(1 row)
Versus
PREFIX is: <http://purl.org/ontology/is/core#>
PREFIX isi: <http://purl.org/ontology/is/inst/>
SELECT COUNT(?URI) WHERE { ?URI is:info_service isi:dbtunemyspace . }
SPARQL results: .1 41958
This actually suggests there are (just) too few!
Oh, I haven't counted them yet; however, my SPARQL query result contained non-MySpace URIs :\
Did you notice any bug in the queries?
Will check that... in the meantime please note that, as we learned yesterday, I think the condition should read:
d2rq:condition "musicbrainz.link_type.gid='bac47923-ecde-4b59-822e-d08f0cd10156'";
You're right:
PREFIX is: <http://purl.org/ontology/is/core#>
PREFIX isi: <http://purl.org/ontology/is/inst/>
SELECT ?URI WHERE { ?URI is:info_service isi:dbtunemyspace . FILTER (!regex(str(?URI), "http://dbtune.org/myspace/"))}
Gives:
http://vids.myspace.com/index.cfm?fuseaction=vids.channel&vanity=jackkapanka [http]
http://www.myspace.de/haendewegjohnny [http]
http://pissfork.net/ [http]
http://myspace.com/acidburpmusic [http]
http://www.soundcloud.com/joeyseary [http]
http://myspace.com/bprestage [http]
http://blogs.myspace.com/index.cfm?fuseaction=blog.view&friendId=290410970&blogId=333854510 [http]
http://myspace.com/castlecruz [http]
http://myspace.com/atlanticconnection [http]
http://blogs.myspace.com/rafevanhoy [http]
http://www.mrfogg.co.uk/ [http]
http://www.shakingsensations.com/ [http]
http://myspace.com/prototypmusic [http]
http://yniwl.com/ [http]
http://www.myspace.cn/bloodywoods [http]
http://myspace.com/kauanmusic [http]
http://www.myspace.cn/yfm [http]
http://www.officialkaya.com/ [http]
http://www.myspace.cn/eltanrenaxy [http]
...
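To see at a glance what kind of URIs slip through, the FILTER above can be mimicked offline. This is only a rough sketch: the URIs are a handful copied from the result list above, the one DBTune URI is a made-up placeholder, and the helper names are hypothetical. Like SPARQL's regex(), re.search matches anywhere in the string.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Same pattern as in the SPARQL FILTER; unanchored, '.' matches any character.
DBTUNE_PATTERN = "http://dbtune.org/myspace/"

# A few URIs copied from the result list above, plus one invented DBTune URI
# (hypothetical, only there to show that the filter drops it).
RESULT_URIS = [
    "http://www.myspace.de/haendewegjohnny",
    "http://pissfork.net/",
    "http://myspace.com/acidburpmusic",
    "http://www.soundcloud.com/joeyseary",
    "http://myspace.com/bprestage",
    "http://blogs.myspace.com/rafevanhoy",
    "http://www.mrfogg.co.uk/",
    "http://dbtune.org/myspace/example-artist",  # hypothetical
]


def non_dbtune(uris):
    """Keep the URIs that do NOT match the DBTune-MySpace pattern."""
    return [u for u in uris if not re.search(DBTUNE_PATTERN, u)]


def host_counts(uris):
    """Count how often each host name occurs among the given URIs."""
    return Counter(urlsplit(u).hostname for u in uris)


if __name__ == "__main__":
    for host, n in host_counts(non_dbtune(RESULT_URIS)).most_common():
        print(n, host)
```

Grouping by host makes it easier to tell the MySpace variants (myspace.com, blogs.myspace.com, www.myspace.de) apart from URLs that are not MySpace at all.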
yeah, I'll switch every link type to a GID condition now
But the MB data is not uniform:
musicbrainz_db=> SELECT COUNT(*)
FROM musicbrainz.l_artist_url
INNER JOIN musicbrainz.link ON l_artist_url.link = link.id
INNER JOIN musicbrainz.link_type ON link.link_type = link_type.id
INNER JOIN musicbrainz.url ON l_artist_url.entity1 = url.id
WHERE
musicbrainz.link_type.id = 189 AND NOT musicbrainz.url.url LIKE 'http://www.myspace.com%';
count
66
E.g.:
musicbrainz_db=> SELECT musicbrainz.url.url
FROM musicbrainz.l_artist_url
INNER JOIN musicbrainz.link ON l_artist_url.link = link.id
INNER JOIN musicbrainz.link_type ON link.link_type = link_type.id
INNER JOIN musicbrainz.url ON l_artist_url.entity1 = url.id
WHERE
musicbrainz.link_type.id = 189 AND NOT musicbrainz.url.url LIKE 'http://www.myspace.com%';
Giving:
http://blog.myspace.com/mrbt
http://myspace.com/dramagods
http://blogs.myspace.com/laurenhoffman
http://groups.myspace.com/viciousrumors
http://www.myspace.de/laudanuminfo
http://www.facebook.com/group.php?gid=6420691667&ref=ts
http://blogs.myspace.com/djyass4ever
http://myspace.com/blastorama
http://www.mrfogg.co.uk/
http://myspace.com/envisagemusicni
...
We could add a (D2RQ) condition "musicbrainz.url.url LIKE 'http://www.myspace.com%'"
(In fact, looking at the last row we should account for a missing 'www')
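A quick way to try out such a condition without touching the mapping is an in-memory SQLite table. This is a sketch under assumptions: SQLite stands in for the PostgreSQL MB DB, the single-column table is a simplification of musicbrainz.url, the URLs are the ones listed above, and RELAXED is just one possible way to account for the missing 'www'.

```python
import sqlite3

# URLs copied from the query output above.
SAMPLE_URLS = [
    "http://blog.myspace.com/mrbt",
    "http://myspace.com/dramagods",
    "http://blogs.myspace.com/laurenhoffman",
    "http://groups.myspace.com/viciousrumors",
    "http://www.myspace.de/laudanuminfo",
    "http://www.facebook.com/group.php?gid=6420691667&ref=ts",
    "http://blogs.myspace.com/djyass4ever",
    "http://myspace.com/blastorama",
    "http://www.mrfogg.co.uk/",
    "http://myspace.com/envisagemusicni",
]

# The condition as proposed in the thread (column name simplified to "url").
PROPOSED = "url LIKE 'http://www.myspace.com%'"
# Hypothetical relaxed variant that also accepts URLs without 'www'.
RELAXED = PROPOSED + " OR url LIKE 'http://myspace.com%'"


def matching(condition, urls=SAMPLE_URLS):
    """Return the URLs that satisfy the given SQL condition, in input order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE url (url TEXT)")
        conn.executemany("INSERT INTO url (url) VALUES (?)", [(u,) for u in urls])
        rows = conn.execute(f"SELECT url FROM url WHERE {condition} ORDER BY rowid")
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    print("proposed:", matching(PROPOSED))
    print("relaxed: ", matching(RELAXED))
```

On this sample the proposed condition keeps nothing (the rows were selected precisely because they fail it), while the relaxed variant keeps the three plain myspace.com URLs and still rejects the blog, group, .de and non-MySpace ones.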
Solved, since the MB DB already delivers non-"valid" information-service-specific URLs (see issue https://github.com/BarryNorton/D2R-LinkedBrainz-Fork/issues/2).
I tried to define some provenance information with the help of the Info Service Ontology; see the following example for DBTune-MySpace resources (it follows the link relation mappings in a straightforward way):
the following related SQL query should only fetch artist MySpace URLs:
=> this is indeed the case
the following SPARQL query should only fetch artist DBTune-MySpace URIs:
=> however, the result set of this query contains other URIs as well
where is the bug?