USGCRP / gcis-ontology

Ontology for the Global Change Information System

Triple Store values? #180

Closed justgo129 closed 8 years ago

justgo129 commented 8 years ago

As discussed in other threads, the issue with an incomplete triple store seems to be resolved. However, when running the following SPARQL query on both data.gc and yasgui, only 28 values are produced.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>
PREFIX bibo: <http://purl.org/ontology/bibo/>

select * FROM <http://data.globalchange.gov> where {
    ?s a gcis:AcademicArticle . 
    ?s bibo:volume ?r 
}
order by ?r

In reality, there should be 2196 values, corresponding to the number of articles. I subsequently ran a truncated query in order to debug:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>
PREFIX bibo: <http://purl.org/ontology/bibo/>

select * FROM <http://data.globalchange.gov> where {
    ?s a gcis:AcademicArticle 
}

This second query returns only 34 values in data.gc and zero in yasgui. Clearly, this is what's causing the issue in the first query. Why is SPARQL returning such a short list of values? See, e.g., a sample Turtle file for the use of gcis:AcademicArticle.

zednis commented 8 years ago

How are you determining that 2196 values should be produced? Is there a SQL query you can run against the relational database to show what values should be expected by the corresponding SPARQL query?

justgo129 commented 8 years ago

There are 2196 articles in GCIS: http://data.globalchange.gov/article.

rewolfe commented 8 years ago

Justin, it may be a problem with populating the triple store. Could you locate the script that does this? I don't believe it is on GitHub. -Robert

zednis commented 8 years ago

Here is the result of a describe on a resource, http://data.globalchange.gov/article/10.1002/wcc.133, that should have type gcis:AcademicArticle.

@prefix cito:   <http://purl.org/spar/cito/> .
@prefix ns1:    <http://data.globalchange.gov/report/> .
ns1:nca3    cito:cites  <http://data.globalchange.gov/article/10.1002/wcc.133> .
@prefix ns2:    <http://data.globalchange.gov/report/nca3/chapter/> .
ns2:great-plains    cito:cites  <http://data.globalchange.gov/article/10.1002/wcc.133> .
@prefix biro:   <http://purl.org/spar/biro/> .
<http://data.globalchange.gov/reference/3d715831-4555-4f0c-8f71-a9b93180f124>   biro:references <http://data.globalchange.gov/article/10.1002/wcc.133> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix gcis:   <http://data.globalchange.gov/gcis.owl#> .
<http://data.globalchange.gov/article/10.1002/wcc.133>  rdf:type    gcis:Article .
@prefix ns6:    <http://purl.org/dc/terms/> .
<http://data.globalchange.gov/article/10.1002/wcc.133>  ns6:identifier  "10.1002/wcc.133" .
@prefix prov:   <http://www.w3.org/ns/prov#> .
<http://data.globalchange.gov/article/10.1002/wcc.133>  prov:qualifiedAttribution   _:vb2292137 ,
        _:vb2292282 ,
        _:vb2291721 ,
        _:vb2291301 ,
        _:vb2292141 ,
        _:vb2290270 ,
        _:vb2291630 ,
        _:vb2291631 ,
        _:vb2291898 ,
        _:vb2291859 ,
        _:vb2291652 ,
        _:vb2292015 .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix dbpprop:    <http://dbpedia.org/property/> .
<http://data.globalchange.gov/article/10.1002/wcc.133>  dbpprop:pubYear "2011-01-01T00:00:00-07:00"^^xsd:gYear ;
    ns6:title   "Resilience implications of policy responses to climate change"^^xsd:string .
@prefix ns10:   <http://data.globalchange.gov/journal/> .
<http://data.globalchange.gov/article/10.1002/wcc.133>  ns6:isPartOf    ns10:wiley-interdisciplinary .
ns10:wiley-interdisciplinary    ns6:hasPart <http://data.globalchange.gov/article/10.1002/wcc.133> .
@prefix ns11:   <http://data.globalchange.gov/report/nca3/chapter/great-plains/finding/> .
ns11:existing-adaptation-plans-inadequate   cito:cites  <http://data.globalchange.gov/article/10.1002/wcc.133> .

This is what it should be - http://data.globalchange.gov/article/10.1002/wcc.133.thtml.

edit - it has type gcis:Article but is missing type gcis:AcademicArticle. I am not sure why.

justgo129 commented 8 years ago

Thanks, Stephan. Noticing that only articles without contributors were coming up in the triplestore, we subsequently rearranged the resource types on line 82 here so that person and organization are loaded before articles. We ran it on dev. It didn't help, though. I'm wondering why the presence or absence of contributors seems to make a difference here.

rewolfe commented 8 years ago

Returns 100(+?) articles:

select * FROM <http://data.globalchange.gov> where { ?s a gcis:Article } limit 100

Returns 35 articles:

PREFIX fabio: <http://purl.org/spar/fabio/>
select * FROM <http://data.globalchange.gov> where { ?s a fabio:Article } limit 100

Also returns 35 articles:

select * FROM <http://data.globalchange.gov> where { ?s a gcis:AcademicArticle } limit 100

zednis commented 8 years ago

select count(*) FROM <http://data.globalchange.gov> where { ?s a gcis:Article }

returns 2149

zednis commented 8 years ago

So it seems the articles in the triplestore have type gcis:Article and not gcis:AcademicArticle.

Actually, looking at https://github.com/USGCRP/gcis-ontology/blob/master/gcis.ttl, I don't see gcis:AcademicArticle as a class. I am not sure why it is showing up in the THTML, but gcis:AcademicArticle is not in the ontology at present.
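Editorial sketch: rather than probing one class at a time, the type distribution can be read off a single grouped query. The Python below is a minimal illustration, not something from the thread. The query text, the name `counts_by_type`, and the assumption that the endpoint returns the standard SPARQL 1.1 JSON results format are all assumptions made here.

```python
import json

# Hypothetical helper query: count subjects per rdf:type in the GCIS graph.
# The graph IRI is the one used in the queries above; the rest is illustrative.
TYPE_COUNT_QUERY = """
SELECT ?type (COUNT(?s) AS ?n)
FROM <http://data.globalchange.gov>
WHERE { ?s a ?type }
GROUP BY ?type
ORDER BY DESC(?n)
"""


def counts_by_type(results_json: str) -> dict:
    """Map each class IRI to its instance count.

    Expects a SPARQL 1.1 JSON results document whose bindings carry the
    variables ?type and ?n, as produced by TYPE_COUNT_QUERY.
    """
    doc = json.loads(results_json)
    return {
        row["type"]["value"]: int(row["n"]["value"])
        for row in doc["results"]["bindings"]
    }
```

Comparing the entry for gcis:Article with the entry for gcis:AcademicArticle in the returned dictionary would show directly which type statements are missing from the store.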

rewolfe commented 8 years ago

It looks like this change to the ttl may have gotten dropped back in July. See https://github.com/USGCRP/gcis-ontology/pull/65 and related threads.

justgo129 commented 8 years ago

I agree, @rewolfe. e.g. https://github.com/USGCRP/gcis-ontology/commit/48913b60b103b5cbeae695aa3d52fe2c64f6ed9a

@zednis why would fabio:Article only return 35 as well?

zednis commented 8 years ago

Good catch @rewolfe

The ontology change was https://github.com/USGCRP/gcis-ontology/pull/64 but that was never merged. The associated change to the templates was merged in gcis with https://github.com/USGCRP/gcis/pull/191

justgo129 commented 8 years ago

@zednis, @rewolfe figured it out: (1) The "prov" namespace wasn't declared for articles and books. (2) For some reason, Virtuoso doesn't like blank strings. Specifically, it doesn't like blank values for prov:actedOnBehalfOf in 'contributors.'

These have been resolved and are currently on dev. See: https://github.com/USGCRP/gcis/pull/261 , https://github.com/USGCRP/gcis/pull/262 , https://github.com/USGCRP/gcis/pull/263 , https://github.com/USGCRP/gcis/pull/264
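Editorial sketch: the blank-value problem in (2) is the kind of thing a guard at render time avoids. The GCIS templates themselves are `.ttl.tut` template files, so the Python below is only an illustration of the idea, not the fix made in the pull requests above. The function name `optional_triple` and its signature are hypothetical.

```python
def optional_triple(subject_iri, predicate, object_iri):
    """Return one Turtle statement, or "" when the object IRI is blank.

    Emitting nothing is safer than emitting an empty object such as
    `prov:actedOnBehalfOf <>`, which a loader may reject.
    """
    if object_iri is None or not object_iri.strip():
        return ""
    return f"<{subject_iri}> {predicate} <{object_iri.strip()}> ."


# Usage: a contributor with no organization yields no statement at all.
statement = optional_triple(
    "http://example.org/person/1",   # placeholder IRI, not a GCIS resource
    "prov:actedOnBehalfOf",
    "",
)
```

The same check would apply to any optional property whose value can be empty in the relational database.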

zednis commented 8 years ago

@justgo129 do you have confirmation that the fixes resolve the issue where article instances were missing information? Specifically rdf:type statements?

justgo129 commented 8 years ago

yep, those were the issues. We have one ongoing issue with several datasets but assuming the build that I'm running on dev works, I'll deploy to prod and we should be able to close this issue.

zednis commented 8 years ago

Do you know how the prov namespace and/or blank strings were causing the issue? I can see these being issues, but I do not currently understand how they would be causing the specific issue we were looking into.

https://github.com/USGCRP/gcis/blob/master/lib/Tuba/files/templates/article/object.ttl.tut has the following:

%= include 'contributors';

Was the contributors template failing and that caused the article template to fail as well?

justgo129 commented 8 years ago

Kind of. The lack of the prov namespace was causing an issue with the contents of the turtle template. The other issue is really (2). From checking the error logs, it seems that Virtuoso doesn't like blank strings. Why, I don't know.
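Editorial sketch: an undeclared namespace like the missing prov prefix can be caught before loading by scanning the rendered Turtle. The Python below is a rough heuristic and not a Turtle parser. It only understands `@prefix` declarations and double-quoted literals, and the name `undeclared_prefixes` is hypothetical. The store's own loader remains the real check.

```python
import re


def undeclared_prefixes(turtle_text):
    """List prefixes used in prefixed names but never declared with @prefix."""
    declared = set(re.findall(r"@prefix\s+([A-Za-z][\w-]*):", turtle_text))

    # Remove string literals, then IRIs, then comments, so that colons and
    # '#' characters inside them are not mistaken for prefixed names.
    body = re.sub(r'"""[\s\S]*?"""|"(?:[^"\\\n]|\\.)*"', " ", turtle_text)
    body = re.sub(r"<[^<>\s]*>", " ", body)
    body = re.sub(r"#.*", " ", body)

    # A use is "prefix:" followed directly by a local name; this skips blank
    # nodes (_:b1) and the "prefix:" token inside @prefix declarations.
    used = set(re.findall(r"(?<![\w:-])([A-Za-z][\w-]*):(?=\w)", body))
    return sorted(used - declared)


sample = (
    "@prefix gcis: <http://data.globalchange.gov/gcis.owl#> .\n"
    "<http://example.org/a> a gcis:Article ;\n"
    "    prov:qualifiedAttribution _:b1 .\n"
)
missing = undeclared_prefixes(sample)  # prov is used but never declared
```

Running a check like this over each rendered resource before the import would flag the resource types whose templates lack a declaration.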

zednis commented 8 years ago

ok, good work troubleshooting it.

justgo129 commented 8 years ago

Thanks. I just ran this on dev and received a complete triplestore. The numbers of articles and people are as expected. We're good to go.

rewolfe commented 8 years ago

@justgo129 - What is the final error count? Is it just the datasets that have URLs with "blanks" or other strange characters?

justgo129 commented 8 years ago

13 errors total, all for datasets. 12 have URLs with "blanks"; one is an undefined namespace issue, which should be a quick code fix and which should be resolved here.

rewolfe commented 8 years ago

Cool.

justgo129 commented 8 years ago

I have fixed the issue with datasets; no dataset returns an error. I've subsequently tested on dev and stage and all works perfectly well. Closed #180.

rewolfe commented 8 years ago

@justgo129 - Excellent!

Make sure you commit your change to the Virtuoso import script to the gcis-rdf repository.

justgo129 commented 8 years ago

Sure thing. After testing on dev, I've successfully deployed to stage and prod. I'll run the Virtuoso rebuild on prod after tomorrow's content push.