USGCRP / gcis-ontology

Ontology for the Global Change Information System

SPARQL query with weird output #132

Closed justgo129 closed 9 years ago

justgo129 commented 9 years ago

Hi everyone, I wrote a SPARQL query today that fails to return results even though I believe it should. What am I doing incorrectly? The query is available at: http://yasgui.org/short/4JyIq8Vh

The purpose of the query is to count the number of titles of platforms from which datasets, via "instrument instances," were derived. The answer should most definitely exceed 0. I'm pretty sure my syntax for "select distinct count" is correct - see the second answer provided at: http://stackoverflow.com/questions/1223472/sparql-query-and-distinct-count

(I recognize that the penultimate line in the query is unnecessary for its purpose; I added it solely for testing.)

zednis commented 9 years ago

To troubleshoot a query like this, I normally break it down into simpler queries. When I find the statement that is causing trouble (prov:wasAttributedTo in this case), I run describe statements on the subject of the problematic statement pattern.

I broke your query down to this:

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>

#select distinct ?dataset 
describe ?dataset
#?instrument_instance
#?platform_title 
#(count(?platform_title) AS ?platform_name) 
where {
    ?dataset a gcis:Dataset .
#    ?dataset prov:wasAttributedTo ?instrument_instance .
#    ?instrument_instance a gcis:Instrument .
#    ?instrument_instance gcis:inPlatform ?platform .
#    ?platform dcterms:title ?platform_title 
} LIMIT 5

From the output of the describe statements, I noticed that the datasets in the triplestore are associated with the instrument instances via prov:wasDerivedFrom instead of prov:wasAttributedTo.

The REST API shows prov:wasAttributedTo: http://data.globalchange.gov/dataset/nasa-nsidcdaac-0032.thtml

@bduggan any idea why this may be? I know we changed the template from prov:wasDerivedFrom to prov:wasAttributedTo, but the template change was merged to master 17 days ago.

USGCRP/gcis/pull/216

EDIT - after looking at the results of the describe on datasets again, I think the prov:wasDerivedFrom statements might be valid. They are dataset -> dataset derivations. There appear to be no dataset -> instrument instance relationships in the triplestore.
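
As a complement to running describe on individual datasets, the same check can be made in aggregate. The query below is only a sketch: it assumes the gcis prefix IRI used elsewhere in this thread, and the variable names (?predicate, ?object_type, ?uses) are made up for illustration. It lists every predicate used on a gcis:Dataset together with the type of the object it points to, which should make it visible at a glance whether any dataset links to a gcis:Instrument.

```sparql
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>

# List each predicate used on gcis:Dataset subjects together with the
# rdf:type of its object (left unbound when the object has no type,
# e.g. for literals), most frequently used first.
SELECT ?predicate ?object_type (COUNT(*) AS ?uses)
WHERE {
    ?dataset a gcis:Dataset .
    ?dataset ?predicate ?object .
    OPTIONAL { ?object a ?object_type }
}
GROUP BY ?predicate ?object_type
ORDER BY DESC(?uses)
```

If no row pairs prov:wasAttributedTo (or prov:wasDerivedFrom) with gcis:Instrument, the relationship is missing from the loaded data rather than from the query.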

zednis commented 9 years ago

I think the triplestore is also missing the newer representation of instrument instances, because this query returns 0 results as well.

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>

select distinct ?dataset ?instrument_instance
where {
    ?dataset a gcis:Dataset .
    { ?dataset prov:wasAttributedTo ?instrument_instance } UNION { ?dataset prov:wasDerivedFrom ?instrument_instance } .
    ?instrument_instance a gcis:Instrument .
}

bduggan commented 9 years ago

On Monday, August 24, Stephan Zednik wrote:

> I think the triplestore is also missing the newer representation of instrument instances, because this query returns 0 results as well.

Looks like the prov namespace prefix is missing from the template, e.g.

http://data.globalchange.gov/platform/advanced-earth-observing-satellite-ii/instrument/seawinds.thtml

These are the only triples generated:

http://data.globalchange.gov/platform/advanced-earth-observing-satellite-ii/instrument/seawinds.nt

Brian

zednis commented 9 years ago

@bduggan The prov namespace was added to this template in commit https://github.com/USGCRP/gcis/commit/88a48da0cb340968e6a51e890ead3f0fdde5c5f6#diff-238eea14cfe940c0fa711f3f061da5d2 7 days ago.

If we haven't run a triplestore load in the last 7 days that could be causing the issues we are seeing with the queries.

bduggan commented 9 years ago

On Tuesday, August 25, Stephan Zednik wrote:

> @bduggan The prov namespace was added to this template in commit https://github.com/USGCRP/gcis/commit/88a48da0cb340968e6a51e890ead3f0fdde5c5f6#diff-238eea14cfe940c0fa711f3f061da5d2 7 days ago.

Great, it'll go out in the next release, then.

> If we haven't run a triplestore load in the last 7 days that could be causing the issues we are seeing with the queries.

We'll need to do a release before a load: the templates in production do not include this change.

You can see the release on the about page, the X-API-Version header, or via announcements to the api-users list:

http://data.globalchange.gov/about

We are at 1.34:

https://github.com/USGCRP/gcis/tree/1.34

    [bduggan@lubber bduggan]$ curl -v http://data.globalchange.gov | head
    [...]
    < X-API-Version: 1.34

Brian

zednis commented 9 years ago

Thanks @bduggan

@justgo129 We will need to try the query again after the next release and load.

justgo129 commented 9 years ago

sounds good.

justgo129 commented 9 years ago

@zednis @bduggan Following yesterday's release, I retried the queries and get the same output as before for all of the queries provided above.

bduggan commented 9 years ago

Works for me: see query and output in the commit above.

justgo129 commented 9 years ago

Great. The queries work for me now, except that I get a list of outputs instead of a count, and an additional column is added instead of an existing column being renamed. http://yasgui.org/short/Ek1A2nL2

zednis commented 9 years ago

@justgo129 could you provide an example of what you mean by getting a list of outputs instead of a count, and an additional column instead of a renamed column?

Also, why are you naming the count of ?platform_title ?platform_name?

(count(?platform_title) AS ?platform_name)

This returns a count of the instruments on each platform that the dataset was attributed to.

Perhaps this should be (count(?platform_title) AS ?instruments_on_platform_attributed_to).

For example http://data.globalchange.gov/dataset/nasa-nsidcdaac-0001.thtml was attributed to 4 instruments that are installed on 2 total platforms (2 instruments per platform)

| dataset | platform_title | platform_name |
| --- | --- | --- |
| http://data.globalchange.gov/dataset/nasa-ornldaac-16 | "National Oceanic and Atmospheric Administration - 10"^^xsd:string | "2"^^xsd:integer |
| http://data.globalchange.gov/dataset/nasa-ornldaac-16 | "National Oceanic and Atmospheric Administration - 9"^^xsd:string | "2"^^xsd:integer |

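
The linked YASGUI query is not reproduced in this thread, so the following is only a guess at its shape, using the column name suggested above and the gcis prefix IRI from the earlier query. Grouping by both ?dataset and ?platform_title yields one row per dataset and platform, with the count giving the number of instrument instances on that platform, which is the layout of the rows shown above.

```sparql
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>

# One row per (dataset, platform title) pair; the count is the number of
# distinct instrument instances on that platform that the dataset is
# attributed to.
SELECT ?dataset ?platform_title
       (COUNT(DISTINCT ?instrument_instance) AS ?instruments_on_platform_attributed_to)
WHERE {
    ?dataset a gcis:Dataset .
    ?dataset prov:wasAttributedTo ?instrument_instance .
    ?instrument_instance a gcis:Instrument .
    ?instrument_instance gcis:inPlatform ?platform .
    ?platform dcterms:title ?platform_title .
}
GROUP BY ?dataset ?platform_title
ORDER BY ?dataset ?platform_title
```
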
justgo129 commented 9 years ago

Sure, @zednis. I meant that I wrote the query expecting a count but instead got a list of information. I adjusted accordingly but still get all the entities for which 0 platforms exist. See: http://yasgui.org/short/41i80lP3

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>

select distinct ?dataset (count(?platform_title) AS ?instruments_on_platform_attributed_to)
FROM <http://data.globalchange.gov> where {
    ?dataset a gcis:Dataset .
    ?dataset prov:wasAttributedTo ?instrument_instance .
    ?instrument_instance a gcis:Instrument .
    ?instrument_instance gcis:inPlatform ?platform .
    ?platform dcterms:title ?instruments_on_platform_attributed_to 
} 
group by ?dataset ?instruments_on_platform_attributed_to
having (min(?instruments_on_platform_attributed_to) > 0)

Note - edited so the query is properly formatted. Please use GitHub formatting when pasting queries so they show up correctly.

zednis commented 9 years ago

@justgo129 the query above returns a count of 0 because ?platform_title is never bound in the where clause, so it has no value. You are then overwriting ?instruments_on_platform_attributed_to with the count of an unbound variable.

Honestly, I am surprised the endpoint does not throw an error on this query.

After updating the query so that the title of the platform is ?platform_title, the query returns 0 results as would be expected based on the filter at the end.
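
To see the unbound-variable behaviour in isolation, here is a small self-contained sketch. The VALUES rows and the example.org IRI are invented for illustration and are not GCIS data. In SPARQL 1.1, COUNT over a variable only counts solutions in which that variable is bound, so counting a variable that never appears in the pattern gives 0, while COUNT(*) still counts the rows.

```sparql
# ?platform_title is never bound by the pattern, so COUNT(?platform_title)
# is 0 for every group; COUNT(*) counts the two solution rows instead.
SELECT ?dataset
       (COUNT(?platform_title) AS ?bound_count)
       (COUNT(*) AS ?row_count)
WHERE {
    VALUES (?dataset ?title) {
        (<http://example.org/dataset/d1> "Platform A")
        (<http://example.org/dataset/d1> "Platform B")
    }
}
GROUP BY ?dataset
```

Under the standard semantics this gives a single row for the example dataset with ?bound_count = 0 and ?row_count = 2.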

Here is an updated query that lists the count of instruments on platforms and you will see there are no occurrences of a 0 count.

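PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>

# For each dataset, count the platform titles reached through its instruments.
# The aggregate is parenthesized and the grouping is explicit, as standard
# SPARQL 1.1 requires.
select ?dataset (count(?platform_title) AS ?instruments_on_platform_attributed_to)
FROM <http://data.globalchange.gov> where {
    ?dataset a gcis:Dataset .
    ?dataset prov:wasAttributedTo ?instrument_instance .
    ?instrument_instance a gcis:Instrument .
    ?instrument_instance gcis:inPlatform ?platform .
    ?platform dcterms:title ?platform_title
}
group by ?dataset
order by asc(?instruments_on_platform_attributed_to)
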
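
If the goal is the original one from this issue, the number of platforms behind each dataset rather than the number of instruments, one possible variant counts distinct platforms per dataset. This is a sketch under the same assumptions as the query above; ?platform_count is a name chosen here for illustration.

```sparql
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>

# Count the distinct platforms reached from each dataset through its
# attributed instrument instances. DISTINCT prevents a platform that
# carries several of the dataset's instruments from being counted twice.
SELECT ?dataset (COUNT(DISTINCT ?platform) AS ?platform_count)
FROM <http://data.globalchange.gov>
WHERE {
    ?dataset a gcis:Dataset .
    ?dataset prov:wasAttributedTo ?instrument_instance .
    ?instrument_instance a gcis:Instrument .
    ?instrument_instance gcis:inPlatform ?platform .
}
GROUP BY ?dataset
ORDER BY DESC(?platform_count)
```
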
justgo129 commented 9 years ago

@zednis let's chat about this one at your convenience; I'd like to walk through the logic for my own understanding.

justgo129 commented 9 years ago

Thanks for the great one-on-one hangout earlier, @zednis. The expanded query, which now includes datasets, has been added to the test suite: https://github.com/USGCRP/gcis-sparql/pull/6. As such, I have closed #132.

justgo129 commented 9 years ago

Additions have been made to the gcis-ontology repo as well. See: https://github.com/USGCRP/gcis-ontology/pull/151 and https://github.com/USGCRP/gcis-ontology/issues/132