zednis closed this issue 8 years ago
@justgo129 Should I include the author's role? (editor, lead author, author)
That would be great. So technically we'd be asking for "contributors" rather than "authors."
@bduggan @justgo129 This should be doable in a SPARQL query, but the virtuoso instance that appears to be running the endpoint does not seem to fully support group_concat( ).
From the footer on http://data.globalchange.gov/sparql, it appears we are running Virtuoso version 06.01.3127 (from 2011?).
Would it be possible to explore upgrading the version of virtuoso we are using as our endpoint? If not, I can run a query that does not do the desired grouping to generate a CSV and then write a script to post-process the CSV.
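The post-processing fallback mentioned above could look roughly like the sketch below. It assumes the CSV has one row per contributor/chapter pair and uses the column names `ContributorID`, `GivenName`, `LastName` and `ChapterName`; those names, the `group_chapters` helper and the `"; "` separator are illustrative assumptions, not something the endpoint guarantees.

```python
import csv
import io
from collections import OrderedDict


def group_chapters(csv_text,
                   key_cols=("ContributorID", "GivenName", "LastName"),
                   chapter_col="ChapterName",
                   sep="; "):
    """Collapse one-row-per-(contributor, chapter) CSV text into one row per contributor.

    Column names and the separator are assumptions for this sketch.
    """
    grouped = OrderedDict()
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row[c] for c in key_cols)
        chapters = grouped.setdefault(key, [])
        # Skip repeats so a chapter is listed once per contributor.
        if row[chapter_col] not in chapters:
            chapters.append(row[chapter_col])

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(key_cols) + ["Chapters"])
    for key, chapters in grouped.items():
        writer.writerow(list(key) + [sep.join(chapters)])
    return out.getvalue()
```

This does client-side what GROUP_CONCAT would do in the query: each contributor appears once, with all of their chapters joined into a single delimited value.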
@zednis would upgrading the version of Virtuoso also provide the ability to query classes from subclasses? For instance, a query "a prov:Entity" doesn't generate a list of platforms, instruments etc., which are defined as subclasses of prov:Entity within the gcis ontology. In short, I'm wondering whether upgrading would allow us to "kill two birds with one stone."
To support the subclass query you describe we need to utilize RDFS or OWL inference.
We should look at virtuoso to see if there is an option to enable query-time (e.g. backward-chaining) RDFS inference. That may be available on the version of virtuoso you are running or a newer version.
First, let's confirm the version of virtuoso we are using and then we can see where we stand on these two features.
edit - if no versions of virtuoso support RDFS inference (I have not checked yet) we could always use jena or pellet to run the inference during the ingest process before it is imported into virtuoso. Then we would be able to answer the subclass query you mention.
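To make the ingest-time option concrete, here is a minimal sketch of what materializing rdfs:subClassOf inference amounts to. It is not Jena or Pellet; it works on plain Python tuples, and the `materialize_types` helper and the class names in the usage note are illustrative assumptions.

```python
def materialize_types(subclass_of, types):
    """Return `types` plus every (instance, superclass) pair implied by rdfs:subClassOf.

    subclass_of: set of (child_class, parent_class) pairs
    types:       set of (instance, class) pairs, i.e. asserted rdf:type triples
    """
    parents = {}
    for child, parent in subclass_of:
        parents.setdefault(child, set()).add(parent)

    inferred = set(types)
    frontier = list(types)
    while frontier:
        instance, cls = frontier.pop()
        for parent in parents.get(cls, ()):
            # Only follow pairs we have not seen, so cycles terminate.
            if (instance, parent) not in inferred:
                inferred.add((instance, parent))
                frontier.append((instance, parent))
    return inferred
```

With `subclass_of = {("gcis:Instrument", "prov:Entity")}` and an instance typed as `gcis:Instrument`, the result also contains that instance typed as `prov:Entity`. Loading the inferred triples into the store is what would let a plain "a prov:Entity" query return it.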
@justgo129 It looks like virtuoso supports rdfs:subClassOf and rdfs:subPropertyOf inferences (and a few others). We should be able to enable it with some configuration changes.
On Wednesday, August 5, Stephan Zednik wrote:
Would it be possible to explore upgrading the version of virtuoso we are using as our endpoint?
Yes, but probably not for a while. Another issue with this version (or possibly just the configuration) is that federated queries don't work. I find virtuoso to be very cumbersome to maintain and configure and would be fine moving to another triple store. I have heard good things about blazegraph (for instance, that they are being used by wikidata) so maybe that is an option.
If not, I can run a query that does not do the desired grouping to generate a CSV and then write a script to post-process the CSV.
If you could just help write sparql to get the data I think that's probably enough -- post processing could even just be done in excel.
Brian
ok, I think it would make sense to create a new ticket or email thread around SPARQL endpoint issues so we can keep track of functionality we are having trouble getting to work and discussions of possible solutions (upgrading, change endpoint, etc)
Here is a query that gets basic NCA3 chapter contributor information. It does not group the chapters for each contributor into a single value because I was unable to get SPARQL's GROUP_CONCAT to work correctly with the endpoint. (I was able to get a weird non-standard form of sql:GROUP_CONCAT to somewhat work, but it included duplicates)
Also, I am currently returning role information as well. We may want to consider how role information will affect the original request of mentioning each author/contributor only once.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT
(?author AS ?ContributorID)
(str(?gn) AS ?GivenName)
(str(?ln) AS ?LastName)
?role
(str(?cht) AS ?ChapterName)
FROM <http://data.globalchange.gov>
WHERE {
# Each NCA3 chapter and its title
<http://data.globalchange.gov/report/nca3> gcis:hasChapter ?chapter .
?chapter dcterms:title ?cht .
# Each attribution links a contributor (agent) to the chapter with a role
?chapter prov:qualifiedAttribution [ prov:hadRole ?role ; prov:agent ?author ] .
?author foaf:givenName ?gn .
?author foaf:lastName ?ln
}
# DISTINCT removes duplicate rows; nothing is aggregated, so no GROUP BY is used
ORDER BY ?author
Thanks, @zednis. I pasted the query into the GCIS endpoint (https://data-stage.globalchange.gov/sparql) but got no output other than the column names. Do you know why?
@justgo129 I do not.
Try it in http://data.globalchange.gov/sparql or with yasgui.org
On Thursday, August 6, justgo129 wrote:
Thanks, @zednis. I pasted the query into the GCIS endpoint (https://data-stage.globalchange.gov/sparql) but got no output other than the column names. Do you know why?
That non-public endpoint has not been updated for some time.
Brian
What are the requirements for closing this ticket?
The generation of the query will enable the closing of this ticket. The outputs look good, but I nonetheless see a few author duplicates (e.g. Paul Fleming), although they are from different chapters. I'd be happy to spin this off to another issue, because I still see the value in the output produced by the code above.
On Tue, Aug 18, 2015 at 1:46 PM, Stephan Zednik notifications@github.com wrote:
What are the requirements for closing this ticket?
I would like to see a pull request that puts a sparql query like this into the test suite.
@justgo129 I don't think we can provide a report with no author duplicates unless we combine chapters into a single delimited value in cases where authors have contributor relationships with more than 1 chapter.
Alternatively we could provide the output as a JSON or a similar data structure.
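A rough sketch of the JSON alternative: each contributor appears once, with a nested list of chapter and role pairs. The input keys match the columns returned by the query earlier in this thread; the `contributors_to_json` helper and the output field names are assumptions made for illustration.

```python
import json


def contributors_to_json(rows):
    """Nest query result rows into one JSON object per contributor.

    rows: iterable of dicts with the keys ContributorID, GivenName,
          LastName, role and ChapterName (one row per attribution).
    """
    people = {}
    for row in rows:
        person = people.setdefault(row["ContributorID"], {
            "id": row["ContributorID"],
            "givenName": row["GivenName"],
            "lastName": row["LastName"],
            "contributions": [],
        })
        entry = {"chapter": row["ChapterName"], "role": row["role"]}
        # Keep each (chapter, role) pair once per contributor.
        if entry not in person["contributions"]:
            person["contributions"].append(entry)
    return json.dumps(list(people.values()), indent=2)
```

Keeping the role next to each chapter avoids having to decide how roles collapse when a contributor is listed only once.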
Great, @zednis. How about you or @xgmachina prepare the pull request to place this entry into the test suite, after which I'll close the ticket.
@justgo129 I have added this query to the test suite in gcis-sparql. Is this ticket ready to be closed?
Assuming it works and provides the correct output (I haven't had a chance to test), yes.
There are three test suites with SPARQL queries:
Adding to the acceptance tests (1) is great, and these may become examples for end users. These tests may have external dependencies (e.g. dbpedia), so may fail sometimes. Also the results may vary depending on the data.
The other two are run automatically by travis-ci -- adding to them is helpful because these give us regression tests, and guaranteed functionality.
At least some of these SPARQL queries, or some version of them, should be added to (2) and (3).
[edit] added sentence about data
@bduggan do you think that federated queries should be excluded from (2) and (3) since they have external dependencies?
On Tuesday, September 8, Stephan Zednik wrote:
@bduggan do you think that federated queries should be excluded from (2) and (3) since they have external dependencies?
Yes.
Brian
@zednis I tested the code found at https://github.com/USGCRP/gcis-sparql/blob/master/ticket-113.sparql at data.globalchange.gov/SPARQL. That returned the following error: "Virtuoso 37000 Error SP031: SPARQL compiler: Variable 'Name' is used in the query result set but not assigned"
As the query does seem to work in yasgui, should I ignore that error? http://yasgui.org/short/NJh3AfcZe
@justgo129 interesting. Take the ?Name out of the group by clause and the query should work.
It sure does. The updated query is at: http://yasgui.org/short/EkCG5Gobg. @zednis how would I order the ChapterNumber values to go in order of 1, 2, 3, etc. instead of 1, 10, 11, ...2,21, ...?
Don't convert chapter number to a string.
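The same effect can be reproduced outside SPARQL. This small sketch, with made-up chapter numbers, shows why string values sort as 1, 10, 11, 2 while numeric values sort as 1, 2, 3:

```python
chapter_numbers = ["1", "10", "11", "2", "21", "3"]

# Strings compare character by character, so "10" sorts before "2".
as_strings = sorted(chapter_numbers)

# Comparing by integer value gives the expected chapter order.
as_numbers = sorted(chapter_numbers, key=int)
```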
Worked, and added to https://github.com/USGCRP/gcis-sparql/ and https://github.com/USGCRP/gcis-ontology/tree/justgo129-patch-1/t/results (will merge the latter soon).
@zednis do you agree that this should go into the test suite?
I don't think it needs to be in the gcis-ontology tests; it would be OK as a test in gcis (to test RDF templates) or gcis-sparql (as an example).
The reason I don't think it should be in GCIS-ontology is that the only GCIS class or property referenced in the query is gcis:hasChapter, so it is really a query on how we construct instance data using primarily non-GCIS properties.
If we want a test covering gcis:hasChapter in GCIS-ontology tests we should go with something much simpler.
Sounds good; I'll just close #113 since this has been added to gcis-sparql. @rewolfe will that disrupt any of your ongoing work?
Closed #113.
"Generate a list of all NCA3 authors, sortable by chapter and organization. Each author can only be mentioned once but nonetheless all chapters authored be stated and captured."
This is the request that Bryce [NCO Colleague] had received a while back.