USGCRP / gcis-ontology

Ontology for the Global Change Information System

Most cited (sparql query) #129

Closed justgo129 closed 8 years ago

justgo129 commented 9 years ago

I think this would be a really good example of a SPARQL query, and one worth placing in a test suite:

Identify the most frequently cited:
(a) publication in GCIS (i.e. one that exists as a record within GCIS)
(b) author listed in GCIS (i.e. one that exists as a record within GCIS)
(c) publication in the NCA3
(d) author listed in the NCA3

"Author" may include "convening lead author," "lead author," and "author" (i.e. please conflate these into one term for the purpose of the SPARQL query).

This would be a good activity for mining the references for the NCA3.

justgo129 commented 9 years ago

(c) and (d) are more important than (a) and (b)

zednis commented 9 years ago

(d) I found there are multiple qualified attributions between a chapter and an author if the author has multiple organizations. To make the query handle this correctly, I wrote it as a sub-query, on the assumption that an author should only be listed once per chapter.

# gcis:, prov: and foaf: were undeclared in the original; the gcis: IRI is assumed to be the GCIS ontology namespace.
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT (?author AS ?AuthorID) (CONCAT(STR(?ln), ", ", STR(?gn)) AS ?Name) (COUNT(?author) AS ?AuthorshipCount)
FROM <http://data.globalchange.gov>
WHERE {
  # The sub-query keeps one row per author and chapter, even when an author has several organizations.
  { SELECT DISTINCT ?author ?gn ?ln ?chapter
    WHERE {
      <http://data.globalchange.gov/report/nca3> gcis:hasChapter ?chapter .
      ?chapter prov:qualifiedAttribution [ prov:hadRole ?role ; prov:agent ?author ] .
      FILTER( ?role = <http://data.globalchange.gov/role_type/convening_lead_author> || ?role = <http://data.globalchange.gov/role_type/lead_author> || ?role = <http://data.globalchange.gov/role_type/contributing_author>)
      ?author foaf:givenName ?gn .
      ?author foaf:lastName ?ln } }
}
GROUP BY ?author ?gn ?ln
ORDER BY DESC(?AuthorshipCount)

zednis commented 9 years ago

@justgo129 Actually, for (d), is this an author of a work cited by the NCA3 or an author of a chapter of the NCA3? The query I posted above is for a chapter author of the NCA3.

zednis commented 9 years ago

(c) This lists the articles cited by the NCA3 and by its chapters. @justgo129 should this include any other citations?

http://yasgui.org/short/EkflGZeC

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
# gcis: and cito: were undeclared in the original; the gcis: IRI is assumed to be the GCIS ontology namespace.
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>
PREFIX cito: <http://purl.org/spar/cito/>
SELECT ?article (STR(?label) AS ?title) (COUNT(?article) AS ?count)
FROM <http://data.globalchange.gov>
WHERE {
  { <http://data.globalchange.gov/report/nca3> gcis:hasChapter ?chapter . ?chapter cito:cites ?article }
  UNION
  { <http://data.globalchange.gov/report/nca3> cito:cites ?article } .
  ?article dcterms:title ?label
}
GROUP BY ?article ?label
ORDER BY DESC(?count)

justgo129 commented 8 years ago

Thanks, @zednis. For (c), this looks good, but chapters in the NCA3 can also cite "generics," "books," and "webpages" in addition to "articles" and "reports."

For (d), it should be the most frequently cited author in the NCA3, i.e., the NCA3 or its chapters cite many works, each of them having authors. Go through all of the authors of all of the cited works and locate the author who appears the most frequently.

zednis commented 8 years ago

@justgo129 for (c) the cited work can be of any type; I do not do any filtering or selection by type. ?article could be a book, webpage, report, etc.

zednis commented 8 years ago

(d) perhaps we should rephrase? English can be ambiguous at times.

The original request was "author listed in the NCA3" then we had "the most frequently cited author in the nca3" but then you suggest to "Go through all of the authors of all of the cited works and locate the author which appears the most frequently."

So is it correct to say you do not want a query for who is the most frequent author of an NCA report component (report or chapter), but instead for the most frequent author of works cited by the NCA3 report?

justgo129 commented 8 years ago

Indeed. The author of a work cited by the NCA3 report. Note, an author could be "author," "convening lead author," "lead author," etc. since those categories do exist for some of the works cited by the NCA3. I'm not sure if this matters though since we're no longer storing these roles in the ontology.

justgo129 commented 8 years ago

@rewolfe would you mind taking a look at the output of http://yasgui.org/short/NyGZcG5Wg when you get a chance? The output looks accurate, but since there are more than 3,000 citations in the NCA3 I could use another pair of eyes.

@zednis regarding (d), I'll present an example since the number of citations is not equivalent to the number of publications and I can locate various technicalities to complicate the wording of the query. Hopefully this should clear up matters:

Persons A, B, and C were all authors of works cited in the NCA3. Of these, locate the author who appears in the most publications cited in the NCA3, regardless of the number of citations, i.e. if five chapters cite the same document, its author(s) are only counted once. If five chapters cite the same document differently, its author(s) are still only counted once. If five different publications are cited by the NCA3 and all share an author, then that person is considered to have authored five (5) publications.

N.B.: The query would be restricted to many, but not all, of the role types listed at https://data.globalchange.gov/role_type (specifically, the roles having the string "author" or "editor" in their identifiers).
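
A rough sketch of what (d) could look like under this definition, untested against the endpoint. It assumes that cited works carry prov:qualifiedAttribution records the same way chapters do in the earlier chapter-author query, that the gcis: prefix resolves to the GCIS ontology namespace, and that a regular expression on the role IRI is an acceptable way to pick out the "author" and "editor" roles. COUNT(DISTINCT ?work) is what makes a publication count once per author no matter how many chapters cite it.

```sparql
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>  # assumed namespace IRI
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?author
       (CONCAT(STR(?ln), ", ", STR(?gn)) AS ?name)
       (COUNT(DISTINCT ?work) AS ?publicationCount)
FROM <http://data.globalchange.gov>
WHERE {
  # Works cited by an NCA3 chapter or by the report itself.
  { <http://data.globalchange.gov/report/nca3> gcis:hasChapter ?chapter .
    ?chapter cito:cites ?work }
  UNION
  { <http://data.globalchange.gov/report/nca3> cito:cites ?work }

  # Assumption: cited works are attributed to people like chapters are.
  ?work prov:qualifiedAttribution [ prov:hadRole ?role ; prov:agent ?author ] .

  # Keep only role types whose identifier contains "author" or "editor".
  FILTER(REGEX(STR(?role), "author|editor"))

  ?author foaf:givenName ?gn ;
          foaf:lastName ?ln .
}
GROUP BY ?author ?gn ?ln
ORDER BY DESC(?publicationCount)
```

Because the count is over distinct works, an author holding several roles or organizations on the same publication is also counted only once for it.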

justgo129 commented 8 years ago

@zednis or @rewolfe any update on #129?

justgo129 commented 8 years ago

Closed #129.