USGCRP / gcis-ontology

Ontology for the Global Change Information System

generate report on NCA report findings cited works #116

Open zednis opened 9 years ago

zednis commented 9 years ago

"Locate all findings based at least partially on articles with a top journal ranking (e.g. Nature and Science) per an official citation metric of one's choice, and calculate the percentage of total references."

justgo129 commented 9 years ago

Also, as an interesting exercise, note that 11 of the NCA3 findings are "report findings" and are thus findings of the entire report, not of particular chapters. https://data.globalchange.gov/report/nca3/finding?page=8

zednis commented 9 years ago

@justgo129 did you mean to close this ticket?

justgo129 commented 9 years ago

Oops, didn't mean to, I hit the wrong button. I'm sorry. Thanks for catching that.

congruili commented 9 years ago

https://data.globalchange.gov/sparql

Is something wrong with this endpoint? I cannot get any query results at the moment.

congruili commented 9 years ago

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT
    DISTINCT ?finding ?journal
FROM <http://data.globalchange.gov>
WHERE {
  <http://data.globalchange.gov/report/nca3> gcis:hasChapter ?chapter .
  <http://data.globalchange.gov/report/nca3> gcis:hasFinding ?finding .
  OPTIONAL{ ?chapter gcis:hasFinding ?finding .}
  ?finding cito:cites ?publication .
  ?publication dcterms:isPartOf ?journal .
  # regex() expects a string, so convert the journal IRI with str() first
  FILTER (regex(str(?journal), "nature", "i") || regex(str(?journal), "^http://data.globalchange.gov/journal/science$", "i"))
}

justgo129 commented 9 years ago

Interesting approach. Science and Nature were meant as examples, though, not as an exhaustive list. Can we generalize this to include every journal above a certain impact factor?

Justin Goldstein, Ph.D. Advance Science Climate Data and Observing Systems Coordinator US Global Change Research Program 1800 G Street NW, Suite 9100, (Note New Address) Washington, D.C. 20006, U.S.A.

O: (202) 419-3496 M: (202) 285-3005

e-mail: jgoldstein AT usgcrp Dot gov http://www.globalchange.gov

congruili commented 9 years ago

Some more explanation of the FILTER condition, which is a bit tricky: every journal whose identifier contains the string "nature" belongs to the Nature family, so a substring match is enough there. For Science we need to match the entire string, because partial matches such as "Biogeosciences" are wrong.
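
The difference between the two patterns can be tried outside the triple store. The following is a minimal Python sketch, not the query itself: the Science IRI comes from the FILTER above, while the other two journal IRIs are hypothetical and only there to exercise the patterns. The dots are escaped here, which the original regex does not do.

```python
import re

# Same two patterns as the FILTER: a substring match for "nature" and an
# anchored, full-string match for the Science journal IRI.
NATURE = re.compile(r"nature", re.IGNORECASE)
SCIENCE = re.compile(
    r"^http://data\.globalchange\.gov/journal/science$", re.IGNORECASE
)


def is_top_journal(journal_iri: str) -> bool:
    """Return True if the IRI passes either of the two patterns."""
    return bool(NATURE.search(journal_iri) or SCIENCE.search(journal_iri))


# Hypothetical journal IRIs (assumed for illustration only).
examples = [
    "http://data.globalchange.gov/journal/nature-climate-change",
    "http://data.globalchange.gov/journal/science",
    "http://data.globalchange.gov/journal/biogeosciences",
]
matches = [iri for iri in examples if is_top_journal(iri)]

# An unanchored "science" pattern would wrongly accept Biogeosciences.
unanchored_hit = bool(re.search("science", examples[2], re.IGNORECASE))
```

With these inputs the first two IRIs pass and the third does not, while the unanchored pattern does match it, which is the reason for the `^...$` anchors.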

congruili commented 9 years ago

@justgo129 I'm not quite sure what the question is asking.

Could you please explain a bit more regarding the following two sentences?

  1. "calculate the percentage of total references"
  2. "generalize this to incorporate everything with a certain impact factor"

justgo129 commented 9 years ago

Sure.

  1. What percentage of the total references are articles from journals with a specific impact factor? (That is, there are X references in GCIS; what percentage of them refer to articles that satisfy our condition?)
  2. The article at: https://en.wikipedia.org/wiki/Impact_factor may be of assistance with #2.

Also, don't forget that the Annals of the Association of American Geographers, the New England Journal of Medicine, etc. would have an impact factor equivalent to that of Science, Nature, etc. The query would need to be all-inclusive, covering articles from journals of any title that meet a certain impact factor.
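
The percentage asked for in point 1 can be sketched as follows. This is only an illustration of the arithmetic, assuming the references and the impact factors have already been retrieved; the ISSNs, reference ids, impact factors, and the threshold of 10.0 are all made up for the example.

```python
def percent_high_impact(references, impact_factors, threshold):
    """Percentage of references whose journal meets the threshold.

    references: list of (reference_id, journal_issn) pairs.
    impact_factors: dict mapping ISSN to impact factor.
    A reference with no known impact factor counts as not qualifying.
    """
    if not references:
        return 0.0
    qualifying = sum(
        1
        for _ref, issn in references
        if impact_factors.get(issn, 0.0) >= threshold
    )
    return 100.0 * qualifying / len(references)


# Placeholder data (hypothetical ISSNs and impact factors).
references = [
    ("ref-a", "0000-0001"),
    ("ref-b", "0000-0002"),
    ("ref-c", "0000-0002"),
    ("ref-d", None),  # a reference that is not a journal article
]
impact_factors = {"0000-0001": 30.0, "0000-0002": 2.5}

percent = percent_high_impact(references, impact_factors, threshold=10.0)
```

The denominator here is every reference, including those with no journal, which matches the "X references in GCIS" wording; dividing by journal articles only would be a different metric.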

congruili commented 9 years ago

@justgo129 The impact factors are currently not included in the database, and they would change over time. Could you please just provide a list of the journals you would like to include for this question? Otherwise it would be too complicated ...

justgo129 commented 9 years ago

@lic10 I suggest examining http://dbpedia.org/ontology/impactFactor (per Curt Tilmes's comment earlier, actually). Also, this is envisioned as a federated query, involving the mining of information from other databases like Web of Science, etc. Impact factors don't actually change much over time.

congruili commented 9 years ago

@justgo129 I am creating RDF triples for the impact factors of journals, using some data I found online. When that's done, we could load the triples into the database and run the SPARQL query on impact factor. Another question: should we add this to our GCIS ontology? gcis:Journal gcis:has2014ImpactFactor xsd:decimal
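
For illustration, one such triple could be serialized as an N-Triples line like this. It is a sketch under assumptions: gcis:has2014ImpactFactor is only the predicate proposed above (it is not in the ontology), and the value 1.0 is a placeholder, not a real impact factor.

```python
from decimal import Decimal

GCIS = "http://data.globalchange.gov/gcis.owl#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def impact_factor_triple(journal_iri: str, impact_factor: Decimal) -> str:
    """Serialize one journal/impact-factor statement as an N-Triples line."""
    # Predicate proposed in this thread; hypothetical, not part of the ontology.
    predicate = GCIS + "has2014ImpactFactor"
    # Fixed-point formatting avoids exponent notation, which xsd:decimal forbids.
    value = format(impact_factor, "f")
    return f'<{journal_iri}> <{predicate}> "{value}"^^<{XSD_DECIMAL}> .'


# Placeholder value, chosen only to show the shape of the output.
line = impact_factor_triple(
    "http://data.globalchange.gov/journal/science", Decimal("1.0")
)
```

Putting the year in the predicate name means one new predicate per year; a year-neutral predicate with a separate date would be the alternative design.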

justgo129 commented 9 years ago

Excellent. I don't think we should add the "gcis:has2014ImpactFactor" predicate to our ontology but I can be convinced otherwise.

congruili commented 9 years ago

Here are the rdf triples:

https://drive.google.com/file/d/0B4GwxoO9tVwJZWFZSDhrS2R2dGs/view?usp=sharing

bduggan commented 9 years ago

We shouldn't have to load triples. A federated query can use the triples from dbpedia and query both places dynamically.

congruili commented 9 years ago

The following two queries are both working on the GCIS sparql endpoint:

i. relate GCIS findings to journals and their issn:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX bibo: <http://purl.org/ontology/bibo/>

SELECT
    DISTINCT ?finding ?journal ?issn
WHERE {
  <http://data.globalchange.gov/report/nca3> gcis:hasChapter ?chapter .
  <http://data.globalchange.gov/report/nca3> gcis:hasFinding ?finding .
  OPTIONAL{ ?chapter gcis:hasFinding ?finding .}
  ?finding cito:cites ?publication .
  ?publication dcterms:isPartOf ?journal .
  ?journal bibo:issn ?issn . 
}

ii. find all the journal issn's and their impact factors from dbpedia:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbp: <http://dbpedia.org/property/>

SELECT ?issn ?impactfactor
FROM <http://dbpedia.org>
WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    ?dbjournal a dbo:AcademicJournal .
    ?dbjournal dbo:issn ?issn .
    ?dbjournal dbo:impactFactor ?impactfactor .  
  } 
}

I am having difficulty combining them into a federated query. Still trying. Any suggestions?

congruili commented 9 years ago

After combining the two queries like this:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX bibo: <http://purl.org/ontology/bibo/>

SELECT
  DISTINCT ?finding ?journal ?issn ?impactfactor
WHERE {
  <http://data.globalchange.gov/report/nca3> gcis:hasChapter ?chapter .
  <http://data.globalchange.gov/report/nca3> gcis:hasFinding ?finding .
  OPTIONAL{ ?chapter gcis:hasFinding ?finding .}
  ?finding cito:cites ?publication .
  ?publication dcterms:isPartOf ?journal .
  ?journal bibo:issn ?issn . 
  SERVICE <http://dbpedia.org/sparql> {
    ?dbjournal a dbo:AcademicJournal .
    ?dbjournal dbo:issn ?issn .
    ?dbjournal dbo:impactFactor ?impactfactor .  
  } 
}

I got the following error: Virtuoso 22023 Error SR012: Function aref needs a string or an array as argument 1, not an arg of type DB_NULL (204)

zednis commented 9 years ago

Do we know if the version of virtuoso we are using supports federated queries?

congruili commented 9 years ago

I do not know the answer. I will try it on the DBpedia SPARQL endpoint, too.

zednis commented 9 years ago

I was able to successfully run this query on the GCIS endpoint, so it does support the SERVICE keyword

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>

SELECT
    DISTINCT ?impactfactor
WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    ?dbjournal a dbo:AcademicJournal .
    ?dbjournal dbo:issn ?issn .
    ?dbjournal dbo:impactFactor ?impactfactor .  
  } 
}

I did find this 2013 review of federated query support, which indicates that Virtuoso 6.1 has some federated query support but does not support federated BINDINGS:

https://www.insight-centre.org/sites/default/files/publications/1306.1723v1.pdf

congruili commented 9 years ago

This is also not working on the dbpedia sparql endpoint:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX bibo: <http://purl.org/ontology/bibo/>

SELECT
  DISTINCT ?finding ?journal ?issn ?impactfactor
WHERE {
  SERVICE <https://data.globalchange.gov/sparql> {
    <http://data.globalchange.gov/report/nca3> gcis:hasChapter ?chapter .
    <http://data.globalchange.gov/report/nca3> gcis:hasFinding ?finding .
    OPTIONAL{ ?chapter gcis:hasFinding ?finding .}
    ?finding cito:cites ?publication .
    ?publication dcterms:isPartOf ?journal .
    ?journal bibo:issn ?issn . 
  }
  SERVICE <http://dbpedia.org/sparql> {
    ?dbjournal a dbo:AcademicJournal .
    ?dbjournal dbo:issn ?issn .
    ?dbjournal dbo:impactFactor ?impactfactor .  
  } 
}

zednis commented 9 years ago

This seems to work. I am not sure why what you have been trying does not work...

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX bibo: <http://purl.org/ontology/bibo/>

SELECT DISTINCT ?finding ?journal ?issn1 ?impactfactor
WHERE {
    FILTER(?issn1 = ?issn2) 
    SERVICE <https://data.globalchange.gov/sparql> {
      {
        <http://data.globalchange.gov/report/nca3> gcis:hasChapter ?chapter .
        ?chapter gcis:hasFinding ?finding
      } UNION {
        <http://data.globalchange.gov/report/nca3> gcis:hasFinding ?finding
      }
      ?finding cito:cites ?publication .
      ?publication dcterms:isPartOf ?journal .
      ?journal bibo:issn ?issn1 .
    }
    SERVICE <http://dbpedia.org/sparql> {
      ?journal2 a dbo:AcademicJournal .
      ?journal2 dbo:issn ?issn2 .
      ?journal2 dbo:impactFactor ?impactfactor .
    }
} LIMIT 10

zednis commented 9 years ago

Hmm, I am now uncertain that the query I posted works. The value of ?issn2 shown in the results does not appear to match the ISSN you see if you go to the instance URI...

congruili commented 9 years ago

The result is not correct. A single ISSN is matched with many different impact factors.
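
One way to check what the join should return is to run the two halves as separate queries and match them on ISSN in client code. That is not what this thread does; it is an alternative, sketched here in Python on the assumption that both result sets have already been fetched. All rows below are made up.

```python
def normalize_issn(value: str) -> str:
    """Trim whitespace and uppercase the ISSN (the check digit may be 'X')."""
    return value.strip().upper()


def join_on_issn(gcis_rows, dbpedia_rows):
    """Join two result sets on ISSN.

    gcis_rows: iterable of (finding, journal, issn) tuples.
    dbpedia_rows: iterable of (issn, impact_factor) tuples.
    Returns a list of (finding, journal, issn, impact_factor) tuples.
    """
    # Index the DBpedia rows by normalized ISSN; keep every distinct value.
    factors = {}
    for issn, impact_factor in dbpedia_rows:
        factors.setdefault(normalize_issn(issn), set()).add(impact_factor)

    joined = []
    for finding, journal, issn in gcis_rows:
        key = normalize_issn(issn)
        for impact_factor in sorted(factors.get(key, ())):
            joined.append((finding, journal, key, impact_factor))
    return joined


# Placeholder rows (hypothetical findings, journals, ISSNs, impact factors).
gcis_rows = [
    ("finding-1", "journal-a", "0000-0001"),
    ("finding-2", "journal-b", "0000-000x"),
]
dbpedia_rows = [
    ("0000-0001 ", "3.5"),
    ("0000-000X", "1.2"),
    ("0000-0009", "9.9"),
]
joined = join_on_issn(gcis_rows, dbpedia_rows)
```

If a single ISSN still maps to several impact factors after this kind of join, the duplication is in the source data rather than in the federated query.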

zednis commented 9 years ago

I think this query works, but it times out if you have a limit value greater than 7. I will attempt some refactoring to see if I can make it more efficient

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gcis: <http://data.globalchange.gov/gcis.owl#>
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX bibo: <http://purl.org/ontology/bibo/>

SELECT DISTINCT ?finding ?journal ?journal2 ?issn1 ?issn2 ?impactfactor
WHERE {
    FILTER(str(?issn2) = ?issn1)
    SERVICE <https://data.globalchange.gov/sparql> {
      {
        <http://data.globalchange.gov/report/nca3> gcis:hasChapter ?chapter .
        ?chapter gcis:hasFinding ?finding
      } UNION {
        <http://data.globalchange.gov/report/nca3> gcis:hasFinding ?finding
      }
      ?finding cito:cites ?publication .
      ?publication dcterms:isPartOf ?journal .
      ?journal bibo:issn ?issn1 .
    }
    SERVICE <http://dbpedia.org/sparql> {
      ?journal2 a dbo:AcademicJournal .
      ?journal2 dbo:issn ?issn2 .
      ?journal2 dbo:impactFactor ?impactfactor .
    }
} limit 7
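
A possible reason why FILTER(str(?issn2) = ?issn1) behaves differently from FILTER(?issn1 = ?issn2) is that the two endpoints return the ISSN as different kinds of literal. The thread does not confirm this, so the following Python sketch is only a simplified model under that assumption; the language tag and the ISSN value are hypothetical.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Simplified RDF literal: lexical form plus optional datatype or language."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def same_term(a: Literal, b: Literal) -> bool:
    """Models term equality: lexical form, datatype and language must all match."""
    return a == b


def same_lexical(a: Literal, b: Literal) -> bool:
    """Models comparing str(?a) with str(?b): only the lexical forms are compared."""
    return a.lexical == b.lexical


# Assumed shapes: a plain literal on one side, a language-tagged one on the other.
gcis_issn = Literal("0000-0001")
dbpedia_issn = Literal("0000-0001", lang="en")

term_equal = same_term(gcis_issn, dbpedia_issn)
lexical_equal = same_lexical(gcis_issn, dbpedia_issn)
```

In this model the term comparison fails and the lexical comparison succeeds, which would be consistent with the str() version of the filter being the one that joins on matching ISSNs.
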

congruili commented 9 years ago

If the federated query cannot give us the right result, I suggest we load the RDF triples I created (gcis:Journal gcis:has2014ImpactFactor xsd:decimal) and run the query within the GCIS endpoint.

bduggan commented 9 years ago

I disagree. We do not want to maintain triples generated elsewhere. The triple store is rebuilt on every release and we are not maintaining a mechanism for repopulating using subsets of external datasets.

Brian

justgo129 commented 9 years ago

I agree with @bduggan. @lic10 please inform as to the best way to accomplish this. Thanks a million.

congruili commented 9 years ago

@justgo129 Unfortunately, at the moment we cannot get correct results from the federated query.

justgo129 commented 9 years ago

@CurtTilmes could you be of assistance?

CurtTilmes commented 9 years ago

Not much to add...

I agree with Brian, I wouldn't try to pull impact factors into your triple store. We just want to use the external judgements on journals (e.g. impact factor in this particular case, but it could be any other external factor that someone wants to use to filter journals). Other databases have many facts about journals.

justgo129 commented 9 years ago

Thanks, @CurtTilmes. @lic10 try http://academia.stackexchange.com/questions/3/where-can-i-find-the-impact-factor-for-a-given-journal

justgo129 commented 9 years ago

@lic10 I'm just checking on the status of this ticket.

congruili commented 9 years ago

@justgo129 I could get the impact factors. The problem comes from the federated query we are trying to use.

justgo129 commented 8 years ago

Per the 10/13 discussion, let's table this until after we update Virtuoso. We'll then retest the federated query and tweak it for performance if need be.