USGCRP / gcis-ontology

Ontology for the Global Change Information System

dcterms:publisher and gcis:hasPublisher #103

Closed xgmachina closed 9 years ago

xgmachina commented 9 years ago

Would it be possible to relate dcterms:publisher to gcis:hasPublisher while retaining the properties of the latter term? Side point: could we revise the definition of gcis:hasPublisher to incorporate the definition of gcis:Publisher, so that the term being defined does not appear in its own definition?

xgmachina commented 9 years ago

dcterms:publisher - An entity responsible for making the resource available.
gcis:hasPublisher - A publication has a publisher that produced it and made it available to the public.

@justgo129 Can you specify the question a bit more?

justgo129 commented 9 years ago

I'll rephrase: can we relate dcterms:publisher (or some other term) with gcis:hasPublisher in some way? The sole reason for this question is enhancing search-ability in SPARQL queries wherever we can.

justgo129 commented 9 years ago

@zednis?

zednis commented 9 years ago

A few questions:

1) With dcterms:publisher available, is there a need for gcis:hasPublisher?
2) gcis:hasPublisher is an object property but is currently being used in the instance data as a datatype property. Do we want this property to be a datatype property instead?
3) If we don't want to mint resources for publishers, would we be willing to use dc:publisher?

example:

<http://data.globalchange.gov/journal/ecology>
   dcterms:identifier "ecology";
   dcterms:title "Ecology"^^xsd:string;
   bibo:eissn "1939-9170";
   bibo:issn "0012-9658";
   gcis:hasPublisher "Ecological Society of America"^^xsd:string .

justgo129 commented 9 years ago

1) not sure, what do you think?

gcis:hasPublisher a owl:ObjectProperty ;
  rdfs:label "Has Publisher" ;
  rdfs:comment "A publication has a publisher that produced it and made it available to the public." ;
  rdfs:domain gcis:Publication ;
  rdfs:range gcis:Agent ;
  rdfs:subPropertyOf prov:wasAttributedTo .

2) What are the pros/cons?
3) dc:publisher would be fine - please clarify the difference.

In short, I'm fine with removing gcis:hasPublisher in favor of dcterms:publisher or dc:publisher, depending on the difference. The only instances of gcis:hasPublisher are:

https://github.com/USGCRP/gcis/blob/ff93d3134bb161be30728afe3a7b3ca5a5d4af9d/lib/Tuba/files/templates/book/object.ttl.tut

https://github.com/USGCRP/gcis/blob/ff93d3134bb161be30728afe3a7b3ca5a5d4af9d/lib/Tuba/files/templates/journal/object.ttl.tut

zednis commented 9 years ago

1) The benefit of the gcis:hasPublisher property over dcterms:publisher is that we would infer the prov:wasAttributedTo relationship (but only if reasoning was applied, which I currently do not believe is the case).

We could update the definition of gcis:hasPublisher to be a sub-property of both prov:wasAttributedTo and dcterms:publisher. If we did that and applied reasoning the following would be available in the triplestore and visible to queries:

pre-reasoning:

<http://data.globalchange.gov/journal/bioscience>
  dcterms:title "BioScience"^^xsd:string;
  gcis:hasPublisher <http://data.globalchange.gov/publisher/american-institute-of-biological-sciences> .

<http://data.globalchange.gov/publisher/american-institute-of-biological-sciences>
  rdfs:label "American Institute of Biological Sciences" .

post-reasoning (assuming prov:contributed is defined in gcis.ttl as the inverse of prov:wasAttributedTo):

<http://data.globalchange.gov/journal/bioscience>
  a prov:Entity, gcis:Publication ;
  dcterms:title "BioScience"^^xsd:string;
  dcterms:publisher <http://data.globalchange.gov/publisher/american-institute-of-biological-sciences> ;
  gcis:hasPublisher <http://data.globalchange.gov/publisher/american-institute-of-biological-sciences> ;
  prov:wasAttributedTo <http://data.globalchange.gov/publisher/american-institute-of-biological-sciences> .

<http://data.globalchange.gov/publisher/american-institute-of-biological-sciences>
  a prov:Agent ;
  rdfs:label "American Institute of Biological Sciences" ;
  prov:contributed <http://data.globalchange.gov/journal/bioscience> .
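
The ontology change proposed above could be sketched in Turtle roughly as follows. This is only an illustration, not the actual content of gcis.ttl: the gcis: namespace URI is an assumption, and the prov:contributed declaration simply mirrors the assumption stated for the post-reasoning example.

```turtle
@prefix gcis:    <http://data.globalchange.gov/gcis.owl#> .   # assumed namespace URI
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .

# Existing definition, with dcterms:publisher added as a second super-property.
gcis:hasPublisher a owl:ObjectProperty ;
  rdfs:label "Has Publisher" ;
  rdfs:comment "A publication has a publisher that produced it and made it available to the public." ;
  rdfs:domain gcis:Publication ;
  rdfs:range gcis:Agent ;
  rdfs:subPropertyOf prov:wasAttributedTo , dcterms:publisher .

# Assumed declaration: inverse of prov:wasAttributedTo, so that the
# publisher-side triple in the post-reasoning example can be inferred.
prov:contributed a owl:ObjectProperty ;
  owl:inverseOf prov:wasAttributedTo .
```

With a definition along these lines, the rdfs:subPropertyOf statements are what would yield the dcterms:publisher and prov:wasAttributedTo triples under RDFS reasoning, while the prov:contributed triple depends on a reasoner that handles owl:inverseOf.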

2) I do not think there are any pros to using an object property as a datatype property. The con is that using an object property as a datatype property prevents us from applying OWL DL inference to the instance data (I think we could still do RDFS inference...)

3) In general, Dublin Core elements properties are all intended to have literal values (i.e. be datatype properties) and Dublin Core terms properties are intended to have resource values (i.e. be object properties). There are a few exceptions in Dublin Core terms (e.g. title, identifier) where properties are still intended to be used with literal values.
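
As a rough illustration of the distinction described in 3), the two properties might be used as follows. The resource URIs are the ones already used in this thread; pairing them with these particular properties is for illustration only and does not reflect existing instance data.

```turtle
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Dublin Core elements: the publisher is a plain literal.
<http://data.globalchange.gov/journal/ecology>
  dc:publisher "Ecological Society of America" .

# Dublin Core terms: the publisher is a resource with its own URI.
<http://data.globalchange.gov/journal/bioscience>
  dcterms:publisher <http://data.globalchange.gov/publisher/american-institute-of-biological-sciences> .
```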

justgo129 commented 9 years ago

+1

justgo129 commented 9 years ago

@zednis please feel free to proceed with enacting your recommendation.

zednis commented 9 years ago

To enact my recommendation we will need to start representing publishers as objects rather than as literals.

We would end up with new resources for publishers that would look like the following:

<http://data.globalchange.gov/publisher/american-institute-of-biological-sciences>
  a prov:Agent, gcis:Agent ;
  rdfs:label "American Institute of Biological Sciences" ;
  prov:contributed <http://data.globalchange.gov/journal/bioscience> ;
  gcis:isPublisherOf <http://data.globalchange.gov/journal/bioscience> .

<http://data.globalchange.gov/publisher/elsevier>
  a prov:Agent, gcis:Agent ;
  rdfs:label "Elsevier" ;
  prov:contributed 
    <http://data.globalchange.gov/journal/journal-hydrology> ,
    <http://data.globalchange.gov/journal/marine-chemistry> ,
    <http://data.globalchange.gov/journal/applied-energy> ;
  gcis:isPublisherOf 
    <http://data.globalchange.gov/journal/journal-hydrology> ,
    <http://data.globalchange.gov/journal/marine-chemistry> ,
    <http://data.globalchange.gov/journal/applied-energy> .
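
gcis:isPublisherOf appears in the resources above, but its definition is not shown in this thread. One way it might be declared, assuming it is intended as the inverse of gcis:hasPublisher (that intent, the label, and the gcis: namespace URI are all assumptions here):

```turtle
@prefix gcis: <http://data.globalchange.gov/gcis.owl#> .   # assumed namespace URI
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical declaration: inverse of gcis:hasPublisher, so domain and
# range are swapped relative to that property.
gcis:isPublisherOf a owl:ObjectProperty ;
  rdfs:label "Is Publisher Of" ;
  rdfs:domain gcis:Agent ;
  rdfs:range gcis:Publication ;
  owl:inverseOf gcis:hasPublisher .
```

If the property were declared this way, the templates would only need to emit one direction of the relationship, and a reasoner that supports owl:inverseOf could derive the other.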

@bduggan How easy would it be to update the templates to do this?

justgo129 commented 9 years ago

Hmm.
@zednis do you think this change is necessary? Would it be worth the time expended? I am fine keeping as is if it doesn't justify the effort.

Note also that publishers exist as first class objects in the database, but as an instance of organization. See e.g. https://data.globalchange.gov/organization/cambridge-university-press https://data.globalchange.gov/organization/cambridge-university-press.thtml

fyi, this may make more sense after this week's push.

zednis commented 9 years ago

@justgo129 I do think this change is warranted. gcis:hasPublisher is an object property and we should correct our usage of it accordingly. If we want to leave the publisher as a literal value we should drop usage of gcis:hasPublisher and go with dc:publisher, but then we lose the nice alignment with PROV.

bduggan commented 9 years ago

This will require an improvement to the relational data model (not hard) and an effort to get the data (hard, since it will take human curation). I don't think we have the resources to undertake this right now, but would love to see it at some point.

zednis commented 9 years ago

We already have text values for the publisher, is the issue that the values have not been curated and normalized?

justgo129 commented 9 years ago

Actually, in addition to that, publisher is also a contributor field.

On Thu, Aug 13, 2015 at 10:12 AM, Stephan Zednik notifications@github.com wrote:

We already have text values for the publisher, is the issue that the values have not been curated and normalized?

bduggan commented 9 years ago

On Thursday, August 13, Stephan Zednik wrote:

We already have text values for the publisher, is the issue that the values have not been curated and normalized?

Yes, the current publisher names have been curated for the purpose of display in a document, not for relational integrity.

Crossref can help with this: they have a distinct list of 5,046 "publishers and societies", and I think the API returns one of the entries on this list:

http://www.crossref.org/01company/06publishers.html

So a first start would be to improve the article syncer:

https://github.com/USGCRP/gcis-sync/blob/master/lib/Gcis/syncer/article.pm

to use the "publisher" field from crossref.

Also, more research needs to be done about other work in this area, including identifiers for publishers, the relationship between publishers and DOI prefixes (and where to get that), and whether we want to do anything with the APIs offered by some of the publishers.

Brian

zednis commented 9 years ago

OK, we could switch to dc:publisher with the current non-curated text values as an intermediate step and then proceed with the updates @bduggan is suggesting. I really like the idea of making publishers first-class citizens in the system and having better support for queries such as "show me all articles with attributes X, Y, and Z from publisher A".

bduggan commented 9 years ago

Sounds good, I like that too, just not yet :)

rewolfe commented 9 years ago

Should we move the "not yet" part to another ticket, or should we keep a list someplace else?

On Thu, Aug 13, 2015 at 12:08 PM, Brian Duggan notifications@github.com wrote:

Sounds good, I like that too, just not yet :)

justgo129 commented 9 years ago

I think we can move it to another ticket. I am still a little concerned about the suggested approach to resolving the issue at the heart of the ticket but we can chat about that offline.

justgo129 commented 9 years ago

Closed #103, but let's revisit if the time comes for the "not yet" part.