USGCRP / gcis-ontology

Ontology for the Global Change Information System
review provenance relationship between datasets and instruments #124

Closed zednis closed 9 years ago

zednis commented 9 years ago

We currently use prov:wasDerivedFrom to associate datasets with the instruments that were used to capture the data. I think prov:wasAttributedTo is a better relationship to use in this case.

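definitions (from PROV):

prov:wasDerivedFrom: A derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity.

prov:wasAttributedTo: Attribution is the ascribing of an entity to an agent.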

I would not say a dataset is a transformation of an instrument, an update of an instrument, or constructed based on an instrument. I think it is more logical to ascribe the creation of a dataset to an instrument, which would be an agent in the dataset generation activity.
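
As a rough illustration of the proposed change, the sketch below shows the same dataset-to-instrument link in the current and the proposed form. The ex: resources are hypothetical placeholders, not actual GCIS identifiers.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace, for illustration only

# Current pattern: reads as "the dataset is a transformation of the instrument".
ex:dataset-a prov:wasDerivedFrom ex:instrument-x .

# Proposed pattern: the creation of the dataset is ascribed to the instrument.
ex:dataset-a prov:wasAttributedTo ex:instrument-x .
```

Under the proposal only the second statement would be emitted.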

zednis commented 9 years ago

@xgmachina @bduggan @aulenbac

bduggan commented 9 years ago

Makes sense, +1.

This is just a simple template change.

Brian

zednis commented 9 years ago

Yep, if everyone agrees to the semantics of the change I can create the simple pull request.

One thing that we should verify is that all instruments mentioned in the provenance have type gcis:Instrument and not gcis:InstrumentType. The instruments referenced in the provenance are physical instantiations of an instrument, not general types of instrument (i.e. gcis:InstrumentType).
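
To make the distinction concrete, here is a small sketch of the typing the provenance would rely on. The ex: resources are hypothetical, and the gcis: namespace URI shown is an assumption that should be checked against the ontology.

```turtle
@prefix gcis: <http://data.globalchange.gov/gcis.owl#> .  # assumed namespace URI
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .                      # hypothetical resources

# A general kind of instrument: should NOT be the target of dataset provenance.
ex:radiometer-model a gcis:InstrumentType .

# A physical instrument: this is what the dataset provenance should point to.
ex:radiometer-unit-1 a gcis:Instrument .

ex:dataset-a prov:wasAttributedTo ex:radiometer-unit-1 .
```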

justgo129 commented 9 years ago

+1

bduggan commented 9 years ago

Not to get too far off topic, but I wanted to state this since it may be a source of confusion: the relational model uses "instrument" vs "instrument instance", but the semantic one uses "instrumentType" vs "instrument". So "instrument" means something different in the relational and semantic models; i.e., the "instrument_instance" template generates Turtle for (semantically speaking) "instruments".

In the relational model, an "instrument instance" is a combination of a platform and an instrument, and an "instrument" is a class of instruments. These definitions are here:

https://data.globalchange.gov/resources

Brian

rewolfe commented 9 years ago

I don't think that "attributed to" is a good replacement for "derived from". When we say a data set is derived from an instrument, we really mean that the data set is derived from the output of the instrument. I realize that this is shorthand, but "output of" is implied because that is what an instrument does: it measures something and produces an output, which is captured as a data set. (This output is the "raw" data set, or what NASA calls a Level 0 data set.) Downstream data sets are derived by applying algorithms and models (i.e., an activity) to this instrument output.

So we could do this the simple way and say that a data set is always derived from one or more input data sets using an activity, with the output data set of the instrument implied. Or we could get more complicated and say that the first (raw or level 0) data set is "measured by" an instrument and the subsequent data sets are derived from the first one. I don't see much value added by the latter approach.
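
A rough sketch of the two options, using hypothetical ex: resources; ex:measuredBy is a made-up predicate standing in for "measured by" and is not defined in PROV or GCIS.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .   # hypothetical resources and predicate

# Option 1 (simple): the instrument's raw output is implied, so the data set
# is linked to the instrument directly as shorthand.
ex:dataset-a prov:wasDerivedFrom ex:instrument-x .

# Option 2 (explicit): the raw data set is tied to the instrument, and later
# data sets are derived from the raw one.
ex:level0-dataset ex:measuredBy ex:instrument-x .
ex:level1-dataset prov:wasDerivedFrom ex:level0-dataset .
```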

bduggan commented 9 years ago

On Monday, August 10, Robert Wolfe wrote:

I don't think that "attributed to" is a good replacement for "derived from".

I might agree in a colloquial sense, but as Stephan explained, "derivedFrom" has a very specific definition in PROV. If we aren't using it as defined there, we should use something other than prov:wasDerivedFrom.

justgo129 commented 9 years ago

I was unable to locate an adequate term in PROV-O. I'd be fine with creating a predicate here.

zednis commented 9 years ago

I think we still want to use PROV to establish a relationship between the datasets (level 0 or later) and the instrument. I don't think it makes sense to stop at the last mile and NOT use PROV to connect datasets and instruments.

The PROV way to establish a provenance relationship between a level 0 dataset and the sensing instrument would be with the prov:wasAttributedTo property. This was one of the use cases considered by the PROV WG.

I also believe it is ok for the downstream (non-level 0) datasets to be attributed to the sensing instrument.

rewolfe commented 9 years ago

So, I get where you are coming from. I see the example here:

http://www.w3.org/TR/prov-primer/

So, is an instrument an Agent or an Entity?

zednis commented 9 years ago

An instrument can be both an Agent and an Entity.
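
For illustration, a minimal sketch of what that looks like in Turtle. The ex: resources are hypothetical and the gcis: namespace URI is an assumption.

```turtle
@prefix gcis: <http://data.globalchange.gov/gcis.owl#> .  # assumed namespace URI
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .                      # hypothetical resources

# The same resource carries both PROV types.
ex:instrument-x a gcis:Instrument, prov:Entity, prov:Agent .

# As an agent, it can be the object of an attribution.
ex:dataset-a prov:wasAttributedTo ex:instrument-x .
```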

justgo129 commented 9 years ago

Would it overly complicate matters if we added Turtle for an activity, assigning it a UUID? We could then use prov:used or prov:wasGeneratedBy, possibly through a qualified attribution. For example:

gcis:wasProducedBy a owl:ObjectProperty ;
    rdfs:label "Was Produced By" ;
    rdfs:comment "A report was produced by an activity of report producing." ;
    rdfs:domain gcis:Report ;
    rdfs:range gcis:ReportGeneration ;
    rdfs:subPropertyOf prov:wasGeneratedBy .

e.g. a dataset was produced from platform A using activity g, with these activities not being first class objects in the database.
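
A very rough sketch of the activity-based pattern, using only prov:wasGeneratedBy and prov:used. The ex: resources are hypothetical and the UUID is a placeholder, not a real identifier.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .   # hypothetical resources

# "Activity g": identified only by a UUID, not a first-class database object.
<urn:uuid:00000000-0000-0000-0000-000000000000> a prov:Activity ;
    prov:used ex:platform-a .

# The dataset was generated by that activity.
ex:dataset-a prov:wasGeneratedBy <urn:uuid:00000000-0000-0000-0000-000000000000> .
```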

zednis commented 9 years ago

@justgo129 I am not sure what the gcis:wasProducedBy property you suggest adds to the existing prov:wasGeneratedBy property, and I think the domain and range you specify do not make sense for such a broadly named relationship as 'was produced by'.

Also, I am not sure how the property you have suggested would work with the following example, which does not include reports or report generation activities and does include datasets and platforms.

justgo129 commented 9 years ago

Sounds good. To close the loop on this one, @zednis do you have a concrete suggestion using PROV which would resolve the ticket? Would a simple replacement of prov:wasDerivedFrom with prov:wasAttributedTo to associate datasets with "instrument_instances" suffice? Would it satisfy an instrument being both an agent and an entity? @rewolfe are you all right with an instrument being both an agent and an entity?

zednis commented 9 years ago

@justgo129 yes, see USGCRP/gcis/pull/216. If we want to make it explicit, I can create a pull request where instrument instances also have prov:Agent as an explicitly declared type.

justgo129 commented 9 years ago

@zednis yes, please.

rewolfe commented 9 years ago

@justgo129 - yes, I'm fine with an instrument being both an agent and entity.

justgo129 commented 9 years ago

Great. Closed #124 via https://github.com/USGCRP/gcis/pull/226.