USGCRP / gcis-ontology

Ontology for the Global Change Information System

Model, ModelRun, model output, and Dataset #17

Closed zednis closed 8 years ago

zednis commented 9 years ago

There was some question today on the relationship between Model, ModelRun, Dataset, and model output.

From the current GCIS ontology:

gcis:Model a owl:Class ;
    rdfs:label "Model" ;
    rdfs:comment "A simplified description or particular design, especially a mathematical one, of a system or process, to assist calculations and predictions." ;
    rdfs:subClassOf prov:Entity .

gcis:ModelRun a owl:Class ;
    rdfs:label "Model Run" ;
    rdfs:comment "An entity generated by a model." ;
    rdfs:subClassOf prov:Entity .

gcis:Dataset a owl:Class ;
    rdfs:label "Dataset" ;
    rdfs:comment "Any organized collection of data or information that has a common theme. Examples include lists, tables, and databases, etc." ;
    rdfs:subClassOf dctype:Dataset , prov:Entity .

No class or property exists with the name "model output."

Right now gcis:ModelRun is a subclass of prov:Entity and represents the output of a 'model run' activity. We do not have any class that specifically represents the activity of running the model. We can use an instance of prov:Activity in the linked data to represent the model run process.

If we were to define a class to represent the model run activity it would make sense to make it a subclass of prov:Activity.

The PROV properties prov:generated and prov:wasGeneratedBy would provide the relationships between the model run activity and the output of the model run (currently type gcis:ModelRun).
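As a minimal sketch of that relationship, the instance data below links a generic prov:Activity to its output using both properties. The `ex:` instances are hypothetical, and the `gcis:` namespace IRI is an assumption rather than something stated in this thread.

```turtle
@prefix gcis: <http://data.globalchange.gov/gcis.owl#> .  # assumed namespace IRI
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .                      # hypothetical instances

# The act of running the model, typed only with the generic PROV class.
ex:run-activity a prov:Activity ;
    prov:generated ex:run-result .

# The result, typed with gcis:ModelRun as currently defined (an entity).
ex:run-result a gcis:ModelRun ;
    prov:wasGeneratedBy ex:run-activity .
```

PROV-O declares prov:generated and prov:wasGeneratedBy as inverses, so stating either triple is enough for a reasoner to derive the other; both are written out here only for readability.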

In today's meeting we discussed modifying the definition of gcis:ModelRun to be a subclass of gcis:Dataset. I think this makes sense and is consistent with the current definition of gcis:Dataset.
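A sketch of what that first option might look like, assuming only the superclass changes and the label and comment stay as they are (the `gcis:` namespace IRI is an assumption):

```turtle
@prefix gcis: <http://data.globalchange.gov/gcis.owl#> .  # assumed namespace IRI
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Option sketch: gcis:ModelRun stays an entity but becomes a kind of dataset.
gcis:ModelRun a owl:Class ;
    rdfs:label "Model Run" ;
    rdfs:comment "An entity generated by a model." ;
    rdfs:subClassOf gcis:Dataset .
```

Because gcis:Dataset is itself a subclass of prov:Entity, a gcis:ModelRun would still be a prov:Entity by inference, so the explicit prov:Entity superclass could be dropped without losing anything.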

We also discussed alternately modifying gcis:ModelRun to be the activity and creating a new class to represent the output of the model run. If we do this I would suggest that new class be a subclass of gcis:Dataset.

congruili commented 9 years ago

I could not locate prov:ModelRun in the "prov" namespace.

http://www.w3.org/TR/prov-o/

Do you mean something else?

zednis commented 9 years ago

Edit: I found and corrected my typo in the previous post. Thanks @lic10

xgmachina commented 9 years ago

How about creating a new class gcis:ModelRunOutput as a subclass of prov:Entity and gcis:Dataset, and changing gcis:ModelRun to a subclass of prov:Activity?

congruili commented 9 years ago

That could work.

justgo129 commented 9 years ago

Are there any objections to the proposed solution of @xgmachina?

justgo129 commented 8 years ago

Seeing none, @xgmachina please prepare a pull request.

zednis commented 8 years ago

@xgmachina do you want to prepare this pull request, or should I?

xgmachina commented 8 years ago

@zednis @justgo129 Should I close this issue now that #130 is done? Changes made: created a new class gcis:ModelRunOutput as a subclass of both prov:Entity and gcis:Dataset, and changed gcis:ModelRun from a subclass of prov:Entity to a subclass of prov:Activity.

justgo129 commented 8 years ago

I'm all right with it if @rewolfe is.

rewolfe commented 8 years ago

+1


Robert Wolfe, NASA GSFC @ USGCRP, o: 202-419-3470, m: 301-257-6966

xgmachina commented 8 years ago

Thanks all. I will close the issue now.

bduggan commented 8 years ago

I think model run should not be defined as something other than a type of dataset, since this is how the phrase is used in the modeling community.

zednis commented 8 years ago

@bduggan is there a term in the community for the activity of running a model? As far as I can tell, one issue here is that "model run" is used for both the activity and its output, so context is required to determine which meaning is intended.

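Is this your counter proposal?

  • gcis:ModelRun a subclass of gcis:Dataset
  • no subclass of prov:Activity specific to the running of models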

bduggan commented 8 years ago

On Thursday, August 27, Stephan Zednik wrote:

> @bduggan is there a term in the community for the activity of running a model?

Good question, "model run generation" or "model run creation" come to mind but I will need to ask around.

> Is this your counter proposal?
>
> • gcis:ModelRun a subclass of gcis:Dataset

Yes, though we should check on the attributes...

> • no subclass of prov:Activity specific to the running of models

I'm not proposing no subclass, just concerned about using "modelRun" for this purpose.

Brian

zednis commented 8 years ago

OK. I am going to reopen this issue.

aulenbac commented 8 years ago

@zednis and @bduggan, granted this is one of those terms where many concepts get rolled up and used as convenient, technical shorthand for a complicated process and usage varies greatly across the communities we serve. That said, I tend to think of a model run as one complete execution of some type of model.

Model type classifications vary of course, but, for this initial discussion let's use a simple scheme like analytical, numerical, observational.

Generalizing, each model run takes zero or more model inputs, completes zero or more calculations, and produces zero or more model outputs. If not null, model run inputs can be things like parameters, input datasets, messages (think processing chains) and so on. If not null, model run outputs can be things like parameters, output datasets, messages (processing chains again) and so on.
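One way to picture this generalization in PROV terms is sketched below: inputs attach to the run with prov:used and outputs with prov:generated. All `ex:` names are hypothetical, the `gcis:` namespace IRI is an assumption, and typing the run as a plain prov:Activity is an illustrative choice, not an agreed design.

```turtle
@prefix gcis: <http://data.globalchange.gov/gcis.owl#> .  # assumed namespace IRI
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .                      # hypothetical instances

# One complete execution of a model, with its inputs and outputs.
ex:run a prov:Activity ;
    prov:used ex:parameters , ex:input-dataset ;   # model inputs
    prov:generated ex:output-dataset .              # model outputs

ex:parameters     a prov:Entity .
ex:input-dataset  a gcis:Dataset .
ex:output-dataset a gcis:Dataset .
```

As far as I know, PROV-O places no cardinality constraints on prov:used or prov:generated, so a run with no inputs or no outputs would simply omit those triples.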

This is very general. What are your experiences and thoughts?

zednis commented 8 years ago

@aulenbac agreed that this is a case where many terms are rolled up into one and context is required for correct interpretation.

From your definition it sounds like you view model run in a manner very similar to what we had previously agreed on in the ticket, with a model run being an activity.

If the term model run has too much baggage to be happily settled as either the activity or the output, perhaps we avoid using just "model run".

A new proposal:

bduggan commented 8 years ago

Here is a model run in the relational model:

Example:

https://github.com/USGCRP/gcis-sync/blob/master/yaml/model_run/a887f3b4-3d19-44ff-9fa6-b58bbe86dfa5.yaml

Schema:

https://github.com/USGCRP/gcis/blob/master/db/dist/docs/pod/table_model_run.pod

Note that it has a time range which reflects the range of the data, not the time of the activity. Also, note that it may be associated with an activity. It is essentially a dataset.

Here is an activity associated with this model run:

http://data.globalchange.gov/activity/4ef1491f-nca3-cmip3-r201205-process

There are a number of model runs associated with this activity.

Actually, the model runs are inputs to this activity.

Here are four runs (note these are datasets and are called "runs") which are inputs to that activity:

https://esg.llnl.gov:8443/metadata/advancedDatasetSearch.do?d_scenario=sresb1&d_frequency=monthly&d_offset=0&d_model=ncar_pcm1

Here are attributes and metadata about each of the runs:

https://esg.llnl.gov:8443/metadata/showObject.do?id=pcmdi.ipcc4.ncar_pcm1.sresb1.run1.monthly
https://esg.llnl.gov:8443/metadata/showObject.do?id=pcmdi.ipcc4.ncar_pcm1.sresb1.run2.monthly
https://esg.llnl.gov:8443/metadata/showObject.do?id=pcmdi.ipcc4.ncar_pcm1.sresb1.run3.monthly
https://esg.llnl.gov:8443/metadata/showObject.do?id=pcmdi.ipcc4.ncar_pcm1.sresb1.run4.monthly

I think it would be difficult to find data about the start/end times of "model run execution", i.e. when these runs were created.
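The arrangement described above, where runs are datasets consumed by a downstream activity, might be sketched like this. The activity IRI is the one linked above; the four `ex:` run IRIs are hypothetical stand-ins for the ESG entries, the `gcis:` namespace IRI is an assumption, and typing the activity as prov:Activity is also an assumption.

```turtle
@prefix gcis: <http://data.globalchange.gov/gcis.owl#> .  # assumed namespace IRI
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .                      # hypothetical run IRIs

# The processing activity takes the four runs as inputs.
<http://data.globalchange.gov/activity/4ef1491f-nca3-cmip3-r201205-process>
    a prov:Activity ;
    prov:used ex:sresb1-run1 , ex:sresb1-run2 , ex:sresb1-run3 , ex:sresb1-run4 .

# Each "run" is treated as a dataset, not as an activity.
ex:sresb1-run1 a gcis:Dataset .
ex:sresb1-run2 a gcis:Dataset .
ex:sresb1-run3 a gcis:Dataset .
ex:sresb1-run4 a gcis:Dataset .
```

No prov:wasGeneratedBy triples appear for the runs themselves, which matches the point that the execution that created them, including its start and end times, may not be recoverable.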

Brian

aulenbac commented 8 years ago

@zednis, good points. Avoiding "model run", would it be clearer to say "model inputs" and "model outputs" instead? I'm trying to address the need to produce and use something that is broadly applicable and understandable. We have hydrological models, economic risk models, invasive species models, epidemiological models, economic growth models, land use models, ..., as well.

zednis commented 8 years ago

@bduggan I think we are all in agreement that what the relational model calls a "model run" corresponds to the class gcis:ModelRunOutput in the current (github master) version of the GCIS ontology. Since we have not reached consensus within our own group as to whether "model run" is an activity or an entity, I am starting to think we should avoid using the name as a class in the ontology without additional explicit context.

I think it is fine to keep using it in the relational database in the current manner, but hopefully we will find a representation in the ontology that everyone in our group can be satisfied with.

With that said I am curious as to the group's thoughts on this recent suggestion:

justgo129 commented 8 years ago

:+1:

bduggan commented 8 years ago

Recent discussions have given me the impression that modeling the activity of a model run is probably not useful at this point. Running a model is a complex and distributed effort, and not a rabbit hole worth going down right now.

rewolfe commented 8 years ago

I agree. The most important thing to capture is the model run output information, the same information that we capture for our other (observational) datasets.


zednis commented 8 years ago

OK, are we ok with keeping gcis:ModelRunOutput as the class to represent the model run output with superclasses gcis:Dataset and prov:Entity?

bduggan commented 8 years ago

On Monday, August 31, Stephan Zednik wrote:

> OK, are we ok with keeping gcis:ModelRunOutput as the class to represent the model run output with superclasses gcis:Dataset and prov:Entity?

Sure.

Brian

justgo129 commented 8 years ago

:+1:

justgo129 commented 8 years ago

@zednis please feel free to proceed with preparing the pull request.

zednis commented 8 years ago

This is what is currently in the ontology:

gcis:ModelRun a owl:Class ;
    rdfs:label "Model Run" ;
    rdfs:comment "An activity of running a model." ;
    rdfs:subClassOf prov:Activity .

gcis:ModelRunOutput a owl:Class ;
    rdfs:label "Model Run Output" ;
    rdfs:comment "Results generated by running a model." ;
    rdfs:subClassOf prov:Entity, gcis:Dataset . 

Should the pull request be to rename gcis:ModelRun to gcis:ModelRunExecution or to simply remove it?

I will leave gcis:ModelRunOutput as it is.

justgo129 commented 8 years ago

I'll defer to @rewolfe.

rewolfe commented 8 years ago

I vote that we drop ModelRunExecution. We can still use the more general Activity class if we decide to capture information about the specific instance of a model run execution.
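A sketch of how that could look without any model-specific activity class: the execution is a plain prov:Activity and only the output carries a GCIS type. The `ex:` instances, the label text, and the timestamps are illustrative placeholders rather than real data, and the `gcis:` namespace IRI is an assumption.

```turtle
@prefix gcis: <http://data.globalchange.gov/gcis.owl#> .  # assumed namespace IRI
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .                      # hypothetical instances

ex:model a gcis:Model .

# A specific execution, described with the generic PROV class only.
ex:execution a prov:Activity ;
    rdfs:label "Illustrative model execution" ;
    prov:used ex:model ;
    prov:startedAtTime "2000-01-01T00:00:00Z"^^xsd:dateTime ;  # placeholder value
    prov:endedAtTime   "2000-01-02T00:00:00Z"^^xsd:dateTime .  # placeholder value

# The output keeps its dedicated class.
ex:output a gcis:ModelRunOutput ;
    prov:wasGeneratedBy ex:execution .
```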


zednis commented 8 years ago

OK, I will submit a pull request that drops ModelRunExecution.

Edit: ModelRunExecution was never actually added as a class; it was a proposed rename of ModelRun. I think the suggestion to remove the model run activity subclass still holds, so I have prepared #145, which removes the current gcis:ModelRun class.

zednis commented 8 years ago

I believe this ticket is ready to be closed.

justgo129 commented 8 years ago

Thanks, @zednis. Closed #17 due to merged #145.