alanruttenberg commented 3 years ago

#112 and #116, among others, discuss the representation of temporal information. In a comment on #116 I asked @swartik:

It would be helpful to know what sort of queries you would want to do that involve this model.

It occurs to me that this would be a good question for CCO as a whole. Temporal representations are tricky in OWL, and their design necessarily involves trade-offs. It would be helpful to articulate what is expected of such representations. What kinds of queries do users expect to be able to run? What kinds of inferences? For instance, part-of-at-all-times was chosen because transitive part queries are very common.

If the need for concrete times is to say when a process happened, or to give the range of an interval, and then to query on time ranges for overlapping processes, there are relatively simple solutions that would address the verbosity seen in #116. Another kind of question: if we say a process ends at noon on a certain day and another starts at noon on the same day, do we need to be able to infer a meets relation?
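To make the two query styles concrete, here is a minimal sketch, outside OWL, of the overlap test and the noon-to-noon "meets" check. It assumes each process has already been reduced to a half-open interval [start, end) of timezone-aware datetimes; `Interval`, `overlaps`, and `meets` are hypothetical helpers, not CCO or BFO terms.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Interval(NamedTuple):
    """Half-open interval [start, end) of timezone-aware datetimes (hypothetical helper)."""
    start: datetime
    end: datetime


def overlaps(a: Interval, b: Interval) -> bool:
    # Half-open intervals share an instant iff each starts before the other ends.
    return a.start < b.end and b.start < a.end


def meets(a: Interval, b: Interval) -> bool:
    # Allen-style "meets": a ends exactly where b begins, with no gap and no overlap.
    return a.end == b.start


EST = timezone(timedelta(hours=-5))
UTC = timezone.utc

morning = Interval(datetime(2021, 3, 31, 9, tzinfo=EST), datetime(2021, 3, 31, 12, tzinfo=EST))
# Noon EST is 17:00 UTC, so this process starts exactly when `morning` ends.
afternoon = Interval(datetime(2021, 3, 31, 17, tzinfo=UTC), datetime(2021, 3, 31, 22, tzinfo=UTC))
lunch = Interval(datetime(2021, 3, 31, 11, 30, tzinfo=EST), datetime(2021, 3, 31, 13, tzinfo=EST))

print(meets(morning, afternoon), overlaps(morning, afternoon))  # True False
print(overlaps(morning, lunch))                                  # True
```

Choosing half-open intervals is what keeps "meets" and "overlaps" disjoint; with closed intervals the two noon processes would share the noon instant and count as overlapping.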

There are lots of nice-to-haves, but we would be better off articulating what is known to be needed by current (and, to some extent, anticipated) users, and then figuring out how close we can get to satisfying those queries using DL queries with a reasoner or SPARQL on a triple store.

alanruttenberg commented 3 years ago

Maybe a survey to collect use cases?

swartik commented 3 years ago

Here's my use case. It covers how I personally need to process dates: that is, how to turn a date (not a date-time) into temporal information. I'd like to know others' experiences.

Conceptually, I use temporal information for three purposes:

1. For reasoning and inferencing. I write rules that establish chronological order among processes. When I cannot express these rules in DL, I use languages like SPIN.
2. In SPARQL queries. I'm often interested in the latest knowledge (“Who lives at this address now?”) or knowledge during some temporal region I specify.
3. For recording provenance. Text in a document that denotes a day, month, or year is typically as close as I can get to knowing the temporal region during which a process occurs.
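The "latest knowledge" query in the second purpose can be sketched as follows, assuming residence records have already been pulled out of the graph as (person, address, start, end) rows. `Residence` and `resident_as_of` are hypothetical names; in SPARQL the same shape would typically be a FILTER on the start and end values plus ORDER BY DESC(?start) LIMIT 1.

```python
from datetime import date
from typing import NamedTuple, Optional


class Residence(NamedTuple):
    """Hypothetical record: `person` lives at `address` from `start` to `end` (None = still there)."""
    person: str
    address: str
    start: date
    end: Optional[date]


def resident_as_of(records: list[Residence], address: str, as_of: date) -> Optional[str]:
    """Who lived at `address` on `as_of`? Prefer the most recently started residence."""
    current = [
        r for r in records
        if r.address == address and r.start <= as_of and (r.end is None or as_of <= r.end)
    ]
    if not current:
        return None
    return max(current, key=lambda r: r.start).person


records = [
    Residence("Alice", "12 Elm St", date(2015, 1, 1), date(2019, 6, 30)),
    Residence("Bob", "12 Elm St", date(2019, 7, 1), None),
]
print(resident_as_of(records, "12 Elm St", date.today()))     # Bob
print(resident_as_of(records, "12 Elm St", date(2018, 1, 1)))  # Alice
```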

For my needs, establishing provenance is simplest if I record dates as text. I want to be able to find the sequence of characters that led me to infer the temporal interval of a process. A newspaper article probably has a date in the form “March 31”. That's what I use:

<my-info-bearing-entity> cco:has_date_value "March 31"@en .

If it has “March 31, 2021” I'll use that instead. And if it has “Wednesday” I'll use that as the literal.

This conflicts with my reasoning and querying objectives. With such wide variety and so little standardization, sorting is practically impossible. I could normalize dates ("2021-03-31"^^xsd:string), but that would complicate provenance.
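One way to get sortable values without giving up the original text is to compute a normalized date alongside the literal rather than in place of it. The sketch below assumes English month names, a small illustrative list of formats, and that the caller supplies a default year (for example, from the article's publication date); `normalize` is a hypothetical helper.

```python
from datetime import date, datetime
from typing import Optional

# Formats seen in source text; this list is an illustrative assumption, not exhaustive.
FORMATS_WITH_YEAR = ("%B %d, %Y", "%b %d, %Y", "%Y-%m-%d")
FORMATS_WITHOUT_YEAR = ("%B %d", "%b %d")


def normalize(text: str, default_year: int) -> Optional[date]:
    """Map a date literal such as "March 31" to a date; None if unrecognized."""
    cleaned = text.strip()
    for fmt in FORMATS_WITH_YEAR:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            pass
    for fmt in FORMATS_WITHOUT_YEAR:
        try:
            # Append the year explicitly so strptime never has to guess it.
            return datetime.strptime(f"{cleaned} {default_year}", fmt + " %Y").date()
        except ValueError:
            pass
    return None  # e.g. "Wednesday" needs more context (the publication date) to resolve.


print(normalize("March 31", 2021))        # 2021-03-31
print(normalize("March 31, 2021", 1999))  # 2021-03-31
print(normalize("Wednesday", 2021))       # None
```

The literal "March 31"@en stays on the named Information Bearing Entity; only the computed value is used for sorting.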

Here is my solution. It is based on personal conventions I've established. They work for me, but if there are simplifications or objections I'd certainly like to hear them.

My assumptions are:

I am modeling a process that occurs during a temporal region.
The temporal region is recorded as a text literal.
The temporal region does not explicitly state the starting and ending moments, but these values can be calculated.
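The third assumption, that the starting and ending moments can be calculated, amounts to something like the sketch below. It assumes a fixed UTC offset (no daylight-saving lookup) and takes the last instant at one-second granularity (23:59:59); a half-open interval ending at the next midnight would be an equally reasonable choice. `day_bounds` is a hypothetical helper. Note that `isoformat()` writes the offset as "-05:00", the form the xsd:dateTime lexical space requires.

```python
from datetime import date, datetime, time, timedelta, timezone


def day_bounds(day: date, utc_offset_hours: int) -> tuple[str, str]:
    """Return xsd:dateTime lexical forms for the first and last instants of `day`.

    Assumptions: a fixed UTC offset, and a last instant at one-second granularity.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    first = datetime.combine(day, time(0, 0, 0), tzinfo=tz)
    last = first + timedelta(days=1) - timedelta(seconds=1)
    return first.isoformat(), last.isoformat()


print(day_bounds(date(2021, 3, 31), -5))
# ('2021-03-31T00:00:00-05:00', '2021-03-31T23:59:59-05:00')
```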

I assert the process to be the subject of an Information Content Entity, which in turn generically depends on an Information Bearing Entity. The Information Bearing Entity is the subject of a triple in the above form, where the predicate is cco:has_date_value and the object is a literal, either with a language tag or of type xsd:string.

I create a temporal interval based on the text literal. I associate the temporal interval with the process and the Information Content Entity. In triple form:

:process occupies-temporal-region :tr .
:tr designated-by :ice .
:ice generically-depends-on-at-all-times :ibe .
:ibe has-date-value "March 31"@en .

This establishes provenance. Next I create temporal instants representing the first and last instants of March 31:

:instant-01 a cco:TemporalInstant .
:instant-02 a cco:TemporalInstant .
:tr has-first-instant :instant-01 .
:tr has-last-instant :instant-02 .

Next, I create two pairs of anonymous individuals. Each pair contains an Information Content Entity and an Information Bearing Entity. The Information Bearing Entity has a date-time value: in one pair it is the starting moment of March 31, in the other the ending moment. Here is the specification of the starting moment:

_:ice-01 a cco:TemporalInstantIdentifier ;
         generically-depends-on-at-all-times _:ibe-01 .
_:ibe-01 a cco:InformationBearingEntity ;
         cco:has_datetime_value "2021-03-31T00:00:00-05:00"^^xsd:dateTime .
:instant-01 designated-by _:ice-01 .

I have established a convention that an Information Bearing Entity individual only provides provenance if it is named. An anonymous individual can be used for reasoning and inference, but it is not to be used for provenance.
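The named-versus-anonymous convention can be checked mechanically. This sketch walks entity → designated-by → ICE → depends-on → IBE → value over plain string triples and, by default, ignores literals carried by blank-node IBEs. The triple representation, the predicate labels, and `linked_values` are assumptions for illustration; a real implementation would work over full IRIs in an RDF store.

```python
Triple = tuple[str, str, str]

# Readable predicate labels as used in the post; a real graph would use full IRIs.
DESIGNATED_BY = "designated-by"
DEPENDS_ON = "generically-depends-on-at-all-times"
VALUE_PREDICATES = {"has-date-value", "has-datetime-value"}


def is_blank(term: str) -> bool:
    # Turtle blank-node labels start with "_:".
    return term.startswith("_:")


def linked_values(triples: list[Triple], entity: str, provenance_only: bool = True) -> list[str]:
    """Follow entity -designated-by-> ICE -depends-on-> IBE -value-> literal.

    With provenance_only=True, literals on anonymous IBEs are skipped,
    following the convention that only named IBEs record provenance.
    """
    values = []
    for s, p, ice in triples:
        if s != entity or p != DESIGNATED_BY:
            continue
        for s2, p2, ibe in triples:
            if s2 != ice or p2 != DEPENDS_ON:
                continue
            if provenance_only and is_blank(ibe):
                continue
            values.extend(o for s3, p3, o in triples if s3 == ibe and p3 in VALUE_PREDICATES)
    return values


graph = [
    (":tr", "designated-by", ":ice"),
    (":ice", "generically-depends-on-at-all-times", ":ibe"),
    (":ibe", "has-date-value", '"March 31"@en'),
    (":tr", "has-first-instant", ":instant-01"),
    (":instant-01", "designated-by", "_:ice-01"),
    ("_:ice-01", "generically-depends-on-at-all-times", "_:ibe-01"),
    ("_:ibe-01", "has-datetime-value", '"2021-03-31T00:00:00-05:00"^^xsd:dateTime'),
]
print(linked_values(graph, ":tr"))          # ['"March 31"@en']
print(linked_values(graph, ":instant-01"))  # []
```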

Here's a picture of the whole RDF graph. As several of the predicates are from BFO, I've opted to increase readability by using labels for their names.

(No, not yet. My company blocks file uploads to github.com. I'll have to include the image in a separate post from my home computer.)

I can use this structure to write the rules and queries I need, as well as to trace why I asserted the process occurred on March 31.

This, then, is how I represent dates using CCO. I have adhered to (my interpretation of) the principle that literal values are associated with information bearing entities. I know others who find this kind of thing too complex. They define their own data properties whose domain is process and whose range is xsd:dateTime. I have to agree their representations are simpler. But unless and until CCO's creators accept and standardize a new way of representing temporal information, I'd prefer to stick to the established form. I have written scripts to automate entering the information so I am not (overly) concerned by the additional triples.
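A script of the kind mentioned above might look roughly like this sketch, which emits the full pattern for one dated process given the source literal and precomputed bounds. The `:key-...` naming scheme is hypothetical, the predicates are the readable labels from this thread written in a placeholder `:` namespace (not real CCO/BFO IRIs), `key` must be a valid Turtle local name, and prefix declarations are assumed to be emitted elsewhere.

```python
def escape(text: str) -> str:
    # Escape backslashes and double quotes for a Turtle string literal.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def date_pattern(key: str, text: str, lang: str, first: str, last: str) -> list[str]:
    """Return Turtle triples for one dated process (hypothetical naming scheme)."""
    p = f":{key}"
    lines = [
        # Named individuals: these carry provenance.
        f"{p} :occupies-temporal-region {p}-tr .",
        f"{p}-tr :designated-by {p}-ice .",
        f"{p}-ice :generically-depends-on-at-all-times {p}-ibe .",
        f'{p}-ibe :has-date-value "{escape(text)}"@{lang} .',
    ]
    bounds = (("has-first-instant", first), ("has-last-instant", last))
    for n, (relation, stamp) in enumerate(bounds, start=1):
        inst, ice, ibe = f"{p}-instant-{n:02d}", f"_:{key}-ice-{n:02d}", f"_:{key}-ibe-{n:02d}"
        lines += [
            # Anonymous ICE/IBE pair: usable for reasoning, never for provenance.
            f"{inst} a cco:TemporalInstant .",
            f"{p}-tr :{relation} {inst} .",
            f"{inst} :designated-by {ice} .",
            f"{ice} a cco:TemporalInstantIdentifier .",
            f"{ice} :generically-depends-on-at-all-times {ibe} .",
            f"{ibe} a cco:InformationBearingEntity .",
            f'{ibe} cco:has_datetime_value "{stamp}"^^xsd:dateTime .',
        ]
    return lines


triples = date_pattern("p1", "March 31", "en",
                       "2021-03-31T00:00:00-05:00", "2021-03-31T23:59:59-05:00")
print("\n".join(triples))
```

Eighteen triples per dated process is the cost the post accepts in exchange for traceable provenance.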

I should also mention that you can't use Protégé to create this information. Protégé does not allow you to create anonymous individuals (although it can read them). That was another motivation for my scripts.

Comments, questions, concerns?

swartik commented 3 years ago

Here's the graph:

[Image: RDF graph of the date representation described above ("rdf-graph-anons")]

Alan, I think you asked me what tool I used to create these images. It's yEd.

alanruttenberg commented 3 years ago

Your second and third use cases can be accomplished even if the temporal information is kept in an annotation property.

Concerning #1, "I write rules that establish chronological order among processes."

How do you represent the order? In BFO we have a precedes relation, which asserts order. I think you are right that you can't compute the order in OWL based only on timestamps. You can do it in the FOL version using an SMT reasoner, so that numbers are understood, but it's easier to use something external to OWL to compute precedes. Every interval has a first and a last instant, and you can add a timestamp to each. Because precedes is transitive, when adding assertions you only need to add precedes relations between neighboring instants; the rest will be inferred. Precedes is a total order on temporal instants and a partial order on intervals (or temporal regions generally).
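The "only assert neighbors" idea can be sketched outside OWL: sort instants by timestamp, assert precedes between consecutive ones, and let transitivity supply the rest. The instant names are hypothetical, the timestamps are assumed to be in a single zone and distinct (equal timestamps would denote the same instant, since precedes totally orders instants), and the naive closure only stands in for what a reasoner would infer from a transitive property.

```python
from datetime import datetime


def adjacent_precedes(stamps: dict[str, datetime]) -> list[tuple[str, str]]:
    """Assert `precedes` only between temporally adjacent instants (assumes distinct timestamps)."""
    ordered = sorted(stamps, key=stamps.__getitem__)
    return list(zip(ordered, ordered[1:]))


def transitive_closure(pairs: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Naive fixpoint; stands in for what a reasoner infers from a transitive property."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


stamps = {
    "i-start-mar31": datetime(2021, 3, 31, 0, 0),
    "i-noon-mar31": datetime(2021, 3, 31, 12, 0),
    "i-end-mar31": datetime(2021, 3, 31, 23, 59, 59),
    "i-start-mar30": datetime(2021, 3, 30, 0, 0),
}
asserted = adjacent_precedes(stamps)
print(len(asserted))                      # 3 asserted pairs
print(len(transitive_closure(asserted)))  # 6, every ordered pair of the 4 instants
```

With n instants this asserts n - 1 relations instead of n(n - 1)/2.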

There's also the option of using DL-safe rules. But only some reasoners support them.

That said, if you are going external to OWL then OWL doesn't need to reason with them much. It would be adequate to use a data property on processes, as you mention, or on intervals or on instants. This would be a shortcut, with the semantics formally defined in terms of the unstated intervals and IBEs. I agree that a convention would need to be established. I kind of lean in that direction. If OWL can't do much useful reasoning with times, might as well take the more concise expression. The convention would have to be such that you could also use the full blown expression if there's a good reason to do so.

On Protege, I'm guessing that you are right. I don't do a whole lot of authoring in Protege, instead using my own tools. I use something close to the functional syntax but which also allows expressions that look like Manchester syntax. I write macros to be able to concisely express things, and the macro expansion can generate multiple assertions that I would otherwise have to write out. Another thing Protege can't do is have plain IRIs. Annotations are defined as relationships between IRIs, even IRIs that don't name an instance, class or property. That can be useful if you need to use IRIs in an annotation but don't want to clutter your project with the individuals that Protege insists on.

Thanks for the pointer to yEd!

Competency questions for temporal representation #118
