geneontology / go-shapes

Schema for Gene Ontology Causal Activity Models defined using RDF Shapes

Proposed changes to capture more information about dates in the ShEx #261

Open vanaukenk opened 3 years ago

vanaukenk commented 3 years ago

From 2021-06-08 MOD imports call:

We want to align how date information is being expressed in the import GPAD files with how dates are modeled in the ShEx.

This will ensure we don't lose any information coming in from the imports and also that we have clear semantics for what the date fields mean in the ShEx and the GPAD files.

In the ShEx, date is currently captured in two places, the GoCamModel shape and the ProvenanceAnnotated shape.

Implicitly, the current use of date means the last date on which an action was performed, either at the model level or on a ProvenanceAnnotated entity, which is used in the AnnotatedEdge shape (i.e. to record evidence for an edge).

We propose to add an additional date tag, creation_date, to the GoCamModel and ProvenanceAnnotated shapes to capture the information for this tag that is coming in from the Annotation Property, creation-date, in the GPAD file for the MOD imports.

Cardinality will be 1 for creation_date.

A few questions:

1) For the gene-centric import models, what would a model-level creation date be?

2) There is a comment in the ShEx to change date from xsd:string to xsd:date. Any reason to not also make that change?

3) For the GoCamModel shape, the current cardinality of date is 1, but the cardinality in the ProvenanceAnnotated shape is *. Is that what we want for ProvenanceAnnotated?

4) In the future, we may have a situation where a curator reviews a model and doesn't make any changes, but we want to capture that they've reviewed and approved the model. Will we want to add another type of date tag to the ShEx for this (e.g. reviewed_date) and will we need to modify the Noctua UI so that there's a specific action taken upon review so we know to capture the date of review?

@kltm - please make sure I've represented the current thinking about dates in the ShEx correctly.

@ukemi @sierra-moxon @lpalbou @tmushayahama @dustine32

kltm commented 3 years ago

@vanaukenk I believe the cardinality for create-date would be 0,1, unless we are building an uplift of all models to date into the process.

To answer your other questions:

  1. I believe it would be the earliest date represented.
  2. We'd have more power to check for conformance, as it would be a standard type. Doing that, however, means we need to coordinate uplifting all models.
  3. I'm not sure I'm catching the issue here?
  4. Yes, that sounds mostly correct. If you want to capture who did the review, you'd need an additional field as well.

vanaukenk commented 3 years ago

Thanks for the feedback @kltm

I'll revise the cardinality on the creation_date field.

For 1, let's confirm with @dustine32 and @ukemi as right now, it looks like the model-level date is the date of the actual import.

For 3, I noticed that the cardinality of date is different in the GoCamModel shape vs the ProvenanceAnnotated shape, but I wasn't sure I understood why date cardinality in ProvenanceAnnotated isn't also 1:

date: xsd:string {1};
date: xsd:string *;

For 4, yes, we'll need to think about how to capture both the reviewer and the date if they just review and approve a model without making any changes.

dustine32 commented 3 years ago

Thanks for the writeup @vanaukenk and @kltm for answering!

For 1: correct, the current existing date field is just the date the model's generated by the import code. For creation date, especially since it's a new field, I could add some logic to compute what @kltm proposed, "the earliest date represented" (the min() of all GPAD col 9 date + all Annotation Property creation-date + all Annotation Property modification-date). Or we could just shove the same "date import model generated" into this new creation date. Up to you!

Edit: I should clarify, by "all GPAD col 9 date + all ...," I'm including the Protein2GO multi-line annotation situation. So "all" means: across multiple GPAD lines sharing the same annotation id.
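The "earliest date represented" option above could be sketched roughly as follows. This is only an illustration, not the import code: the dict keys, the `earliest_date` name, and the assumption that all dates arrive as ISO `YYYY-MM-DD` strings are hypothetical.

```python
from datetime import date


def earliest_date(annotation_lines):
    """Return the earliest date across GPAD lines sharing one annotation id.

    Each line is a dict with hypothetical keys (not real GPAD parser output):
      "date"              -> the GPAD column 9 date, as an ISO string
      "creation-date"     -> list of Annotation Property values
      "modification-date" -> list of Annotation Property values
    """
    candidates = []
    for line in annotation_lines:
        if line.get("date"):
            candidates.append(line["date"])
        candidates.extend(line.get("creation-date", []))
        candidates.extend(line.get("modification-date", []))
    if not candidates:
        return None  # nothing to compute a creation date from
    return min(date.fromisoformat(d) for d in candidates)


# Two lines of one multi-line annotation (made-up values).
lines = [
    {"date": "2019-05-02",
     "creation-date": ["2018-11-30"],
     "modification-date": ["2019-05-02"]},
    {"date": "2020-01-15"},
]
print(earliest_date(lines))
```

The alternative mentioned above (reusing the "date import model generated" value) would need no computation at all.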

kltm commented 3 years ago

@dustine32 @vanaukenk If not already, the "date" (read modification-date) model-level property would be the max() of all modification dates, since it will become that once the model is touched (under current rules). I'm not actually sure it makes a difference, but I think it would be a little odd if the rules for import models (creation-date is essentially anything and probably fairly recent) vs. non-import models (creation-date is the earliest date that somebody talked about this thing) are different. I think that if an import date is important (and not something that can be waved away by considering it history that we'll worry about later), it might be worth considering a separate, optional import-date model-level annotation.

ukemi commented 3 years ago

Is it worth adding the additional complexity to have an import date? I would think at the level of the model, the date would be the date the MODEL was modified. This would correspond to the date of import, but once we throw the switch, these models are no longer special. Curators will be working on them just like any other model and should therefore correspond to everything else done in Noctua. I think this is consistent with what you are saying, but just wanted to be sure.

vanaukenk commented 3 years ago

Thanks all.

I think Seth's point is well taken and we probably don't want to decouple 'annotation' dates from 'model' dates and handle dates differently in imported vs non-imported models.

So, if I'm understanding things correctly, to be consistent, for the MOD imports we'd want to make the model-level date the most recent date represented in the set of annotations for a given gene. This would be the same thing that happens now: if I create a new model, the model-level date is the same as all of the 'annotation dates', but if I go back to that model tomorrow and edit, the model-level date now reflects the date of the latest 'annotation'.

This might mean that some of our imported models have dates before Noctua was even a gleam in anyone's eye, but I think that's okay and we're then being consistent about what date means on a model-level.
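As a rough sketch of that rule, the model-level date is simply the latest of the annotation dates. The function name and the ISO `YYYY-MM-DD` input format are assumptions made for illustration; this is not how Noctua or the import code is actually written.

```python
from datetime import date


def model_level_date(annotation_dates):
    """Return the most recent annotation date, or None if there are none.

    `annotation_dates` is assumed to be an iterable of ISO date strings.
    """
    parsed = [date.fromisoformat(d) for d in annotation_dates]
    return max(parsed) if parsed else None


# An imported model whose annotations predate Noctua keeps its old date...
print(model_level_date(["2003-04-01", "2005-09-12"]))

# ...until someone edits it, at which point the newest date wins.
print(model_level_date(["2003-04-01", "2005-09-12", "2021-06-08"]))
```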

I'm honestly agnostic about adding an import_date field, but if it's not too costly on the software side, having it there might just make things clearer wrt the chronology curators see in Noctua.

kltm commented 3 years ago

@ukemi Personally, I'm not sure whether it's worth it, but it wouldn't be much extra work if it was. I'm mostly interested in there being a consistent story for what dates mean, but neutral on the addition. I believe we're on the same page here with what "date" (i.e. modification-date) means at the model level: the last time anything was manipulated in a model.

@vanaukenk Yes, I believe that we have the same picture: the way we're looking at dates means that an awful lot of them will have dates from before Noctua, which is what I think people would expect anyways and would be a requirement for sensible searching for past work. Marginally, I think that there is probably little extra overhead in adding one new timestamp vs two. I'd also note that if we skipped adding an import-date now, it would be just as easy to add it in consistently in the future.

vanaukenk commented 3 years ago

@kltm @dustine32

Here are some possible Dublin Core metadata entries that we could use for the ShEx:

date https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/modified
creation_date https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/created

then maybe import_date could just be https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/elements/1.1/date

kltm commented 3 years ago

I think it might be good to get some of @cmungall 's or @balhoff 's experience in modeling and possible consequences here.

balhoff commented 3 years ago

We are currently using <http://purl.org/dc/elements/1.1/date> for the modification date. I do support changing this to dct:modified. Also I think the values should be xsd:dateTime instead of xsd:date.
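If existing date-only values were ever uplifted to xsd:dateTime, one possible conversion is sketched below. Picking midnight UTC as the time component is purely an assumption for illustration, as is the `to_xsd_datetime` name; nothing in this thread specifies how date-only values should be promoted.

```python
from datetime import date, datetime, time, timezone


def to_xsd_datetime(value):
    """Promote an ISO date string to an xsd:dateTime-style string.

    Assumption: the missing time of day is filled in as midnight UTC.
    """
    d = date.fromisoformat(value)
    return datetime.combine(d, time.min, tzinfo=timezone.utc).isoformat()


print(to_xsd_datetime("2021-06-08"))  # 2021-06-08T00:00:00+00:00
```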

I also agree with the creation date mapping. For import_date, I think we should pick something else, because (1) we have previously been using that property for modification date, and (2) generally I think we should use dcterms instead of dc (we have other changes to make in this regard). We could use dct:dateSubmitted or dct:dateAccepted.

vanaukenk commented 3 years ago

Thanks for the feedback, @balhoff. I had also looked at dct:dateSubmitted and dct:dateAccepted. They initially seemed kind of publishing-centric to me, but dct:dateAccepted is probably closest to what we want.
I had actually looked for something like dct:datePublished but couldn't find that. Unless anyone on this thread objects, I'll put in a PR to update the ShEx for these tag names as well as the xsd value. @kltm Does that sound okay to you?

vanaukenk commented 3 years ago

https://github.com/geneontology/go-shapes/pull/262