Open vanaukenk opened 3 years ago
@vanaukenk I believe the cardinality for create-date would be 0,1, unless we are building an uplift of all models to date into the process.
To answer your other questions:
Thanks for the feedback @kltm
I'll revise the cardinality on the creation_date field.
For 1, let's confirm with @dustine32 and @ukemi as right now, it looks like the model-level date is the date of the actual import.
For 3, I noticed that the cardinality of date is different in the GoCamModel shape vs the ProvenanceAnnotated shape, but I wasn't sure I understood why date cardinality in ProvenanceAnnotated isn't also 1:
date: xsd:string {1};
date: xsd:string *;
For 4, yes, we'll need to think about how to capture both the reviewer and the date if they just review and approve a model without making any changes.
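For reference, the cardinality markers being compared here ({1} versus *, plus the 0,1 suggested earlier, which ShEx writes as ?) can be read as simple bounds on how many values a property may carry. Below is a minimal Python sketch of that reading. The `CARDINALITY` table and `satisfies` helper are hypothetical names made up for illustration and are not part of any ShEx tooling.

```python
# ShEx cardinality markers mapped to (min, max) bounds; None means unbounded.
# Hypothetical helper for illustration only, not taken from a ShEx library.
CARDINALITY = {
    "{1}": (1, 1),   # exactly one value
    "?": (0, 1),     # zero or one value (the "0,1" case)
    "*": (0, None),  # zero or more values
    "+": (1, None),  # one or more values
}


def satisfies(values, marker):
    """Return True if the number of values fits the cardinality marker."""
    low, high = CARDINALITY[marker]
    count = len(values)
    return count >= low and (high is None or count <= high)


# A model with no date fails {1} but passes *.
print(satisfies([], "{1}"))  # False
print(satisfies([], "*"))    # True
```

Read this way, the open question is whether a ProvenanceAnnotated entity should really be allowed zero dates or several, which is what * permits.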
Thanks for the writeup @vanaukenk and @kltm for answering!
For 1: correct, the existing date field is just the date the model is generated by the import code. For creation date, especially since it's a new field, I could add some logic to compute what @kltm proposed, "the earliest date represented" (the min() of all GPAD col 9 date + all Annotation Property creation-date + all Annotation Property modification-date). Or we could just shove the same "date import model generated" into this new creation date. Up to you!
Edit: I should clarify, by "all GPAD col 9 date + all ...," I'm including the Protein2GO multi-line annotation situation. So "all" means: across multiple GPAD lines sharing the same annotation id.
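To make the "earliest date represented" idea concrete, here is a rough Python sketch. It is not the import code. It assumes each GPAD line has already been parsed into a dict, and the keys (`annotation_id`, `date`, `creation-date`, `modification-date`) and the two accepted date formats are assumptions made for illustration.

```python
from datetime import date, datetime

# Hypothetical keys for an already-parsed GPAD line; the real import code may differ.
DATE_KEYS = ("date", "creation-date", "modification-date")


def parse_date(value):
    """Accept YYYYMMDD or YYYY-MM-DD (both formats are assumptions)."""
    fmt = "%Y-%m-%d" if "-" in value else "%Y%m%d"
    return datetime.strptime(value, fmt).date()


def earliest_date(gpad_lines):
    """Return the earliest date found on any line, or None if no dates are present."""
    found = [
        parse_date(line[key])
        for line in gpad_lines
        for key in DATE_KEYS
        if line.get(key)
    ]
    return min(found) if found else None


# Two lines sharing one annotation id (the multi-line case) are both scanned.
lines = [
    {"annotation_id": "A1", "date": "20200115", "creation-date": "2014-01-08"},
    {"annotation_id": "A1", "date": "20200115", "modification-date": "2019-03-02"},
]
print(earliest_date(lines))  # 2014-01-08
```

Because every line is scanned regardless of annotation id, multi-line annotations contribute all of their dates to the minimum.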
@dustine32 @vanaukenk If not already, the "date" (read modification-date) model-level property would be the max() of all modification dates, since it will become that once the model is touched (under current rules).
I'm not actually sure it makes a difference, but I think it would be a little odd if the rules for import models (creation-date is essentially anything and probably fairly recent) vs. non-import models (creation-date is the earliest date that somebody talked about this thing) are different. I think that if an import date is important (and not something that can be waved away by considering it history that we'll worry about later), it might be worth considering a separate, optional import-date model-level annotation.
Is it worth adding the additional complexity to have an import date? I would think at the level of the model, the date would be the date the MODEL was modified. This would correspond to the date of import, but once we throw the switch, these models are no longer special. Curators will be working on them just like any other model and should therefore correspond to everything else done in Noctua. I think this is consistent with what you are saying, but just wanted to be sure.
Thanks all.
I think Seth's point is well taken and we probably don't want to decouple 'annotation' dates from 'model' dates and handle dates differently in imported vs non-imported models.
So, if I'm understanding things correctly, to be consistent, for the MOD imports we'd want to make the model-level date the most recent date represented in the set of annotations for a given gene. This would be the same thing that happens now: if I create a new model, the model-level date is the same as all of the 'annotation dates', but if I go back to that model tomorrow and edit, the model-level date now reflects the date of the latest 'annotation'.
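As a small illustration of that rule, here is a Python sketch in which the model-level date is simply the latest annotation date. The function name `model_level_date` and the sample dates are made up for the example.

```python
from datetime import date


def model_level_date(annotation_dates):
    """Return the most recent annotation date, used here as the model-level date."""
    if not annotation_dates:
        raise ValueError("a model needs at least one dated annotation")
    return max(annotation_dates)


# An imported model whose annotations all predate any editing in Noctua.
imported = [date(2005, 3, 14), date(2012, 11, 2)]
print(model_level_date(imported))  # 2012-11-02

# Editing the model later adds a newer annotation date, which moves the model-level date.
edited = imported + [date(2021, 6, 9)]
print(model_level_date(edited))  # 2021-06-09
```

The same function covers imported and hand-built models, which is the consistency being argued for.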
This might mean that some of our imported models have dates before Noctua was even a gleam in anyone's eye, but I think that's okay and we're then being consistent about what date means on a model-level.
I'm honestly agnostic about adding an import_date field, but if it's not too costly on the software side, having it there might just make things clearer wrt the chronology curators see in Noctua.
@ukemi Personally, I'm not sure whether it's worth it, but it wouldn't be much extra work if it was. I'm mostly interested in there being a consistent story for what dates mean, but neutral on the addition. I believe we're on the same page here with what "date" (i.e. modification-date) means at the model level: the last time anything was manipulated in a model.
@vanaukenk Yes, I believe that we have the same picture: the way we're looking at dates means that an awful lot of them will have dates from before Noctua, which is what I think people would expect anyways and would be a requirement for sensible searching for past work. Marginally, I think that there is probably little extra overhead in adding one new timestamp vs two. I'd also note that if we skipped adding an import-date now, it would be just as easy to add it in consistently in the future.
@kltm @dustine32
Here are some possible Dublin Core metadata entries that we could use for the ShEx:
date https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/modified
creation_date https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/created
then maybe import_date could just be https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/elements/1.1/date
I think it might be good to get some of @cmungall 's or @balhoff 's experience in modeling and possible consequences here.
We are currently using <http://purl.org/dc/elements/1.1/date> for the modification date. I do support changing this to dct:modified. Also I think the values should be xsd:dateTime instead of xsd:date.
I also agree with the creation date mapping. For import_date, I think we should pick something else, because (1) we have previously been using that property for modification date, and (2) generally I think we should use dcterms instead of dc (we have other changes to make in this regard). We could use dct:dateSubmitted or dct:dateAccepted.
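Put together, the mapping discussed so far might look roughly like the Python sketch below. It is only an illustration: the `FIELD_TO_PREDICATE` table and `date_statement` helper are hypothetical names, `dct:dateAccepted` is just one of the two candidates for import_date, and treating a plain day as midnight UTC when producing an xsd:dateTime value is an assumption.

```python
from datetime import datetime, timezone

DCT = "http://purl.org/dc/terms/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"

# Field names from this thread mapped to Dublin Core terms.
FIELD_TO_PREDICATE = {
    "date": DCT + "modified",
    "creation_date": DCT + "created",
    "import_date": DCT + "dateAccepted",  # placeholder; dct:dateSubmitted is the other candidate
}


def date_statement(field, day):
    """Return (predicate IRI, lexical value, datatype IRI) for a YYYY-MM-DD day.

    Assumption: a plain day is rendered as midnight UTC in xsd:dateTime form.
    """
    moment = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return FIELD_TO_PREDICATE[field], moment.isoformat(), XSD_DATETIME


print(date_statement("creation_date", "2021-06-08"))
# ('http://purl.org/dc/terms/created', '2021-06-08T00:00:00+00:00',
#  'http://www.w3.org/2001/XMLSchema#dateTime')
```

If existing values are plain dates, something like this conversion would be needed wherever xsd:dateTime is adopted.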
Thanks for the feedback @balhoff
I had also looked at dct:dateSubmitted and dct:dateAccepted.
They initially seemed kind of publishing-centric to me, but dct:dateAccepted is probably closest to what we want.
I had actually looked for something like dct:datePublished but couldn't find that.
Unless anyone on this thread objects, I'll put in a PR to update the ShEx for these tag names as well as the xsd value.
@kltm Does that sound okay to you?
From 2021-06-08 MOD imports call:
We want to align how date information is being expressed in the import GPAD files with how dates are modeled in the ShEx.
This will ensure we don't lose any information coming in from the imports and also that we have clear semantics for what the date fields mean in the ShEx and the GPAD files.
In the ShEx, date is currently captured in two places, the GoCamModel shape and the ProvenanceAnnotated shape.
Implicitly, the current use of date means the last date upon which an action was performed, either at the model level or on a ProvenanceAnnotated entity, which is used in the AnnotatedEdge shape (i.e. to record evidence for an edge).
We propose to add an additional date tag, creation_date, to the GoCamModel and ProvenanceAnnotated shapes to capture the information for this tag that is coming in from the Annotation Property, creation-date, in the GPAD file for the MOD imports.
Cardinality will be 1 for creation_date.
A few questions:
1) For the gene-centric import models, what would a model-level creation date be?
2) There is a comment in the ShEx to change date from xsd:string to xsd:date. Any reason to not also make that change?
3) For the GoCamModel shape, the current cardinality of date is 1, but the cardinality in the ProvenanceAnnotated shape is *. Is that what we want for ProvenanceAnnotated?
4) In the future, we may have a situation where a curator reviews a model and doesn't make any changes, but we want to capture that they've reviewed and approved the model. Will we want to add another type of date tag to the ShEx for this (e.g. reviewed_date) and will we need to modify the Noctua UI so that there's a specific action taken upon review so we know to capture the date of review?
@kltm - please make sure I've represented the current thinking about dates in the ShEx correctly.
@ukemi @sierra-moxon @lpalbou @tmushayahama @dustine32