JeffersonLab / ccdb

Jefferson Lab Calibration and Conditions Database (CCDB)

Better Control of Timestamps on Ancestor Variations #27

Open markito3 opened 8 years ago

markito3 commented 8 years ago

@DraTeots and I ( @markito3 ) discussed this yesterday.

ccdb_ancestor_time

The issue is that any given variation may have one or more ancestors (parents, parents of parents, etc.), and the user may want a different calibration time (CALIBTIME, or historical timestamp) for the variation being used and for each of its ancestors. For example, when working on a TOF calibration using the "tofcal" variation, one may want a fixed version of all constants not associated with the TOF, i.e., those not explicitly named in the "tofcal" variation. If tofcal's parent is "default", the user would want a fixed version of "default", identified by date, while always using the latest version of "tofcal". Currently the only behavior available is the opposite of this use case: the user gets the latest version of "default" and can only specify a fixed CALIBTIME for "tofcal".

The proposed solution has two parts:

  1. Add an overload of the SetTime function of the API. The current version takes only a time as an argument. The new version takes a time and a variation name.
  2. Add a new field to the JANA_CALIB_CONTEXT parameter, VARTIME, that specifies a variation and a time, e.g., VARTIME=mc:2016-04-01. Multiple instances of VARTIME could appear. The implementation would use the API function defined above.
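
A minimal sketch of how the two parts could fit together, written in Python for brevity. Everything here is hypothetical: the `TimeSettings` class, the `set_time` and `parse_context` names, the whitespace-separated key=value layout of the context string, and the YYYY-MM-DD date format are assumptions made for illustration, not the existing CCDB or JANA API.

```python
from datetime import datetime


class TimeSettings:
    """Hypothetical holder for a default time plus per-variation times."""

    def __init__(self):
        self.default_time = None   # plays the role of the existing CALIBTIME
        self.per_variation = {}    # variation name -> datetime

    def set_time(self, time, variation=None):
        # The one-argument form mirrors the existing SetTime(time);
        # the two-argument form stands in for the proposed SetTime(time, variation).
        if variation is None:
            self.default_time = time
        else:
            self.per_variation[variation] = time


def parse_context(context):
    """Parse a whitespace-separated key=value context string (assumed format)."""
    settings = TimeSettings()
    for token in context.split():
        key, _, value = token.partition("=")
        key = key.lower()
        if key == "calibtime":
            settings.set_time(datetime.strptime(value, "%Y-%m-%d"))
        elif key == "vartime":
            # VARTIME=<variation>:<date>, may appear several times
            name, _, stamp = value.partition(":")
            settings.set_time(datetime.strptime(stamp, "%Y-%m-%d"), name)
        # other keys (e.g. variation=...) are ignored in this sketch
    return settings


settings = parse_context(
    "variation=tofcal VARTIME=default:2016-04-01 VARTIME=mc:2016-03-15"
)
print(settings.per_variation)
```

In this sketch, "tofcal" has no VARTIME entry and no CALIBTIME, so it would keep following its latest version while "default" and "mc" are pinned.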

markito3 commented 8 years ago

I think that the solution proposed above is too complicated. A simpler way that services the majority use case is as follows:

Presently, when a variation inherits assignments from a parent, those assignments can change if the parent variation changes. The inheritance chain needs to be time-stamped. A daughter variation only inherits the assignments of a parent at a particular time.

This would involve changing the schema to store a timestamp with the name of the parent variation when variations are created.

markito3 commented 8 years ago

Whoops, should not have closed this. Reopening...

DraTeots commented 8 years ago

For now I thought the implementation should be like this:

  1. Add a parent timestamp to the variation, so that a variation now looks like:

    Variation
    id
    parent_id
    parent_date
    ...
  2. If parent_date is NULL, NOW is used as the parent_date when CCDB tries to find the right assignment.
  3. If one looks for an assignment without a time constraint (date=NOW) and there is no such assignment in the given variation, CCDB falls back to the parent variation and searches for assignments before parent_date.
  4. If one looks for an assignment with a time constraint (date=xxxx-xx-xx) and there is no such assignment in the given variation, CCDB falls back to the parent variation and searches for assignments before both parent_date and the time constraint (i.e., before whichever of the two is earlier).
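
The four rules above can be sketched as a small lookup routine. This is only an illustration under stated assumptions: the in-memory `Variation` class, the `find_assignment` function, and the reading of "before" as "at or before" are hypothetical and are not CCDB's actual code or schema. Each variation holds the assignments for a single table and run range to keep the example short.

```python
from datetime import datetime


class Variation:
    """Hypothetical in-memory stand-in for a row of the variation table."""

    def __init__(self, name, parent=None, parent_date=None):
        self.name = name
        self.parent = parent
        self.parent_date = parent_date  # None plays the role of NULL
        self.assignments = []           # list of (created, value)

    def add(self, created, value):
        self.assignments.append((created, value))


def find_assignment(variation, date=None, now=None):
    """Return the newest visible value, walking up the parent chain."""
    now = now if now is not None else datetime.now()
    cutoff = date if date is not None else now      # no constraint means NOW
    while variation is not None:
        visible = [(c, v) for c, v in variation.assignments if c <= cutoff]
        if visible:
            return max(visible, key=lambda cv: cv[0])[1]
        # Fall back to the parent: NULL parent_date means NOW (rule 2),
        # and the earlier of parent_date and the constraint wins (rule 4).
        parent_date = variation.parent_date
        if parent_date is None:
            parent_date = now
        cutoff = min(cutoff, parent_date)
        variation = variation.parent
    return None


default = Variation("default")
default.add(datetime(2016, 1, 1), "default-v1")
default.add(datetime(2016, 6, 1), "default-v2")

tofcal = Variation("tofcal", parent=default, parent_date=datetime(2016, 3, 1))
now = datetime(2016, 12, 1)

print(find_assignment(tofcal, now=now))                            # pinned parent
print(find_assignment(tofcal, date=datetime(2016, 2, 1), now=now)) # with constraint
```

Because the cutoff only ever shrinks while walking up the chain, a grandparent is never read at a later time than the parent was.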

There are still things that are not clear to me:

  1. What should be set as the parent_date by default? For example:

    ccdb mkvar "new_variation" -p "parent_variation"
    # what parent_date should be considered by default? NULL or NOW?

  2. What should be set as the parent_date for the existing variations and the assignments in them?
  3. Should there be a way to somehow overwrite this "parent_date"?

markito3 commented 8 years ago

Those are great questions, @DraTeots. We did not specify those things.

What should be set as the parent_date by default?

First, I think the most common use case is the following:

When a user makes a new variation, she has the following expectations:

  1. The version of the parent variation that is accessed by the new variation should be that as of the time of the creation of the new variation (parent_date = t_{creation-of-new}).
  2. The version of the parent variation should remain constant even as the new variation undergoes changes.

The schema you propose supports this case, clearly. The next complication would be if the user wants a version of the parent earlier than t_{creation-of-new}. That case is also trivially supported by the schema and should be allowed by the API.

If you accept the above, the default for parent_date should be t_{creation-of-new}. And there should be an option for parent_date to be specified by the user. That much is simple.

That brings up another complication: what if the user-specified parent_date is in the future? Assuming that such a specification is not a mistake, then we can give it the following interpretation: use the most recent version of the parent variation until parent_date and after that use the version as of parent_date. One notable sub-case is if parent_date = +infinity. That means the user always wants the most recent version of the parent variation. Note that this interpretation conforms to what the code would do naturally, i.e., it is trivial to implement. The only concern is that such a specification might be a mistake. Not sure whether we should worry about that or not.
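
A short sketch of why this interpretation falls out naturally: if the lookup only ever considers parent assignments made at or before the earlier of "now" and parent_date, a future parent_date behaves like "latest" until that date passes. The function name and the use of `datetime.max` as a stand-in for +infinity are assumptions for illustration only.

```python
from datetime import datetime

PLUS_INFINITY = datetime.max  # assumed stand-in for parent_date = +infinity


def effective_parent_cutoff(parent_date, now):
    """Latest parent assignment time that a lookup at time `now` may use."""
    return min(parent_date, now)


parent_date = datetime(2017, 1, 1)

before = effective_parent_cutoff(parent_date, now=datetime(2016, 6, 1))
after = effective_parent_cutoff(parent_date, now=datetime(2018, 6, 1))
always_latest = effective_parent_cutoff(PLUS_INFINITY, now=datetime(2018, 6, 1))

print(before, after, always_latest)
```

Before parent_date the cutoff tracks the current time, afterwards it stays frozen at parent_date, and with +infinity it never freezes.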

Should there be a way to somehow overwrite this "parent_date"?

Should the user have the option of changing the parent_date of a variation at some point after creation? If we allow that then we allow the following complication:

Imagine four variations in three generations: grandma, mama, daughter1, and daughter2 (grandma is the parent of mama; mama is the parent of daughter1 and daughter2). If daughter1 wants a different version of mama than daughter2 wants, there is no problem, since parent_date for daughter1 can be set independently of daughter2. The problem comes if daughter1 wants a different version of grandma than daughter2. That cannot be supported unless a mama_of_daughter1 variation is created by the API, using a different parent_date than mama.

That is not too complicated, but I would argue against supporting mama_of_daughter1-API-auto-creation. If daughter1 really needs a new mama, then that variation can be created by hand by the user with a name under the control of the user (my_mama, step_mama, your_mama, etc.). Also, such a use case will be very rare in my opinion; not worth the complication of support.

Even more important, to have a well-defined history for daughters, parent_date cannot change. So I think we are forced into having a system that never changes parent_date.

What should be set as the parent_date for the existing variations and assignments in it?

That is a sticky one. To be consistent with my answer to the previous question, we should (a) set parent_date to the time of creation of the legacy variation. But that would cause a change in the constants for those variations at implementation time. Arrrrgh. To keep consistency with past behavior we could (b) set parent_date = +infinity, but I don't think that that is what people want (inconsistent with the majority use case). Arrrgh again. A compromise would be to (c) set parent_date = implementation-time, thereby freezing the constants to the time we roll out the change. Uglyyyyyy.
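
To make the trade-off between options (a), (b), and (c) concrete, here is a small sketch with made-up dates. The function names, the option labels as strings, and the use of `datetime.max` for +infinity are all hypothetical; the parent's assignments are invented purely for the example.

```python
from datetime import datetime


def legacy_parent_date(option, created, rollout):
    """parent_date assigned to a pre-existing variation under each option."""
    if option == "a":
        return created        # (a) time of creation of the legacy variation
    if option == "b":
        return datetime.max   # (b) +infinity, i.e. always follow the parent
    if option == "c":
        return rollout        # (c) freeze at implementation time
    raise ValueError("unknown option: " + option)


def visible_parent_value(parent_assignments, parent_date, now):
    """Newest parent value at or before min(parent_date, now), or None."""
    cutoff = min(parent_date, now)
    older = [(t, v) for t, v in parent_assignments if t <= cutoff]
    return max(older, key=lambda tv: tv[0])[1] if older else None


# Invented history: the parent changed before and after the legacy variation
# was created, and once more after the rollout of the new scheme.
parent = [
    (datetime(2014, 6, 1), "A"),
    (datetime(2015, 9, 1), "B"),
    (datetime(2016, 12, 1), "C"),
]
created = datetime(2015, 1, 1)
rollout = datetime(2016, 10, 1)
now = datetime(2017, 1, 1)

results = {
    opt: visible_parent_value(parent, legacy_parent_date(opt, created, rollout), now)
    for opt in ("a", "b", "c")
}
print(results)
```

With these dates, users of the legacy variation were getting "B" just before the rollout. Option (a) would switch them back to "A", option (b) keeps moving with the parent and yields "C", and option (c) keeps "B" frozen.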

Not sure which way to go with this one. Option (c) maybe?

markito3 commented 7 years ago

After talking with Sean at the end of last year (when he was visiting the Lab), I think we came to a proposal for the date-of-the-ancestors problem.

At the time of creation of a new variation, the time-stamp on the direct parent can only be one of two times: (1) the current time or (2) +infinity. And the default should be the current time. +Infinity can be chosen as an option in the API.

This simplifies the API by restricting the choices that a user has. It supports the two most important use cases: (1) a fixed version of constants from the parent and (2) always take the latest version of the parent.

And what about the dates on grandparents, and generations previous to them? The proposal is to have the new variation inherit the ancestor timestamps of its direct parent. So whatever timestamp is used generation-by-generation by the parent variation, the same choice is used by the new daughter. The user would have no control over those grandparent-and-higher timestamps; again a simplification.
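
A minimal sketch of this inheritance rule, assuming each variation stores an ordered list of (ancestor, timestamp) pairs. The `Variation` class, the `make_variation` function, the `follow_latest` flag, and `datetime.max` as +infinity are hypothetical names chosen for illustration, not part of CCDB.

```python
from datetime import datetime

PLUS_INFINITY = datetime.max  # assumed stand-in for +infinity


class Variation:
    def __init__(self, name, ancestor_cutoffs=()):
        self.name = name
        # Nearest ancestor first: [(ancestor_name, cutoff), ...]
        self.ancestor_cutoffs = list(ancestor_cutoffs)


def make_variation(name, parent=None, follow_latest=False, now=None):
    """Create a variation; only the direct parent's timestamp is selectable."""
    if parent is None:
        return Variation(name)
    now = now if now is not None else datetime.now()
    # Only two choices for the direct parent: current time (default) or +infinity.
    parent_cutoff = PLUS_INFINITY if follow_latest else now
    # Grandparent-and-higher timestamps are copied from the parent unchanged.
    return Variation(name, [(parent.name, parent_cutoff)] + parent.ancestor_cutoffs)


grandma = make_variation("grandma")
mama = make_variation("mama", grandma, now=datetime(2016, 1, 1))
daughter = make_variation("daughter", mama, follow_latest=True,
                          now=datetime(2017, 2, 1))

for ancestor, cutoff in daughter.ancestor_cutoffs:
    print(ancestor, cutoff)
```

Here the daughter follows the latest version of mama but sees grandma exactly as mama does, frozen at the time mama was created.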