delving / culture-hub

The Delving Search and Administrative Interface
Apache License 2.0

Missing state model for DataSet #602

Open manuelbernhardt opened 12 years ago

manuelbernhardt commented 12 years ago

There's a DataSet state model inherited from the days when that state was entirely managed by the Sip-Creator; it has evolved a little to accommodate changes in the processing model.

The current states are:

The event model now aims at reflecting every change of state and every other event related to a DataSet. The existing events are:

There are, however, a number of events or changes of state (depending on what semantics the state model has) that are not captured:

Some of the above events, although discrete, could perhaps be viewed as one, since e.g. mappings, invalid records (and hints?) are usually connected on an abstract level. These events have an impact on other, not yet existing, states: for example, a new mapping means that the set is "outdated" in some way, as the new mapping probably influences how the data and the index look.

We should think of a better state model for the DataSet, with various states related to the different parts of the life-cycle:

It might also make sense to consider splitting the publishing life-cycle so that it is possible to index without re-creating the cache. For this, however, the state model needs to reflect whether the cache is outdated (in comparison to the mapping).
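One way to picture "outdated in comparison to the mapping" is a minimal sketch in Python, assuming each DataSet keeps a counter for its mapping and remembers which mapping version the cache and the index were last built from. The class, field and step names are hypothetical and not taken from the culture-hub code base.

```python
from dataclasses import dataclass


@dataclass
class DataSetVersions:
    """Hypothetical bookkeeping: which mapping version each artefact was built from."""
    mapping_version: int = 0           # bumped whenever a new mapping arrives
    cached_mapping_version: int = -1   # -1 means the cache was never built
    indexed_mapping_version: int = -1  # -1 means the set was never indexed

    def cache_outdated(self) -> bool:
        return self.cached_mapping_version != self.mapping_version

    def index_outdated(self) -> bool:
        return self.indexed_mapping_version != self.mapping_version

    def next_step(self) -> str:
        """Decide how much of the publishing life-cycle has to run."""
        if self.cache_outdated():
            return "rebuild_cache_then_index"
        if self.index_outdated():
            return "index_only"
        return "nothing"


if __name__ == "__main__":
    versions = DataSetVersions(mapping_version=2,
                               cached_mapping_version=2,
                               indexed_mapping_version=1)
    print(versions.next_step())
```

With this kind of bookkeeping, indexing without re-creating the cache is only offered when the cache was built from the current mapping.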

I think a lot of the above would get clearer if we could somehow bundle the sip-creator "meta-information" (mapping, invalid records, hints, statistics) and keep versions thereof. We dropped the versioning of source data for the time being as it does not effectively bring any added value and technically isn't viable at the moment. A lot of the "versions" created contained identical data and were only versioned because of problems in identifying the same records.

geralddejong commented 12 years ago

I think that the state machine is the core concept orchestrating the internal and external aspects of the dataset workflow, so it is of paramount importance that we devise and refine this state machine in the context of a unit test which covers all possible situations.

I'm thinking more in terms of a state composed of a number of bits rather than an enumeration of all individual plausible states. Many of the state transitions could pay attention to only one or two of these bits, which avoids the quadratic explosion of state transitions. In other words, many of these bits cast shadows on the others.
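To illustrate the idea, here is a minimal sketch in Python of a state composed of independent bits, where each transition touches only the bits it cares about. The four flag names are hypothetical placeholders chosen for the example; they are not the bits proposed in this thread.

```python
from enum import Flag, auto


class DataSetState(Flag):
    """Hypothetical state bits; a real model would define its own set."""
    NONE = 0
    SOURCE_UPLOADED = auto()
    MAPPING_PRESENT = auto()
    CACHE_CURRENT = auto()
    INDEX_CURRENT = auto()


def on_mapping_changed(state: DataSetState) -> DataSetState:
    # A new mapping "casts a shadow": cache and index are no longer current.
    stale = DataSetState.CACHE_CURRENT | DataSetState.INDEX_CURRENT
    return (state | DataSetState.MAPPING_PRESENT) & ~stale


def on_cache_rebuilt(state: DataSetState) -> DataSetState:
    # Only one bit changes; all other bits are left untouched.
    return state | DataSetState.CACHE_CURRENT


def can_index(state: DataSetState) -> bool:
    # This check looks at two bits and ignores the rest.
    needed = DataSetState.MAPPING_PRESENT | DataSetState.CACHE_CURRENT
    return (state & needed) == needed


if __name__ == "__main__":
    s = DataSetState.SOURCE_UPLOADED
    s = on_mapping_changed(s)
    print(can_index(s))
    s = on_cache_rebuilt(s)
    print(can_index(s))
```

Because every transition is a small function over one or two bits, a unit test can exercise each transition in isolation instead of enumerating every combined state.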

I came up with 9 bits:

geralddejong commented 12 years ago

It should be possible to upload data before any proper mapping has been performed so that one person can upload the data and hand it over to somebody else for building a mapping.

manuelbernhardt commented 12 years ago

Actually this is possible to do. If you then try processing the set with missing mappings, they will be ignored during the run. Of course it would be better if the interface didn't let you process at all.
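The difference between the two behaviours can be sketched as follows, in Python, assuming a set declares the schemas it should be processed for and holds a mapping per schema. The function names and the schema labels are hypothetical and only serve the illustration.

```python
def missing_mappings(target_schemas, mappings):
    """Return the target schemas that have no mapping yet, in sorted order."""
    return sorted(s for s in target_schemas if s not in mappings)


def schemas_to_process(target_schemas, mappings):
    """Behaviour described above: schemas without a mapping are skipped."""
    return sorted(s for s in target_schemas if s in mappings)


def can_process(target_schemas, mappings):
    """Stricter alternative: refuse to process while any mapping is missing."""
    return not missing_mappings(target_schemas, mappings)


if __name__ == "__main__":
    targets = ["schema_a", "schema_b"]      # hypothetical schema labels
    mappings = {"schema_a": "<mapping/>"}   # only one mapping uploaded so far
    print(schemas_to_process(targets, mappings))
    print(can_process(targets, mappings), missing_mappings(targets, mappings))
```

An interface could use the stricter check to disable the processing action and list the schemas that still need a mapping.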

On Thu, Aug 2, 2012 at 3:59 PM, Gerald de Jong reply@reply.github.com wrote:

It should be possible to upload data before any proper mapping has been performed so that one person can upload the data and hand it over to somebody else for building a mapping.

