Proposal: Configuration spec revision and layer sharing

Purpose

In complex configurations, we find that some sub-components of the configuration are shared at one or more levels. Currently, the instance generation process creates a unique instance at each "hydration" occurrence (each call to from_config_dict). Sometimes this is desirable, while in other cases it imposes a burdensome cost because:

- the instance may acquire a lot of system resources
- the instances need to share state in a costly manner
- the instances should actually share state but currently cannot
- a shared, read-only resource is needlessly duplicated
- etc.

I propose adding to the configuration schema an optional ID string annotation that links equivalent configuration blocks to the same instantiation of that configuration. Additional checks, described below, ensure that instances are shared only when appropriate, based on the equivalence of the written configuration at each occurrence. This is a strictly additive change to the current specification, so it is not breaking and remains compatible with existing configurations and setups.

I would also argue for updating the configuration structure to move the mapping of concrete type configurations into a nested level. This adds a "type_config" property to the structure whose value is a dictionary mapping strings to nested dictionaries. These keys and values are the same as the dynamic content that is currently added to the top-level structure. The aim is to reduce confusion and potential conflicts with fixed keyword parameters, since the specification would now have more than one fixed keyword. Unlike the ID addition, this change (the type_config attribute alone) would break configuration files and setups based on the current format. If this is not acceptable, we may strike this aspect from the proposal.

Example of the new JSON-like configuration dictionary structure:

{
    "id": "<str>",
    "type": "<str>",
    "type_config": {
        "<name>": {
            ...
        }
    }
}
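For existing files, the type_config change is largely mechanical. The following is a minimal sketch, assuming the current format places every per-type configuration as a top-level key next to "type" (as described above). The helper name to_nested_format is hypothetical and not part of SMQTK-Core, and it converts a single block only, without recursing into nested sub-configurations.

```python
from typing import Any, Dict


def to_nested_format(old: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: convert one current-format block to the proposed format.

    Every key other than "type" is assumed to be a per-type configuration and
    is moved, unchanged, under "type_config". Nested sub-configurations inside
    those values are NOT converted (shallow sketch).
    """
    return {
        "type": old.get("type"),
        "type_config": {k: v for k, v in old.items() if k != "type"},
    }


old_block = {"type": "pkg.Foo", "pkg.Foo": {"a": 1}, "pkg.Bar": {"b": 2}}
new_block = to_nested_format(old_block)
```

Here new_block holds "type": "pkg.Foo" plus both per-type entries under "type_config"; the class paths are placeholders.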

If no id attribute is provided (it is optional), configuration-to-instance hydration occurs as it currently does. If an id attribute is provided, we additionally retain the resulting hydrated instance so that it can be returned later when the same ID (with an equal configuration) is encountered. Other configuration blocks with the same ID would then return the same instance that was hydrated the first time that ID and configuration were encountered.

For this to work with consistent behavior, subsequent configuration blocks with the same ID must have matching c["type"] and c["type_config"][c["type"]] values (Q: dictionary hash? naive dict value equality?). If they do not, an error should be raised stating that the block conflicts with an already-registered ID.

A configuration block with an id but no concrete configuration (c["type_config"] is None-valued or has no value for c["type"]) may be considered valid, returning the hydrated instance for that ID, if and only if an instance has already been generated for that ID (Q: should we even allow this? The order of resolution depends on the application's/algorithm's parameter instantiation order.). Otherwise, an error should be raised, similar to the one for a config with no ID and a missing "type_config", but with a message stating that an ID was provided and that this block was the first one encountered for that ID yet had no concrete configuration.
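The ID rules above can be expressed as a small validation function. This is a sketch under stated assumptions: it picks naive dict equality (one of the two options in the open question), assumes reference-only blocks still carry "type", and the names check_block, concrete_config, ConfigIdMismatchError and MissingConcreteConfigError are all hypothetical.

```python
from typing import Any, Dict, Optional


class ConfigIdMismatchError(ValueError):
    """Hypothetical: a block conflicts with an already-registered ID."""


class MissingConcreteConfigError(ValueError):
    """Hypothetical: the first block seen for an ID has no concrete config."""


def concrete_config(block: Dict[str, Any]) -> Optional[Any]:
    """Return c["type_config"][c["type"]], or None if missing or None-valued."""
    type_config = block.get("type_config") or {}
    return type_config.get(block.get("type"))


def check_block(block: Dict[str, Any], registered: Optional[Dict[str, Any]]) -> None:
    """Apply the proposed rules to a block that carries an "id".

    `registered` is the block stored when this ID was first hydrated,
    or None if the ID has not been seen yet.
    """
    conf = concrete_config(block)
    if registered is None:
        # First occurrence of this ID: it must carry a concrete configuration.
        if conf is None:
            raise MissingConcreteConfigError(
                f"ID {block['id']!r} was provided and this is the first block "
                "encountered for it, but it has no concrete configuration"
            )
        return
    if block.get("type") != registered.get("type"):
        raise ConfigIdMismatchError(f"type mismatch for registered ID {block['id']!r}")
    # A reference-only block (conf is None) is accepted once the ID exists.
    # Otherwise the concrete configs must be equal; dict == compares nested
    # values deeply and ignores key order.
    if conf is not None and conf != concrete_config(registered):
        raise ConfigIdMismatchError(f"config mismatch for registered ID {block['id']!r}")
```

If hashing were preferred instead, hashing a canonical serialization such as json.dumps(conf, sort_keys=True) is one option for JSON-serializable configurations; naive == avoids that extra step.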

This requires maintaining a global cache of type instantiations for when the hydration method receives a block with an ID. The cache must also retain a copy of the configuration used for hydration so that the checks described above can be performed. Blocks with no ID are "anonymous": no cache entry is kept for the instances generated from them.

These details may be folded into the current implementation of from_config_dict/cls_conf_from_config_dict. That implementation should also be made thread-safe so that race conditions cannot corrupt this cache in parallelized scenarios.
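One way to make the cached path thread-safe is to perform the lookup, construction, and insertion under a single lock. This sketch assumes one global re-entrant lock; _CACHE, _CACHE_LOCK and hydrate_threadsafe are hypothetical names, factory again stands in for the real construction logic, and the consistency checks are omitted to keep the focus on locking.

```python
import copy
import threading
from typing import Any, Callable, Dict, Tuple

# Hypothetical cache: ID -> (copy of the registering block, instance).
_CACHE: Dict[str, Tuple[Dict[str, Any], Any]] = {}
# Re-entrant, so a factory that hydrates nested ID'd blocks on the same
# thread can re-acquire the lock without deadlocking.
_CACHE_LOCK = threading.RLock()


def hydrate_threadsafe(block: Dict[str, Any],
                       factory: Callable[[Dict[str, Any]], Any]) -> Any:
    block_id = block.get("id")
    if block_id is None:
        # Anonymous blocks touch no shared state, so no lock is needed.
        return factory(block)
    with _CACHE_LOCK:
        # Lookup, construction, and insertion happen under one lock
        # acquisition, so two threads cannot both miss the cache and
        # build separate instances for the same ID.
        entry = _CACHE.get(block_id)
        if entry is not None:
            return entry[1]
        instance = factory(block)
        _CACHE[block_id] = (copy.deepcopy(block), instance)
        return instance
```

Holding one global lock while constructing serializes all ID'd hydrations; per-ID locks would allow more concurrency at the cost of extra bookkeeping.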

There should be a new function that explicitly clears the cache, so that clean-up of instance references can be controlled by the caller.

Kitware / SMQTK-Core #23