Corwinpro opened 5 years ago
These `GradientDataSource` classes would be extremely useful for `DataSource`s in which a mathematical process is performed. In that case we can either numerically estimate or analytically obtain the gradient of each output slot w.r.t. the input slot variables (parameters).
The workflow then becomes a network-like construct that we could model in order to produce better estimates of the MCO search direction.
However, difficulties will arise for `DataSource`s that perform a higher-level, software-related task, such as processing and packaging objects into containers to pass along the workflow. In that case it becomes more appropriate to simply use the MCO parameters and KPIs to construct any model of the workflow.
There is probably a good first step before implementing this: adding `@cached` to the `DataSource.run` methods, so that (potentially expensive) calculations are performed only once.
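A minimal sketch of this caching idea, using the standard-library `functools.lru_cache` as a stand-in for the suggested `@cached` decorator (the `DataSource` class below is illustrative only, not the actual force-bdss API):

```python
# Sketch of caching DataSource.run, with functools.lru_cache standing
# in for the proposed @cached decorator. Illustrative class, not the
# real force-bdss DataSource.
from functools import lru_cache


class DataSource:
    def __init__(self):
        self.evaluations = 0  # count real evaluations, for demonstration

    @lru_cache(maxsize=None)  # parameters must be hashable, e.g. tuples
    def run(self, parameters):
        self.evaluations += 1
        # stand-in for a potentially expensive calculation
        return sum(x * x for x in parameters)


ds = DataSource()
ds.run((1.0, 2.0))
ds.run((1.0, 2.0))  # second call is served from the cache
assert ds.evaluations == 1
```

One caveat of this approach: the inputs must be hashable, so array-like parameters would need to be converted to tuples (or a custom cache key) before memoization can apply.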
This issue is related to #184 and #200.
The optimization interface lacks a rigorous rule for how the `DataSource.run` parameters and the returned gradient should be related. My first suggestions are:
### `DataSource`

The specifications for this class are below.
`run` (in the case of `GradientDataSource`s) should have the following interface for a scalar-valued objective: it accepts a `ParameterVector` and returns an `Objective` together with a `ParameterGradient`. Here `Objective` is a scalar value, and `ParameterVector` and `ParameterGradient` are of the same length. The type of the `ParameterGradient` is defined by the types of the chosen `Objective` and the `ParameterVector`.
For multiple objectives, the type of `Objective` should be a vector of objective values, and, therefore, the `ParameterGradient` changes to one gradient per objective. `ParameterVector`s are containers: each output `ParameterGradient` should have the same `.shape`, with the types of the elements defined by the objective and the inputs.

### Working with KPIs and weighted objectives
I guess a linear combination of objectives is used instead of the "raw" objectives data in some situations. For instance, we might somehow weight the `production_cost` versus the `production_time`. Then I propose to implement this as a new `GradientDataSource` that defines the weighting procedure.

### Gradient propagation
Backpropagation: calculating the gradient of the objective(s) on the very last layer with respect to the primary parameters. This approach is robust and does not depend on the structure of the workflow.
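A hand-rolled chain-rule sketch of such propagation through a weighting layer (all names and formulas below are hypothetical; the only assumption is that each layer returns its value together with its local gradient):

```python
# Chain-rule sketch for propagating gradients through a workflow of
# gradient-aware layers. All names and formulas are hypothetical.
import numpy as np


def raw_objectives(p):
    """First layer: maps parameters to raw objectives (cost, time)."""
    cost = p[0] ** 2 + p[1]           # hypothetical production_cost
    time = 3.0 * p[0] + p[1] ** 2     # hypothetical production_time
    objectives = np.array([cost, time])
    # Jacobian: one row per objective, one column per parameter.
    jacobian = np.array([
        [2.0 * p[0], 1.0],
        [3.0, 2.0 * p[1]],
    ])
    return objectives, jacobian


def weighted_objective(objectives, weights):
    """Second layer: a weighting 'GradientDataSource' combining objectives."""
    value = float(weights @ objectives)
    gradient = weights  # d(w . f) / d(f) = w, an almost zero-cost layer
    return value, gradient


p = np.array([1.0, 2.0])
weights = np.array([0.7, 0.3])

f, J = raw_objectives(p)
kpi, dkpi_df = weighted_objective(f, weights)

# Backpropagation: chain the local gradients from the last layer
# back to the primary parameters.
dkpi_dp = dkpi_df @ J
```

Because each layer only exposes its local gradient, the composition works for any workflow topology, which is the robustness claimed above.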
### Advantages
- The interface is consistent from a formal point of view: we always know how changes in the `parameters` values affect the `Objective` value (this is provided by the gradient information).
- It is possible to apply extra layers of almost zero-cost operations when a combination of raw objectives is required, e.g. to calculate the KPIs.
- Easy to test gradient consistency with Taylor tests.
- Easy to test physical-type consistency (no sums of `lb` with `inches`).

### Related issues and pull requests
https://github.com/force-h2020/force-bdss-plugin-itwm-example/pull/48
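Returning to the Taylor-test advantage listed above, here is a minimal sketch of such a check (function names are hypothetical; the idea is that a correct gradient makes the first-order Taylor remainder shrink quadratically in the step size):

```python
# Sketch of a Taylor test for gradient consistency. If grad_f is the
# exact gradient of f, the remainder |f(p + h*d) - f(p) - h * grad.d|
# shrinks as O(h^2), i.e. the observed convergence rate approaches 2.
import numpy as np


def f(p):
    return float(np.sin(p[0]) + p[1] ** 3)


def grad_f(p):
    return np.array([np.cos(p[0]), 3.0 * p[1] ** 2])


def taylor_test(f, grad_f, p, d):
    remainders = []
    for h in (1e-1, 1e-2, 1e-3):
        r = abs(f(p + h * d) - f(p) - h * float(grad_f(p) @ d))
        remainders.append(r)
    # log10 of successive remainder ratios: ~2 for a correct gradient.
    return [np.log10(remainders[i] / remainders[i + 1])
            for i in range(len(remainders) - 1)]


rates = taylor_test(f, grad_f, np.array([0.3, 0.7]), np.array([1.0, -1.0]))
assert all(rate > 1.9 for rate in rates)
```

A wrong gradient would drive the rates toward 1 instead, which makes this an easy automated check for any `GradientDataSource` implementation.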