Corwinpro opened 5 years ago
These `GradientDataSource` classes would be extremely useful for `DataSource`s in which a mathematical process is performed. In that case we can either numerically estimate or analytically obtain the gradient of each output slot w.r.t. the input slot variables (parameters).
The workflow then becomes a network-like construct that we could model in order to produce better estimates of the MCO search direction.
However, difficulties will arise for `DataSource`s that perform a higher-level, software-related task, such as processing and packaging objects into containers to pass along the workflow. In that case it becomes more appropriate to simply use the MCO parameters and KPIs to construct any model of the workflow.
There is probably a good first step before implementing this: adding `@cached` to the `DataSource.run` methods, so that (potentially expensive) calculations are performed only once.
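A minimal sketch of this caching idea, using the standard-library `functools.lru_cache` as a stand-in for the suggested `@cached` decorator (the `DataSource` class below is illustrative only, not the actual force-bdss API):

```python
# Sketch of caching DataSource.run, with functools.lru_cache standing
# in for the proposed @cached decorator. Illustrative class, not the
# real force-bdss DataSource.
from functools import lru_cache


class DataSource:
    def __init__(self):
        self.evaluations = 0  # count real evaluations, for demonstration

    @lru_cache(maxsize=None)  # parameters must be hashable, e.g. tuples
    def run(self, parameters):
        self.evaluations += 1
        # stand-in for a potentially expensive calculation
        return sum(x * x for x in parameters)


ds = DataSource()
ds.run((1.0, 2.0))
ds.run((1.0, 2.0))  # second call is served from the cache
assert ds.evaluations == 1
```

One caveat of this approach: the inputs must be hashable, so array-like parameters would need to be converted to tuples (or a custom cache key) before memoization can apply.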
This issue is related to #184 and #200.
The optimization interface lacks a rigorous rule for how the `DataSource.run` parameters and the returned gradient should be related. My first suggestions are:
### `DataSource`

The specifications for this class are below.
`run` (in the case of `GradientDataSource`s) should have the following interface for a scalar-valued objective: it accepts a `ParameterVector` and returns an `Objective` together with a `ParameterGradient`. Here `Objective` is a scalar value, and `ParameterVector` and `ParameterGradient` are of the same length. The type of the `ParameterGradient` is defined by the types of the chosen `Objective` and the `ParameterVector`.
For multiple objectives, the type of `Objective` should be a vector of objective values, and, therefore, the `ParameterGradient` changes to one gradient per objective. `ParameterVector`s are containers: each output `ParameterGradient` should have the same `.shape`, with the types of the elements defined by the objective and the inputs.

### Working with KPIs and weighted objectives
I guess a linear combination of objectives is used instead of the "raw" objectives data in some situations. For instance, we might somehow weight the `production_cost` versus the `production_time`. Then I propose to implement this as a new `GradientDataSource` that defines the weighting procedure.

### Gradient propagation
Backpropagation: calculating the gradient of the objective(s) on the very last layer with respect to the primary parameters. This approach is robust and does not depend on the structure of the workflow.
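A hand-rolled chain-rule sketch of such propagation through a weighting layer (all names and formulas below are hypothetical; the only assumption is that each layer returns its value together with its local gradient):

```python
# Chain-rule sketch for propagating gradients through a workflow of
# gradient-aware layers. All names and formulas are hypothetical.
import numpy as np


def raw_objectives(p):
    """First layer: maps parameters to raw objectives (cost, time)."""
    cost = p[0] ** 2 + p[1]           # hypothetical production_cost
    time = 3.0 * p[0] + p[1] ** 2     # hypothetical production_time
    objectives = np.array([cost, time])
    # Jacobian: one row per objective, one column per parameter.
    jacobian = np.array([
        [2.0 * p[0], 1.0],
        [3.0, 2.0 * p[1]],
    ])
    return objectives, jacobian


def weighted_objective(objectives, weights):
    """Second layer: a weighting 'GradientDataSource' combining objectives."""
    value = float(weights @ objectives)
    gradient = weights  # d(w . f) / d(f) = w, an almost zero-cost layer
    return value, gradient


p = np.array([1.0, 2.0])
weights = np.array([0.7, 0.3])

f, J = raw_objectives(p)
kpi, dkpi_df = weighted_objective(f, weights)

# Backpropagation: chain the local gradients from the last layer
# back to the primary parameters.
dkpi_dp = dkpi_df @ J
```

Because each layer only exposes its local gradient, the composition works for any workflow topology, which is the robustness claimed above.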
### Advantages
- The interface is consistent from a formal point of view: we always know how changes in the `parameters` values affect the `Objective` value (this is provided by the gradient information).
- It is possible to apply extra layers of almost zero-cost operations when a combination of raw objectives is required, e.g. to calculate the KPIs.
- Easy to test gradient consistency with Taylor tests.
- Easy to test physical-type consistency (no sums of `lb` with `inches`).

### Related issues and pull requests
https://github.com/force-h2020/force-bdss-plugin-itwm-example/pull/48
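Returning to the Taylor-test advantage listed above, here is a minimal sketch of such a check (function names are hypothetical; the idea is that a correct gradient makes the first-order Taylor remainder shrink quadratically in the step size):

```python
# Sketch of a Taylor test for gradient consistency. If grad_f is the
# exact gradient of f, the remainder |f(p + h*d) - f(p) - h * grad.d|
# shrinks as O(h^2), i.e. the observed convergence rate approaches 2.
import numpy as np


def f(p):
    return float(np.sin(p[0]) + p[1] ** 3)


def grad_f(p):
    return np.array([np.cos(p[0]), 3.0 * p[1] ** 2])


def taylor_test(f, grad_f, p, d):
    remainders = []
    for h in (1e-1, 1e-2, 1e-3):
        r = abs(f(p + h * d) - f(p) - h * float(grad_f(p) @ d))
        remainders.append(r)
    # log10 of successive remainder ratios: ~2 for a correct gradient.
    return [np.log10(remainders[i] / remainders[i + 1])
            for i in range(len(remainders) - 1)]


rates = taylor_test(f, grad_f, np.array([0.3, 0.7]), np.array([1.0, -1.0]))
assert all(rate > 1.9 for rate in rates)
```

A wrong gradient would drive the rates toward 1 instead, which makes this an easy automated check for any `GradientDataSource` implementation.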