cog-imperial / OMLT

Represent trained machine learning models as Pyomo optimization formulations

Support for vector-valued activation functions #6

Closed: jalving closed this issue 2 years ago

jalving commented 3 years ago

Currently, our NetworkDefinition assumes the activation function input is always a scalar. This abstraction does not capture activation functions that require the entire output of a layer, such as softmax.
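For example, softmax couples every node in a layer: softmax(zhat)_i = exp(zhat_i) / sum_j exp(zhat_j), so the constraint for node i cannot be written from zhat_i alone.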

One possibility is to add a mapping in NetworkDefinition that optionally maps node indices to the nodes in their layer, something akin to {node_id -> [nodes in layer]} (see the sketch below).
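A rough sketch of what that mapping might look like for a two-node softmax output layer (the node ids here are illustrative, not existing API):

    # hypothetical: nodes 5 and 6 form a softmax output layer,
    # so each node id maps to the full list of node ids in its layer
    layer_node_ids = {
        5: [5, 6],
        6: [5, 6],
    }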

Then build_full_space_formulation could do something like the following, where we pass the (possibly empty) layer nodes to the activation functions:

if not skip_activations:
    activations = net.activations
    block.activation_constraints = pyo.Constraint(block.hidden_output_nodes)
    for i in block.hidden_output_nodes:
        # check whether this node's activation needs the whole layer;
        # the inner index must not shadow the loop variable i
        if i in net.layer_node_ids:
            layer_zhat = [block.zhat[j] for j in net.layer_node_ids[i]]
        else:
            layer_zhat = []

        if i not in activations or activations[i] is None or activations[i] == 'linear':
            block.activation_constraints[i] = block.z[i] == block.zhat[i]
        elif type(activations[i]) is str:
            afunc = pyomo_activations[activations[i]]
            block.activation_constraints[i] = block.z[i] == afunc(block.zhat[i], *layer_zhat)
        else:
            # must be a callable that returns a valid Pyomo expression
            block.activation_constraints[i] = block.z[i] == activations[i](block.zhat[i], *layer_zhat)

The utils.py functions would have to be updated to take extra arguments if we go this route.
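For instance, a vector-aware activation in utils.py might look like this; a sketch only, where the softmax helper and its variadic signature are assumptions matching the call above, not existing OMLT code:

    import pyomo.environ as pyo

    def softmax(zhat_i, *layer_zhat):
        # depends on every pre-activation value in the layer,
        # not just this node's own zhat_i
        return pyo.exp(zhat_i) / sum(pyo.exp(z) for z in layer_zhat)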

@fracek: Do you know how ONNX handles softmax? My understanding is that the dict-of-dicts representation is general enough to express CNNs, but it struggles with softmax/normalization here.

fracek commented 3 years ago

In ONNX, each node represents one layer, so the input of every activation function is the (vector) output of the previous layer.
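For reference, this is roughly how a Softmax node is declared with the onnx helper API (tensor names are made up; the axis attribute semantics depend on the opset). It consumes the whole tensor produced by the previous layer:

    from onnx import helper

    # Softmax operates on the full input tensor along the given axis,
    # i.e. on the vector output of the preceding layer
    softmax_node = helper.make_node(
        "Softmax",
        inputs=["prev_layer_out"],
        outputs=["probs"],
        axis=-1,
    )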

An alternative to what you propose is to record the softmax input node ids when building the network definition.

jalving commented 2 years ago

This issue is superseded by PR #24. We now use a layer-based formulation that is consistent with ONNX, so it should be straightforward to support softmax.