ThummeTo / FMIFlux.jl

FMIFlux.jl is a free-to-use software library for the Julia programming language, which offers the ability to place FMUs (fmi-standard.org) everywhere inside of your ML topologies and still keep the resulting model trainable with a standard (or custom) FluxML training process.
MIT License

Contrary loss update direction of CS-FMU and changeless loss of ME-FMU with FMUParameterRegistrator #142

Open awecefil opened 3 months ago

awecefil commented 3 months ago

Currently, I am trying to use FMUParameterRegistrator to do parameter calibration/estimation, but I am encountering some issues.
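
(Editor's note: the construction of the net with the registrator layer is not shown in this issue. Below is a minimal sketch of how it typically looks, loosely following the FMIFlux.jl parameter-optimization examples; the parameter name "spring.c", the variable names, and the exact FMUParameterRegistrator arguments are assumptions, not taken from the post.)

    # Sketch only (assumed API usage): FMUParameterRegistrator adds the given FMU
    # parameter to the trainable parameter vector p of the NeuralFMU.
    p_refs  = ["spring.c"]   # hypothetical FMU parameter to tune
    p_start = [0.1]          # its start value
    net = Chain(FMUParameterRegistrator(fmu, p_refs, p_start),
                x -> fmu(x=x, dx_refs=:all))
    neuralFMU = ME_NeuralFMU(fmu, net, (tStart, tStop), Tsit5(); saveat=tSave)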

I ran experiments on both ME-type and CS-type FMUs:

  1. Model: SpringPendulum1D.mo (from FMIZoo.jl). Exporting tool: Dymola (I directly use the FMU from FMIZoo.jl). Result: good, the parameter is tuned correctly after training.

  2. Model: SpringPendulum1D.mo (from FMIZoo.jl). Exporting tool: OpenModelica v1.23.0 (64-bit). Result: the loss does not change during training, so the parameter is not tuned correctly. Below are the loss function I use and part of the info printed during the training process.

    function lossSum(p)
        global neuralFMU, x₀, posData, params, w, counter

        # current value of the tunable parameter (tracked by the AD framework)
        w = p[1].value
        if FMU_SOURCE == "OpenModelica"
            params = Dict(zip(initStates, [w, 0.0, 1.0, 0.5, 0.0, 10.0, 1.0]))  # for OM
        elseif FMU_SOURCE == "Dymola"
            params = Dict(zip(initStates, [0.5, 0.0, w, 10.0, 1.0, 1.0, 0.0]))  # for Dymola
        end

        solution = neuralFMU(x₀; parameters=params, p=p, showProgress=true, saveat=tSave)
        #solution = neuralFMU(x₀; p=p, showProgress=true, saveat=tSave)

        if !solution.success
            return Inf
        end

        posNet = fmi2GetSolutionState(solution, 1; isIndex=true)
        velNet = fmi2GetSolutionState(solution, 2; isIndex=true)

        loss_value = FMIFlux.Losses.mse(posData, posNet)
        # counter is incremented in the training callback (not shown here)
        if counter % 100 == 0 || counter == 1
            @info "LossSum[$counter] - Loss: $(round(loss_value.value, digits=5)), p: $(p[1].value)"
        end

        return loss_value
    end
    Parameters Before: Params([[0.1]])
    t=4.0000e+00s | Δt=1.3271e-01s | STPs=7 | EVTs=0 | 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:01        
    [ Info: LossSum[0] - Loss: 1.59984, p: 0.1
    [ Info: LossSum[1] - Loss: 1.59984, p: 0.09900000000013325
    [ Info: LossSum[100] - Loss: 1.59984, p: 0.005808103062422814
    [ Info: LossSum[200] - Loss: 1.59984, p: -0.0730673163153592
    [ Info: LossSum[300] - Loss: 1.59984, p: -0.15119005659167525
    [ Info: LossSum[400] - Loss: 1.59984, p: -0.20746426846536525
    [ Info: LossSum[500] - Loss: 1.59984, p: -0.2559694377722006
    [ Info: LossSum[600] - Loss: 1.59984, p: -0.30465139280630354
    [ Info: LossSum[700] - Loss: 1.59984, p: -0.36115693613792765
    [ Info: LossSum[800] - Loss: 1.59984, p: -0.4481988144390695
    [ Info: LossSum[900] - Loss: 1.59984, p: -0.5573190402714133
    Parameters After: Params([[-0.6396949635848143]])
    p: [-0.6396949635848143, 0.0, 1.0, 0.5, 0.0, 10.0, 1.0]

Actually, I found that this issue comes from the wrong return value of neuralFMU(x₀; parameters=params, p=p, showProgress=true, saveat=tSave) inside lossSum: posNet is an array with the same value for all time steps, and velNet is an array of linearly increasing values. For example, posNet = [0.5, 0.5, 0.5, ..., 0.5, 0.5, 0.5] and velNet = [0.0, 0.1, 0.2, 0.3, ..., 40.0], whereas both of them should look like a sine wave.

By the way, this only happens during FMIFlux.train!. The output is normal if I run neuralFMU(x₀; parameters=params, p=p, showProgress=true, saveat=tSave) on its own.

I am not sure whether canGetAndSetFMUstate="false" could be the reason for the weird FMU solution. OpenModelica does not seem to support this capability for ME-type FMUs, even when I follow the OpenModelica guideline to enable it.

#################### Begin information for FMU ####################
        Model name:                     SpringPendulum1D
        FMI-Version:                    2.0
        GUID:                           {07765a18-d4d2-45bf-9416-dceca504316f}
        Generation tool:                OpenModelica Compiler OpenModelica v1.23.0 (64-bit)
        Generation time:                2024-08-02T16:27:51Z
        Var. naming conv.:              structured
        Event indicators:               0
        Inputs:                         0
        Outputs:                        0
        States:                         2
                0 ["mass.stateSelect", "mass.s"]
                1 ["mass.v"]
        Supports Co-Simulation:         true
                Model identifier:       SpringPendulum1D
                Get/Set State:          true
                Serialize State:        true
                Dir. Derivatives:       true
                Var. com. steps:        true
                Input interpol.:        true
                Max order out. der.:    1
        Supports Model-Exchange:        true
                Model identifier:       SpringPendulum1D
                Get/Set State:          false
                Serialize State:        false
                Dir. Derivatives:       true
##################### End information for FMU #####################
  3. Model: SpringPendulumExtForce1D.mo (from FMIZoo.jl). Exporting tool: Dymola (I directly use the FMU from FMIZoo.jl). Result: the loss keeps increasing, so the parameter is pushed in exactly the opposite direction, as shown below.
Parameters Before: Params([[0.1]])
[ Info: LossSum[1] - Loss: 181.8703, p: 0.1
[ Info: LossSum[100] - Loss: 201.47267, p: 0.00014190570528661807
[ Info: LossSum[200] - Loss: 222.8067, p: -0.10304300700040245
[ Info: LossSum[300] - Loss: 245.72902, p: -0.20853901980968
[ Info: LossSum[400] - Loss: 270.22067, p: -0.3160940123411269
[ Info: LossSum[500] - Loss: 296.40281, p: -0.42552168988779004
[ Info: LossSum[600] - Loss: 324.23064, p: -0.5368072061518109
[ Info: LossSum[700] - Loss: 353.77287, p: -0.6497765138005717
[ Info: LossSum[800] - Loss: 384.96156, p: -0.7641253579239035
[ Info: LossSum[900] - Loss: 417.84708, p: -0.87971473337163
[ Info: LossSum[1000] - Loss: 452.41874, p: -0.9964353941757557
Parameters After: Params([[-0.9964353941757557]])
p: [0.5, -0.9964353941757557, 10.0, 1.0, 1.0, 0.0]

If I negate the return value of the loss, the training looks relatively normal, but I do not think this is a good way to solve it (see the sketch below).
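
(Editor's note: a sketch of that change at the end of lossSum, only to make concrete what "negating the loss" means; it is a workaround, not a recommended fix.)

    # Inside lossSum: return the negated loss so the CS-FMU training moves in
    # the expected direction (workaround only, not a proper fix).
    loss_value = FMIFlux.Losses.mse(posData, posNet)
    return -loss_value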

Parameters Before: Params([[0.1]])
[ Info: LossSum[1] - Loss: 40.81055, p: 0.1
[ Info: LossSum[200] - Loss: 25.41111, p: 0.28985541311617846
[ Info: LossSum[400] - Loss: 14.79257, p: 0.4581509689040884
[ Info: LossSum[600] - Loss: 7.93092, p: 0.6032038680923282
[ Info: LossSum[800] - Loss: 3.82994, p: 0.7242404129508137
[ Info: LossSum[1000] - Loss: 1.60288, p: 0.8213334851987159
[ Info: LossSum[1200] - Loss: 0.54577, p: 0.8955987417197941
[ Info: LossSum[1400] - Loss: 0.12857, p: 0.9491589358950395
[ Info: LossSum[1600] - Loss: 0.01049, p: 0.9853109597389106
[ Info: LossSum[1800] - Loss: 0.00367, p: 1.0079643085069332
[ Info: LossSum[2000] - Loss: 0.02329, p: 1.0210428955494268
Parameters After: Params([[1.0210428955494268]]) 
p: [0.5, 1.0210428955494268, 10.0, 1.0, 1.0, 0.0]
  4. Model: SpringPendulumExtForce1D.mo (from FMIZoo.jl). Exporting tool: OpenModelica v1.23.0 (64-bit). Result: good, the parameter is tuned correctly after training.

Conclusion

There are two issues when using FMUParameterRegistrator:

  1. Wrong FMU output during FMIFlux.train! for ME-type FMUs
  2. Contrary loss update direction during training for CS-type FMUs

If more information is needed, please tell me, thank you!

ThummeTo commented 3 months ago

Thanks for the issue, this is very interesting! For canGetAndSetFMUstate="false", the default fallback is to use sampling and finite differences. We will check that!
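
(Editor's note: for illustration only, a sampling/finite-difference sensitivity can be sketched in plain Julia as below; this is not the actual FMIFlux fallback implementation, and it assumes a loss that accepts a plain Float64 vector.)

    # Illustration only: forward finite-difference gradient of a scalar loss
    # with respect to a parameter vector, obtained by sampling the loss.
    function finite_difference_gradient(loss, p; h=1e-6)
        g  = zeros(length(p))
        f0 = loss(p)                   # unperturbed loss value
        for i in eachindex(p)
            q = copy(p)
            q[i] += h                  # perturb a single parameter
            g[i] = (loss(q) - f0) / h  # forward difference
        end
        return g
    end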

awecefil commented 3 months ago

Hi @ThummeTo, I suspect that the first issue does not only happen when using FMUParameterRegistrator. I also ran an experiment on learning an unknown effect, like the hybrid_ME example on SpringPendulum, and got a worse result:

# NeuralFMU setup (following the hybrid_ME example)
numStates = fmiGetNumberOfStates(referenceFMU)
net = Chain(x -> referenceFMU(x=x, dx_refs=:all),   # FMU evaluation inside the net
            Dense(numStates, 8, identity),
            Dense(8, 8, tanh),
            Dense(8, numStates))
optim = Adam()
solver = Tsit5()
neuralFMU = ME_NeuralFMU(referenceFMU, net, (tStart, tStop), solver; saveat=tSave)

solutionBefore = neuralFMU(x₀; parameters=init_params, saveat=tSave)
posNet_before, velNet_before = extractPosVel(solutionBefore)

paramsNet = Flux.params(neuralFMU)
params_before = deepcopy(paramsNet)

# flip negative entries of the first layer's parameters to be non-negative
for i in 1:length(paramsNet[1])
    if paramsNet[1][i] < 0.0
        paramsNet[1][i] = -paramsNet[1][i]
    end
end

NUMSTEPS = 1000
GRADIENT = :ReverseDiff
FMIFlux.train!(lossSum, neuralFMU, Iterators.repeated((), NUMSTEPS), optim; gradient=GRADIENT, cb=()->callb(paramsNet), printStep=false)
[ Info: LossSum[0] - Loss: 125.24109
[ Info: LossSum[1] - Loss: 124.58794
[ Info: LossSum[100] - Loss: 53.56777
[ Info: LossSum[200] - Loss: 1.509
[ Info: LossSum[300] - Loss: 0.81575
[ Info: LossSum[400] - Loss: 0.76544
[ Info: LossSum[500] - Loss: 0.76557
[ Info: LossSum[600] - Loss: 0.76563
[ Info: LossSum[700] - Loss: 0.76606
[ Info: LossSum[800] - Loss: 0.76679
[ Info: LossSum[900] - Loss: 0.76561
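
(Editor's note: the helper extractPosVel used in the setup above is not shown in the post; a minimal hypothetical sketch, mirroring the fmi2GetSolutionState calls from the loss function earlier in this issue, could look like this.)

    # Hypothetical sketch of the extractPosVel helper (not in the original post).
    function extractPosVel(solution)
        pos = fmi2GetSolutionState(solution, 1; isIndex=true)   # first state (position)
        vel = fmi2GetSolutionState(solution, 2; isIndex=true)   # second state (velocity)
        return pos, vel
    end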

It seems like it easily gets stuck at a local optimum, and the result is shown below.

[image: plot of the trained NeuralFMU result]

The difference between calling neuralFMU(x₀; p=p, showProgress=true, saveat=tSave) inside and outside of FMIFlux.train!() is shown below. Here, "inside FMIFlux.train!()" means neuralFMU(...) is called in the loss function during the training process, and "outside FMIFlux.train!()" means calling neuralFMU(...) outside the training process, e.g. solutionBefore = neuralFMU(x₀; saveat=tSave).

[image: FMU output when neuralFMU(...) is called inside vs. outside FMIFlux.train!()]

This plot shows that the FMU output during the training process is linear, which I think is weird.

NOTE: the FMU I use in the above results is exported by OpenModelica v1.23.0 (64-bit), which may be the main reason, but I am not sure why.
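
(Editor's note: a minimal sketch of how the inside/outside comparison described above can be captured; the inside_solutions list and lossSumDebug are illustration-only names, not from the original post.)

    # Illustration only: capture the solution produced inside the loss during
    # FMIFlux.train! and compare it with a call made outside of training.
    sol_outside = neuralFMU(x₀; saveat=tSave)      # call outside the training loop

    inside_solutions = []                           # filled while training runs
    function lossSumDebug(p)
        sol = neuralFMU(x₀; p=p, saveat=tSave)      # call inside the loss
        push!(inside_solutions, sol)                # keep it for later plotting
        posNet = fmi2GetSolutionState(sol, 1; isIndex=true)
        return FMIFlux.Losses.mse(posData, posNet)
    end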

ThummeTo commented 1 month ago

Can you compare the gradients? For example by applying ForwardDiff.gradient(loss, p) (or ReverseDiff.gradient(loss, p)). This way we know that it's not caused by the accumulated update direction of Adam.
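
(Editor's note: a minimal sketch of that comparison, assuming lossSum is adapted to accept a plain Float64 vector, e.g. logging p[1] instead of p[1].value.)

    # Sketch of the suggested gradient comparison; ForwardDiff and ReverseDiff
    # both differentiate the scalar loss w.r.t. the parameter vector.
    using ForwardDiff, ReverseDiff

    p0 = [0.1]                                  # initial tunable parameter from above
    g_fwd = ForwardDiff.gradient(lossSum, p0)   # forward-mode AD
    g_rev = ReverseDiff.gradient(lossSum, p0)   # reverse-mode AD
    @info "Gradient comparison" g_fwd g_rev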

ThummeTo commented 1 month ago

There are some tests in FMISensitivity.jl that you can check out: FMISensitivity tests

(here, we compare many gradients to ground truth gradients)