
Jacobian generation is very slow in the NB even for small models of the ScalableTestGrids library, sparsity pattern not generated #12937

casella closed this issue 1 week ago

casella commented 1 month ago

Please check the ScalableTestGrids NB report. Starting from Type1_N_2_M_2, which has about 10,000 equations, the NB time blows up, due to the Jacobian phase taking an inordinate amount of time.

Additionally, the compilation log reports the following warning at the end of the backend phase:

Warning: NSimJacobian.SimJacobian.create could not generate sparsity pattern.

Not sure if the two things are related, but I guess so, since dense Jacobians are going to be quite large even for such moderately sized problems.

perost commented 4 weeks ago

I had a look at this (or rather was looking into #12978, which led here), and the majority of time is spent in NBStrongComponent.updateDependencyMap called from NBStrongComponent.collectCrefs: https://github.com/OpenModelica/OpenModelica/blob/33db447f72387db4d4158b7b113b6aa8cab96cfd/OMCompiler/Compiler/NBackEnd/Classes/NBStrongComponent.mo#L956-L960

For the N_2_M_4 model we call this function 21525 times, and in many of these we get 11174 dependencies as the second argument. These are then looked up in a hash table to get their dependencies, which are added to a set: https://github.com/OpenModelica/OpenModelica/blob/33db447f72387db4d4158b7b113b6aa8cab96cfd/OMCompiler/Compiler/NBackEnd/Classes/NBStrongComponent.mo#L972-L983

In total this calls UnorderedSet.add about 8.2×10⁸ times, but the sets produced are usually no larger than 48 elements because it's just adding the same crefs over and over.

I don't really know how to improve this, though. The function itself seems sensible enough, but because everything seems to depend on everything else it just blows up. Maybe @kabdelhak or @phannebohm has some better clues on what to do about this.
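As a rough back-of-the-envelope consistency check (assuming most of the 21525 calls see close to the full 11174 dependencies): 21525 × 11174 ≈ 2.4×10⁸ hash-table lookups, and with each looked-up cref contributing a handful of transitive dependencies, that lands right at the reported ~8.2×10⁸ insertions. The cost is the sheer volume of duplicate work, not any single expensive operation.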

casella commented 3 weeks ago

Thanks @perost for the analysis, it's really good to know where to look.

I'm not much into the algorithm, but I guess an algorithm that adds the same elements to a set several million times is not a particularly smart algorithm 😅. I hope this can be improved 😃

phannebohm commented 3 weeks ago

The component in question is a linear algebraic loop of size 2920.

Most of the variables come from Complex records and many have bindings with crefs. I'm not sure why these bindings are not removed by the Alias module.

Furthermore, the component has only scalar equations, most of which are alias equations of record elements. These should have been removed by the Alias module as well...

So the algebraic loop should be much smaller in the first place.

Anyway, from what I can see each variable appears in 2-8 equations, so there appears to be no huge dense subsystem that could justify the large number of calls.

I'm now changing the first step (the way we collect all the crefs in the residuals) to use a set instead of a list, so that we remove duplicates as soon as we can. Also, the check for state-ness is currently done for each iteration variable in the component, but it only needs to be done once, before the dependencies are updated.

That should hopefully speed up the whole thing enough. The rest is probably so slow because it's all scalar equations which we can't do anything about.
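A rough estimate of the potential gain, under the assumption that the ~11174-entry dependency lists indeed collapse to the ≲48 distinct crefs observed above: deduplicating up front cuts the per-call work by roughly 11174 / 48 ≈ 230×, on top of hoisting the state-ness check out of the loop.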

casella commented 3 weeks ago

> The component in question is a linear algebraic loop of size 2920.

This is the system that spans the whole transmission network from the boundaries of the synchronous generators, which are determined by state variables in the synchronous machine models. It includes all the equations of transformers, transmission lines, and loads of the system, as well as current balance equations at the connection nodes, all of them phasor-based implicit algebraic equations. Most of them are linear; only the PQ loads are nonlinear due to the current*voltage terms.
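For reference, the PQ-load nonlinearity is the usual complex power balance v·conj(i) = P + jQ, which makes the product of the voltage and current phasors appear in the residuals.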

I tried to compile the model in ODE mode with the old backend, so I can see those algebraic loops explicitly in the debugger:

```modelica
model Test
  extends ScalableTestGrids.Models.Type1.Type1_N_2_M_2;
  annotation(__OpenModelica_commandLineOptions = "-d=execstat --tearingMethod=minimalTearing",
             __OpenModelica_simulationFlags(nls="kinsol", lv="LOG_STATS"),
             experiment(StopTime = 15, Tolerance = 1e-4));
end Test;
```

There is a strong component with 1164 equations for initialization, one with 1059 equations for lambda = 0 initialization, and a slightly smaller one with 1114 equations for simulation. So it seems that alias elimination & Co. should reduce the size of that strong component by about a factor of 3.

BTW, many of those equations are linear equations of the form y = a*x + b; extended linear aliasing could bring that number further down by a significant factor. What is the status of that feature?
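For concreteness, a minimal made-up example of the kind of affine alias meant here (illustrative only, not taken from ScalableTestGrids):

```modelica
model AffineAlias "illustrative affine alias, hypothetical model"
  Real x, y;
equation
  x = sin(time);
  y = 2*x + 1; // affine alias: y can be substituted everywhere, removing one equation and one unknown
end AffineAlias;
```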

> Most of the variables come from Complex records and many have bindings with crefs. I'm not sure why these bindings are not removed by the Alias module.

Me neither. Maybe this is connected to #12880?

> Furthermore, the component has only scalar equations, most of which are alias equations of record elements. These should have been removed by the Alias module as well...

Ditto.

The equations are scalar because these models contain many individual instances of component models, as is the case in real-life grid models. The idea is to use -d=mergeComponents so the frontend will merge them into (large) arrays, so we'll only have one array of transformers, one array of transmission lines, and one array of PQ loads. These should account for the majority of those 3000 equations. The connection equations instead will remain scalar, because real-life grids have an irregular structure, so it is not possible to collect them into for-loops.
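A minimal sketch of the intent, with a trivial stand-in component (hypothetical names, not the actual library models):

```modelica
model Line "trivial stand-in for a transmission line model"
  Real v;
equation
  v = time;
end Line;

model Scalarized "many individual instances, hence many scalar equations"
  Line line1, line2, line3;
end Scalarized;

model Merged "what -d=mergeComponents aims for: one array instance, one array equation"
  Line line[3];
end Merged;
```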

I have added one task to the heavy-test job with -d=mergeComponents activated, I'll run it ASAP.

> So the algebraic loop should be much smaller in the first place.

Yes, see above.

> Anyway, from what I can see each variable appears in 2-8 equations, so there appears to be no huge dense subsystem that could justify the large number of calls.

Exactly. Dually, each equation will refer to a handful of variables as well. The system is sparse, and the time for code generation, including Jacobians, should scale as O(N) in the worst case.
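To put rough numbers on it: with each variable appearing in at most ~8 equations, the Jacobian of an N-equation system has O(N) nonzeros, e.g. about 8×10⁴ entries for N = 10,000, versus 10⁸ entries for a dense N×N matrix.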

> I'm now changing the first step (the way we collect all the crefs in the residuals) to use a set instead of a list, so that we remove duplicates as soon as we can. Also, the check for state-ness is currently done for each iteration variable in the component, but it only needs to be done once, before the dependencies are updated.

Good.

> That should hopefully speed up the whole thing enough. The rest is probably so slow because it's all scalar equations which we can't do anything about.

This will be mostly fixed by running the test with -d=mergeComponents. I hope that having mixed array and scalar equations is not a problem from this point of view. Can you please confirm?

Thanks!

phannebohm commented 3 weeks ago

> BTW, many of those equations are linear equations of the form y = a*x + b; extended linear aliasing could bring that number further down by a significant factor. What is the status of that feature?

We have models in the testsuite that already use affine aliasing. See e.g.: testsuite/simulation/modelica/NBackend/alias/Alias.mos.

For some reason the Alias module is just not applied to this set of equations at all. I'll investigate why, after my other fixes are done.

> That should hopefully speed up the whole thing enough. The rest is probably so slow because it's all scalar equations which we can't do anything about.

> This will be mostly fixed by running the test with -d=mergeComponents. I hope that having mixed array and scalar equations is not a problem from this point of view. Can you please confirm?

I don't see why it should be a problem.

casella commented 3 weeks ago

I just started job #8498 so we can see what happens when we use -d=mergeComponents on ScalableTestGrids examples.

casella commented 3 weeks ago

> We have models in the testsuite that already use affine aliasing. See e.g.: testsuite/simulation/modelica/NBackend/alias/Alias.mos.

Great!

> For some reason the Alias module is just not applied to this set of equations at all. I'll investigate why, after my other fixes are done.

Good. I think it will help a lot in speeding up the simulation, if you can remove all those equations from IDA's iteration loop.

> This will be mostly fixed by running the test with -d=mergeComponents. I hope that having mixed array and scalar equations is not a problem from this point of view. Can you please confirm?

> I don't see why it should be a problem.

OK. Let's see what happens with the tests.

casella commented 3 weeks ago

@phannebohm has a solution in the works that should be more efficient. Hopefully done this week.

phannebohm commented 2 weeks ago

I've got it from this

Notification: Performance of Jacobian: time 80.95/99.68, allocations: 28.01 GB / 34.35 GB, free: 0.5078 GB / 2.136 GB

down to this

Notification: Performance of Jacobian: time 0.1506/19.85, allocations: 75.5 MB / 6.411 GB, free: 32.67 MB / 2.05 GB

so the issue should be fixed as soon as #13019 is merged.

casella commented 2 weeks ago

Great! 🎉

Please ping me on Slack when #13019 is merged in, so I can set up and run a sweeping heavy-test job. Thanks!

casella commented 2 weeks ago

Heavy-test job #8542 started.

casella commented 1 week ago

@phannebohm I'm not sure how you obtained your results, but I checked both the newInst-newBackend and heavy-test reports of ScalableTestGrids, and in both cases the issue is still present.

Are you sure you pushed all the local changes that brought your Jacobian time down from 80 to 0.16 s?

phannebohm commented 1 week ago

I see here:

Notification: Performance of Jacobian: time 0.1637/17.17, allocations: 125.7 MB / 6.578 GB, free: 337.3 MB / 2.14 GB

so to me it looks like it worked.

phannebohm commented 1 week ago

What remains is that the sparsity pattern is not generated:

Warning: NSimJacobian.SimJacobian.create could not generate sparsity pattern.

casella commented 1 week ago

> I see here:
>
> Notification: Performance of Jacobian: time 0.1637/17.17, allocations: 125.7 MB / 6.578 GB, free: 337.3 MB / 2.14 GB

OK, it already took 80 s for Type1_N_2_M_2; I did not remember it was that bad.

> so to me it looks like it worked.

Indeed 😃.

> What remains is that the sparsity pattern is not generated

This is now going to cause scaling issues at much larger sizes: Type1_N_2_M_2, which has 10,000 equations, takes 0.16 s to generate the Jacobian, while Type1_N_4_M_4, which has 50,000 equations, takes 4.5 s, i.e. O(N^2) scaling, which is inevitable if you generate dense Jacobians. I'll open a new ticket for that.
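As a quick sanity check of the quadratic scaling: the size grows by 50,000 / 10,000 = 5, so one expects roughly 5² × 0.16 s ≈ 4.0 s, close to the observed 4.5 s.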