APSIMInitiative / ApsimX

ApsimX is the next generation of APSIM
http://www.apsim.info

Improve APSIM run-time speed #6997

Closed. hol353 closed this issue 1 year ago.

hol353 commented 2 years ago

I'm capturing my thoughts before going on leave next week. Feel free to comment though.

I have been playing with a prototype of a faster way of running simulations. After profiling, it is evident that preparing a simulation for running (e.g. cloning the simulation, compiling manager scripts, resolving links and events) is expensive (30% of the run time). For experiment nodes (and eventually morris, sobol, croptimizer) I thought I would try the APSIM Server approach where a simulation is prepared once and then reused, once for each factor (replacement). This is MUCH quicker - about 3 times faster. My test case is running the 2000 NPI simulations in the Wheat validation. The tip revision takes about 4.5 minutes to run this; my prototype takes about 1.5 minutes.
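
The core of the idea, in a rough C# sketch. Every name here (LoadSimulation, Prepare, ResetState, ApplyOverrides) is hypothetical and only shows the shape of the reuse loop; the real code is in the fast-wheat branch:

```csharp
// Prepare once: this is the expensive part (cloning, compiling manager scripts,
// resolving links and events).
Simulation simulation = LoadSimulation("WheatNPI.apsimx");   // hypothetical loader
simulation.Prepare();                                        // hypothetical prepare step

// Then reuse the prepared simulation, once per factor (replacement).
foreach (Factor factor in experiment.Factors)
{
    simulation.ResetState();                  // zero all model state instead of re-cloning
    factor.ApplyOverrides(simulation);        // hypothetical: apply this factor's replacements
    simulation.Run();
}
```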

As always there is a catch. To reuse a simulation like I propose above, we need to be very careful that the state (fields & properties) of every model in the simulation is reset (zeroed) at the start of every run. This is the reason the tip revision clones a simulation before each run: to get an instance in a consistent state. APSIM 7.10 has functions to zero variables for this reason and we should probably do the same. The annual plant models should already do this because they need to zero their state at harvest, but I suspect this isn't working 100%, which will cause hidden problems when sowing crops in multiple years. We can write unit tests that ensure the zero methods work by creating an instance of every class in the Models assembly, setting every private and public field/property to a magic number, calling the zero method and then checking that every field/property has been zeroed (or nulled). Currently we run the examples and check model state before and after a run, but this isn't enough.
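
A sketch of what such a test could look like, assuming a hypothetical Zero() method on models and checking only double fields for brevity (the real convention would need to match whatever the zeroing work settles on):

```csharp
using System;
using System.Linq;
using System.Reflection;
using Models.Core;
using NUnit.Framework;

public class ZeroingTests
{
    [Test]
    public void EnsureAllModelsZeroTheirState()
    {
        const double magic = 12345.678;
        var modelTypes = typeof(IModel).Assembly.GetTypes()
            .Where(t => typeof(IModel).IsAssignableFrom(t)
                        && !t.IsAbstract
                        && t.GetConstructor(Type.EmptyTypes) != null);

        foreach (Type type in modelTypes)
        {
            object model = Activator.CreateInstance(type);
            FieldInfo[] fields = type
                .GetFields(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance)
                .Where(f => f.FieldType == typeof(double) && !f.IsInitOnly)
                .ToArray();

            // Poison the state with a magic number...
            foreach (FieldInfo field in fields)
                field.SetValue(model, magic);

            // ...call the (hypothetical) zero method...
            type.GetMethod("Zero")?.Invoke(model, null);

            // ...and check nothing still holds the magic number.
            foreach (FieldInfo field in fields)
                Assert.AreNotEqual(magic, field.GetValue(model),
                                   $"{type.Name}.{field.Name} was not zeroed");
        }
    }
}
```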

While looking at the way we currently run simulations, I've come to realise that the running code needs refactoring. Below are some thoughts for a better design.

If you're keen to have a look at where I've got to, my branch is here: hol353/fast-wheat

  1. I have created a class (that implements IRunnable) that runs a factorial of simulations. It contains code that was previously in SimulationDescription.cs and Simulation.cs. The class is currently called SimulationRunnable - it needs a better name.

  2. I have created a CLI program (ConsoleApp1) that exercises this new class using Tests\Validation\Wheat\WheatNPI.apsimx. You can't currently run anything from the GUI; my new code isn't plumbed in yet.

  3. The above CLI simply uses JobManager and JobRunner to run 4 instances of SimulationRunnable. I wonder though whether we still need JobManager and JobRunner. We could just use Parallel.ForEach instead. Much simpler.

  4. Next Step: Create static methods somewhere that return an IEnumerable of things to run for a given .apsimx file name (or a relative IModel). The returned IEnumerable needs to include a 'RunParallel' instance that runs all SimulationRunnable instances asynchronously, followed by a Sequential instance that then runs the post simulation tools. Code can be taken from SimulationGroup.cs. Because an IEnumerable is returned, the calling code can remain simple and just run it using Parallel.ForEach; it doesn't need to know about what is being run. This simple mechanism would replace the complicated implementation in Runner.cs, which traps events to determine when something has finished running, propagates the events and then sends out more events to indicate when all jobs have finished. A sketch of this shape is below the list.
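
A rough sketch of the shape described in points 3 and 4. It is not compilable as-is: CreateJobs, RunParallel, RunSequential and the two Get* helpers are illustrative placeholders for code that would be lifted from SimulationGroup.cs, and IRunnable's Run signature is simplified:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class JobFactory
{
    // Everything that needs to run for an .apsimx file, in order.
    public static IEnumerable<IRunnable> CreateJobs(string apsimxFileName)
    {
        // Parallel phase: all SimulationRunnable instances.
        yield return new RunParallel(GetSimulationRunnables(apsimxFileName));

        // Sequential phase: post simulation tools, run only after the parallel phase finishes.
        yield return new RunSequential(GetPostSimulationTools(apsimxFileName));
    }

    // Calling code stays simple and doesn't need to know what it is running.
    public static void RunAll(string apsimxFileName)
    {
        foreach (IRunnable phase in CreateJobs(apsimxFileName))
            phase.Run();
    }
}

// Inside RunParallel, Parallel.ForEach could replace JobManager/JobRunner, e.g.:
//     Parallel.ForEach(runnables,
//                      new ParallelOptions { MaxDegreeOfParallelism = 4 },
//                      runnable => runnable.Run());
```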

hut104 commented 2 years ago

Programmers can be slack in writing zeroing methods. Is there a way that we can automate some of this? Either the writing of the zero method, or the unit test?

hol430 commented 2 years ago

> After profiling, it is evident that preparing a simulation for running ... is expensive (30% of the run time)

Worth noting that if you make a simulation longer (i.e. by adding more crops/zones/years/animals), the "startup" cost won't increase as much as the simulation runtime, so you'll see a smaller proportion of time spent in init code.

We could automate the zeroing methods with reflection but I'm not sure of the performance implications of this. It may end up not being much faster than doing a binary clone. If so, perhaps there's some other metaprogramming approach which would suit.
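
For illustration, a naive reflection-based zeroing helper might look like the sketch below. Note it would also wipe parameters that came from deserialisation, which is one argument for something more selective (an attribute, or an explicit reset method):

```csharp
using System;
using System.Reflection;

public static class ModelZeroer
{
    // Naive sketch: put every writable instance field back to its default value.
    // FieldInfo.SetValue is slow, which is the performance concern mentioned above.
    public static void ZeroViaReflection(object model)
    {
        var fields = model.GetType().GetFields(
            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance);

        foreach (FieldInfo field in fields)
        {
            if (field.IsInitOnly)
                continue;                                     // skip readonly fields

            object defaultValue = field.FieldType.IsValueType
                ? Activator.CreateInstance(field.FieldType)   // 0, false, default struct
                : null;
            field.SetValue(model, defaultValue);
        }
    }
}
```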

lie112 commented 2 years ago

To get this working requires all models to identify and reset the required values on an early pre-simulation clock event, so to get it implemented each developer has to identify the properties and values that need to be set for the start of the simulation. The question is how to do this. As it's the same logic that's performed by APSIM somewhere, this might be another case for an Attribute on properties. I assume any property with a Description attribute is not considered, as its value comes from the serialisation of the simulation json file, and other properties may not hold persistent values in the logic, so the number of properties that need resetting might be low. So a StartupInitialValue attribute or the like, set by walking the model tree before starting the simulation a 2nd+ time, might work. If this attribute was required it would force developers to build in this logic at the time of writing and would ensure it was added to the properties of manager scripts as well. If this does result in significant performance increases it would become part of a standard requirement for properties.

The only problem is this also relates to fields, which makes a mess from this point. Therefore the other option, of all models having a ResetVariables method, would require all developers to be aware of this critical need, add this new method to their models and ensure everything can be reset to a clean start. Those are my thoughts. Happy to apply this to CLEM when needed.
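
A sketch of the attribute idea, using the StartupInitialValue name suggested above. The attribute, the example model and the tree-walk helper are all hypothetical:

```csharp
using System;
using System.Reflection;
using Models.Core;

// Hypothetical attribute marking state that must be reset before each re-run.
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public class StartupInitialValueAttribute : Attribute
{
    public object Value { get; }
    public StartupInitialValueAttribute(object value = null) => Value = value;
}

// Hypothetical example of a model using it.
public class ExampleModel : Model
{
    [StartupInitialValue(0.0)]
    public double CumulativeBiomass { get; set; }
}

public static class StartupResetter
{
    // Walk the model tree before a 2nd+ run and put marked properties back to
    // their declared initial values.
    public static void Reset(IModel root)
    {
        foreach (IModel model in root.FindAllDescendants())   // or whichever tree-walk helper fits
            foreach (PropertyInfo property in model.GetType().GetProperties())
            {
                var attribute = property.GetCustomAttribute<StartupInitialValueAttribute>();
                if (attribute != null && property.CanWrite)
                    property.SetValue(model, attribute.Value);
            }
    }
}
```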

hol430 commented 2 years ago

We would want to be careful about using reflection (i.e. a code-attribute-based approach), as performance is critical here. Some profiling might be required, but if reflection turns out not to be a bottleneck then yes, that would probably be the simplest option to implement. The other way of doing this would be to have some sort of test which runs as part of the Jenkins builds and ensures that all models are resetting themselves correctly. It would then be up to the developer to reset everything manually, in an efficient manner. We could be equally confident that things are working, because any accidental breakages would be caught by the test suite.
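
Not something raised in this thread, but if reflection does show up in profiles, a standard mitigation is to build each type's reset logic once as a compiled delegate and cache it, so the per-run cost is a delegate call rather than repeated FieldInfo.SetValue calls. A sketch:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class CachedResetter
{
    private static readonly ConcurrentDictionary<Type, Action<object>> cache =
        new ConcurrentDictionary<Type, Action<object>>();

    public static void Reset(object model) =>
        cache.GetOrAdd(model.GetType(), BuildResetAction)(model);

    private static Action<object> BuildResetAction(Type type)
    {
        ParameterExpression obj = Expression.Parameter(typeof(object), "obj");
        Expression typed = Expression.Convert(obj, type);

        // Assign default(T) to every writable instance field; this expression is
        // compiled once per type, then reused for every run.
        var assignments = type
            .GetFields(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance)
            .Where(f => !f.IsInitOnly)
            .Select(f => (Expression)Expression.Assign(
                Expression.Field(typed, f),
                Expression.Default(f.FieldType)))
            .DefaultIfEmpty(Expression.Empty());

        return Expression.Lambda<Action<object>>(Expression.Block(assignments), obj).Compile();
    }
}
```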