APSIMInitiative / ApsimX

ApsimX is the next generation of APSIM
http://www.apsim.info

Improvements needed in Experiment node in GUI #7419

Open hol353 opened 2 years ago

hol353 commented 2 years ago
lie112 commented 1 year ago

Some comments, suggestions and requirements for the future direction and refactoring of Experiment.

We need a better way to describe all the simulations required by the threaded simulation manager and by any other analysis tools. It would use a standardised format (a database table or JSON text) to build each simulation as a modified copy of the current simulation. This information could be:

• entered manually;
• built dynamically by the APSIM component from the tree structure;
• derived from input data such as an Excel file; or
• created from an R script.

At present, this is done by Experiment, which wraps the base Simulation model inside the Simulations model.
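As a rough illustration of the "standardised format" idea, here is a minimal sketch. It assumes a JSON list of simulations, each with a name and a set of dotted property paths to change. It also assumes the base simulation is a plain nested dictionary. The format, the `set_by_path` and `build_simulations` helpers, and the property names are all hypothetical. Real APSIM models are C# objects in a tree, not dictionaries.

```python
import copy
import json

# Hypothetical JSON spec: one entry per simulation, listing the property
# paths (relative to the base simulation) that change for that run.
SPEC = """
[
  {"name": "SimN0",   "changes": {"Fertiliser.Amount": 0}},
  {"name": "SimN100", "changes": {"Fertiliser.Amount": 100}}
]
"""

# Hypothetical base simulation represented as nested dictionaries.
BASE = {"Name": "Base", "Clock": {"Start": "1990-01-01"}, "Fertiliser": {"Amount": 50}}


def set_by_path(model: dict, path: str, value) -> None:
    """Set a dotted property path such as 'Fertiliser.Amount'."""
    *parents, leaf = path.split(".")
    node = model
    for key in parents:
        node = node[key]
    if leaf not in node:
        raise KeyError(f"Unknown property: {path}")
    node[leaf] = value


def build_simulations(base: dict, spec_json: str) -> list:
    """Return one modified deep copy of the base simulation per spec entry."""
    sims = []
    for entry in json.loads(spec_json):
        sim = copy.deepcopy(base)  # the base simulation is never mutated
        sim["Name"] = entry["name"]
        for path, value in entry["changes"].items():
            set_by_path(sim, path, value)
        sims.append(sim)
    return sims
```

Each copy is independent of the base. That independence matters if the copies are handed to separate threads.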

Note: this approach will also need to account for any future changes to model serialisation needed for threading with altered property values.

This is my understanding of the requirements of CLEM projects for a new style of experiment. I'm offering it for comment and to get planning underway.

I won't discuss modifying the current code. That is tricky without a full understanding of all current uses and detailed knowledge of all functionality, and it would be difficult to implement in one go for all users. As I proceed, I may find that Experiment is already the best approach and just needs some additional functionality.

All refactoring would be independent of the current system. This allows continued, uninterrupted use until the old Experiment is finally deprecated. It does mean renaming some components, which may be beneficial. I will try to simplify the approach and make it easy for users to understand in the process. Let me know which programming styles I violate.

Simulations is a well-used and necessary holder of all simulations in an apsimx file, so it remains.

Currently, APSIM Next Gen runs each child of Simulations. This is either a Simulation itself, or the list of simulations (with factor changes) generated by an Experiment that wraps a base simulation.

It would be good to get back some old functionality:

• running the base simulation in an experiment like any other simulation, with results and graphs available; and
• running a single instance from the list of all simulations in an experiment as a non-experiment run.

Being able to run an individual simulation from a multi-simulation system and see its output is very useful.

Could the workings of Experiment be provided in a new model (e.g. MultipleSimulation) that implements the ISimulation interface? This model would require a child Factors model that does the work of the current Factors and Permutations. If Factors is empty, the model would behave like a Simulation and run the base simulation. Factors would therefore sit as a top-level model alongside Summary and DataStore.

For simplicity, all this logic could instead be built into Simulation itself. Adding a Factors model to the tree would then add all the multi-simulation functionality of the previous Experiment. When running all simulations, each Simulation would then be asked for a serialised list of the simulations to run.
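To make the "Simulation with an optional Factors child" idea concrete, here is a minimal sketch. The `Simulation`, `Factors`, `simulations_to_run` and `get_run` names are hypothetical and are not the ApsimX API. Each factor level is assumed to be a flat set of property changes, and the level name is appended to the base name.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Factors:
    """Hypothetical Factors child: maps a level name to property changes."""
    levels: dict = field(default_factory=dict)


@dataclass
class Simulation:
    name: str
    properties: dict
    factors: Optional[Factors] = None  # optional child Factors model

    def simulations_to_run(self) -> list:
        """With no (or empty) Factors, behave exactly like a plain Simulation."""
        if self.factors is None or not self.factors.levels:
            return [self]
        runs = []
        for level_name, changes in self.factors.levels.items():
            sim = Simulation(self.name + level_name, copy.deepcopy(self.properties))
            sim.properties.update(changes)
            runs.append(sim)
        return runs

    def get_run(self, name: str) -> Simulation:
        """Pick out one simulation so it can be run on its own."""
        for sim in self.simulations_to_run():
            if sim.name == name:
                return sim
        raise KeyError(name)
```

Usage: `Simulation("Wheat", {"Amount": 50}).simulations_to_run()` returns just the base simulation. Adding `Factors({"N0": {"Amount": 0}})` makes the same call return a `WheatN0` variant instead.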

Factors

The Factors model, optional under Simulation, would create the SQLite table of all the simulations needed, or read it from a supplied filename.

Do we already have code to build a simulation, modify the specified properties and include replacement models as specified? (I think this is currently managed by Factor, Experiment and Permutation.) I'm not sure how this would handle fully parameterised components, such as a Weather model supplied as a factor value. One option is to break these down into a composite factor with every property provided individually.

Nested models then raise the problem of copying models via serialisation and storing the details as efficiently as possible. A component could be stored in the database as JSON text and built as needed when the simulation is created. (I think this is the current intention, but it is not implemented because the infrastructure code is missing.) This is really a user-interface matter: dropping a model onto a composite factor could create a composite factor entry with all properties auto-filled.

The question is: which model is responsible for building the simulation that is passed to the manager to run?

The current approach also keeps a list of simulations separate from the factors. This lets the user enable or disable simulations before running, hence the ExperimentView, though the Simulation view could also handle this. However, there is no easy way to visualise or adjust all the factor values.
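One possible shape for the SQLite table of simulations is sketched below, using Python's standard `sqlite3` and `json` modules. It assumes one row per (simulation, property path, value). Whole components such as a Weather model are stored as JSON text and decoded when the simulation is built. The `_FactorSimulations` table name, its columns, and the Weather payload are hypothetical, not an existing ApsimX schema. Passing a file path to `sqlite3.connect` instead of `":memory:"` would cover the "read from a supplied filename" case.

```python
import json
import sqlite3

# Hypothetical schema: one row per changed property. Value holds JSON text, so
# a scalar (100) and a whole serialised component (a dict) share one column.
SCHEMA = """
CREATE TABLE IF NOT EXISTS _FactorSimulations (
    SimulationName TEXT NOT NULL,
    Path           TEXT NOT NULL,
    Value          TEXT NOT NULL
)
"""


def write_simulations(conn: sqlite3.Connection, rows) -> None:
    """Store (simulation, path, value) rows, encoding each value as JSON."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO _FactorSimulations VALUES (?, ?, ?)",
        [(sim, path, json.dumps(value)) for sim, path, value in rows],
    )
    conn.commit()


def read_changes(conn: sqlite3.Connection, simulation_name: str) -> dict:
    """Return {path: decoded value} for one simulation, in insertion order."""
    cur = conn.execute(
        "SELECT Path, Value FROM _FactorSimulations "
        "WHERE SimulationName = ? ORDER BY rowid",
        (simulation_name,),
    )
    return {path: json.loads(value) for path, value in cur}


conn = sqlite3.connect(":memory:")
write_simulations(conn, [
    ("SimA", "[Fertiliser].Amount", 100),
    ("SimA", "[Weather]", {"Type": "Weather", "FileName": "site1.met"}),
    ("SimB", "[Fertiliser].Amount", 0),
])
```

Storing values as JSON keeps the schema fixed, whether a factor changes a single number or swaps in a whole model.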

The factor types needed (already included):
• Factor: e.g. PropertyA from 1 to 100 step 10.
• CompositeFactor: provides a list of properties, [ModelA].ModelB.Property = Value, for each factor level. This is where data can be provided from a table or from code to help build the simulations.
• Composite factor where each level provides a list of models.
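The "from 1 to 100 step 10" style of Factor can be expanded into level values as sketched below. The exact parsing rules here are an assumption, not a copy of ApsimX's parser. In particular, the sketch includes the end value only when the steps land on it exactly. Each value is computed from the start rather than by repeated addition, so floating-point error does not accumulate.

```python
import re

# Assumed syntax: "<start> to <end> step <step>", as quoted in the text.
RANGE_SPEC = re.compile(
    r"^\s*(-?\d+(?:\.\d+)?)\s+to\s+(-?\d+(?:\.\d+)?)\s+step\s+(\d+(?:\.\d+)?)\s*$"
)


def expand_range(spec: str) -> list:
    """Expand a range spec into factor level values."""
    match = RANGE_SPEC.match(spec)
    if match is None:
        raise ValueError(f"Not a range spec: {spec!r}")
    start, end, step = (float(g) for g in match.groups())
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    i = 0
    # A small tolerance keeps an end point that the steps hit exactly.
    while start + i * step <= end + 1e-9 * step:
        values.append(start + i * step)
        i += 1
    return values
```

Under these rules, "1 to 100 step 10" gives ten levels (1, 11, ..., 91), because 100 is never reached exactly.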

Permutation would be handled by the user adding a nested structure: whenever a factor is nested below another, all combinations are created in the final set of simulations. Simple multiple runs through all levels of the factors provided will therefore continue to work. As soon as a nested factor structure is included, you introduce permutation, with combined runs at that factor level only. Users can thus mix permutations and simple multi-level simulations as their requirements dictate.

This is also an opportunity to think about how data from multiple simulation runs is shown in the UI. Could reports (tables in the database) and graphs in the base simulation be viewed per simulation name, or as a combined graph of all simulations? (This is the issue with accessing Zone, which is identical in each experiment.) This would avoid the need for experiment-level graphs and reports, which effectively exclude any simulation-level graphs and reports.
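The rule "nesting determines permutation" can be expressed as a short recursive expansion. In the sketch, sibling factors produce runs one after another (their runs are concatenated). A factor with nested children crosses each of its levels with every run of its children. `FactorNode`, `expand` and `expand_factor` are hypothetical names used only to illustrate the rule.

```python
from dataclasses import dataclass, field


@dataclass
class FactorNode:
    """Hypothetical factor: a name, its level names, and nested child factors."""
    name: str
    levels: list
    children: list = field(default_factory=list)


def expand(factors: list) -> list:
    """Sibling factors run independently: their runs are concatenated."""
    runs = []
    for factor in factors:
        runs.extend(expand_factor(factor))
    return runs


def expand_factor(factor: FactorNode) -> list:
    """Nested factors are permuted: each level is crossed with every child run."""
    child_runs = expand(factor.children) if factor.children else [{}]
    return [{factor.name: level, **child}
            for level in factor.levels for child in child_runs]


tree = [
    FactorNode("Soil", ["Black", "Red"], [FactorNode("N", ["0", "100"])]),  # nested: permuted
    FactorNode("Sowing", ["Early", "Late"]),                                 # sibling: simple runs
]
```

With this tree, `expand(tree)` gives six runs: the four Soil × N combinations, followed by Sowing = Early and Sowing = Late on their own.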

Summary
• Multi-simulation functionality would be added to any simulation by adding a Factors model at the top level.
• Factors would change a simulation much as they do at present, but may benefit from new thinking on serialisation and the dynamic building of simulations.
• Nesting of factors determines permutations.
• We might still need to think about separating the naming of factor levels from the properties and values that change at that level. The model name does provide a very useful way of providing …
• A new style of Factor model could dynamically build the nested factor models in the tree structure from input files that follow a standard format, or from an Excel spreadsheet. These would be implemented as the range of simulations is built for the run manager.
• The simulation code needs functionality to build a new copy of itself when given a table (or JSON text) of all the properties that need to change and the list of level names this represents.
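Building factor levels from an input file could start from a simple long-format table, as sketched below. Each row sets one property for one level of one factor, and rows are grouped with Python's standard `csv.DictReader`. The column names (`Factor`, `Level`, `Path`, `Value`) and the Soil values are invented for illustration; this is not an existing APSIM file format. Values are kept as strings, and level names stay separate from the properties they change.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format table: one property change per row.
TABLE = """Factor,Level,Path,Value
Soil,BlackSoil,[Soil].Thickness,150
Soil,BlackSoil,[Soil].SAT,0.45
Soil,RedSoil,[Soil].Thickness,200
Soil,RedSoil,[Soil].SAT,0.40
"""


def read_factor_table(text: str) -> dict:
    """Return {factor: {level: {path: value}}}, keeping level order from the file."""
    factors = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        levels = factors[row["Factor"]]
        levels.setdefault(row["Level"], {})[row["Path"]] = row["Value"]
    return dict(factors)
```

The nested dictionary maps directly onto composite factor levels: one entry per level, each holding the property changes for that level.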

How different is this from what is currently provided? Where can we give users better control of complicated factorials?

For example, how can we set a property's value at one factor level based on the level of a parent factor? There are two possibilities:

• Use the parent level's name. For example, [Crop].Filename = cropfile_{Soil}.csv, where {Soil} is replaced by the current level name (e.g. "BlackSoil") of the Soil factor.
• Fill in values from a table, using knowledge of the levels of all parent factors in a permutation.
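Both options for parent-dependent values are small once the current parent levels of a run are known, as sketched below. `resolve` replaces `{FactorName}` placeholders with the parent factor's current level name. `lookup` picks a value from a table keyed by the parent levels in a fixed order. The function names, the placeholder syntax beyond the `{Soil}` example, and the table contents are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def resolve(template: str, parent_levels: dict) -> str:
    """Replace each {FactorName} with that parent factor's current level name."""
    def substitute(match: re.Match) -> str:
        factor = match.group(1)
        if factor not in parent_levels:
            raise KeyError(f"No parent factor named {factor!r}")
        return parent_levels[factor]
    return PLACEHOLDER.sub(substitute, template)


def lookup(parent_levels: dict, order: list, table: dict):
    """Table option: key the value on the levels of all parent factors."""
    return table[tuple(parent_levels[f] for f in order)]


# Hypothetical table of values for each (Soil, N) combination.
SOWING_DENSITY = {("BlackSoil", "N0"): 120, ("BlackSoil", "N100"): 140}
```

For example, `resolve("cropfile_{Soil}.csv", {"Soil": "BlackSoil"})` gives `"cropfile_BlackSoil.csv"`.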

hol353 commented 1 year ago

Thanks @lie112. I'm unsure how much of the above we can already do and what else needs to be done. I agree that some more flexibility is needed in configuring experiments and factors. I like the idea of being able to do that from a .csv or spreadsheet. We could also allow the user to work with a table view of the factors in the GUI, not just from a spreadsheet.

The experiment/factor code was simplified several months ago but still has a way to go. The factors are used by Morris, Sobol and the graphing code, which seriously complicates the design and makes it hard to follow. What is needed is to:

  1. Do some further refactoring to encapsulate the experiment factors code, potentially into a single class.
  2. From the graphing code, remove the low level knowledge of how factors work. The graphs need to know about factors (for the Vary by capability) but it should just call a method in a class to get a list of factor names.
  3. Add ability to describe factors using a table mechanism, retaining the existing “factors | permutation | factor value | composite factor” approach.
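Points 1 and 2 could look roughly like the sketch below. A single class owns the factor information for each simulation. Graphs and tools such as Morris or Sobol ask it for factor names instead of walking the Factor/Permutation/CompositeFactor tree themselves. `ExperimentFactors`, `FactorLevel`, `get_factor_names` and `simulations_with` are hypothetical names, not existing ApsimX classes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FactorLevel:
    factor: str
    level: str


@dataclass
class ExperimentFactors:
    """Hypothetical single owner of factor logic; callers never see the tree."""
    runs: dict = field(default_factory=dict)  # simulation name -> [FactorLevel]

    def get_factor_names(self) -> list:
        """Distinct factor names in first-seen order, e.g. for 'Vary by'."""
        names = []
        for levels in self.runs.values():
            for fl in levels:
                if fl.factor not in names:
                    names.append(fl.factor)
        return names

    def simulations_with(self, factor: str, level: str) -> list:
        """Names of the simulations that use the given factor level."""
        target = FactorLevel(factor, level)
        return [sim for sim, levels in self.runs.items() if target in levels]


factors = ExperimentFactors({
    "ExpBlackN0": [FactorLevel("Soil", "Black"), FactorLevel("N", "0")],
    "ExpRedN0":   [FactorLevel("Soil", "Red"), FactorLevel("N", "0")],
})
```

A graph offering "Vary by" would only call `factors.get_factor_names()`, which returns `["Soil", "N"]` here. How factors are stored or described, including the proposed table mechanism, would then stay hidden behind the class.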

In the New Year it might be good to capture some short, 1 sentence user stories (use cases) for factor/permutation functionality.