centreformicrosimulation / SimPaths

SimPaths is an open-source microsimulation framework for life course analysis, developed and maintained by CeMPA at the University of Essex
European Union Public License 1.2

Experimental/config parsing #43

Closed andrewbaxter439 closed 11 months ago

andrewbaxter439 commented 12 months ago

This is a highly experimental branch for now. It should allow a user to provide a config.yml file which reliably, recordably, repeatably passes the same arguments to SimPathsMultiRun and internal SimPathsModel runs. It can handle all parameters shown in the SimPaths GUI and allows these to be specified outside the Java code. Ideally, this could mean that any scenario/run could be carried out from a static released version of SimPaths. I will tidy up and follow up with more explanation.

andrewbaxter439 commented 11 months ago

Hi @pbronka - after a little experimenting I think this is working how I had imagined it would. SimPathsMultiRun can now take a config.yml file (or a file explicitly passed via -config) and read its contents in as arguments. As an example file:

# This file can be used to override defaults for multirun arguments.
# These will be overridden by command-line arguments

maxNumberOfRuns: 3
executeWithGui: false
randomSeed: 12345
startYear: 2018
endYear: 2020
popSize: 25000

model_args:
    alignFertility: false
    alignCohabitation: false
    alignEmployment: false

collector_args:
    exportToCSV: true
    persistStatistics: true
    persistStatistics2: true
    persistPersons: false
    persistBenefitUnits: false
    persistHouseholds: false

This will be read in and will set the SimPathsMultiRun parameters to the values in the top block, set SimPathsModel to not align Fertility/Cohabitation/Employment, and set SimPathsCollector to not persist Person/Household/BenefitUnit CSVs (whilst still persisting statistics).
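The comment at the top of the example file says that command-line arguments override the file, which in turn overrides the built-in defaults. A minimal sketch of that precedence, assuming each source has already been parsed into a map, might look like the following. The class and method names (LayeredSettings, merge) and the default values are hypothetical and are not taken from this branch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LayeredSettings {

    // Later layers win: pass defaults first, then config-file values,
    // then command-line values.
    @SafeVarargs
    static Map<String, Object> merge(Map<String, Object>... layers) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (Map<String, Object> layer : layers) {
            merged.putAll(layer);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Illustrative values only; the defaults here are made up.
        Map<String, Object> defaults = Map.of("maxNumberOfRuns", 1, "popSize", 50000);
        Map<String, Object> fromFile = Map.of("maxNumberOfRuns", 3, "popSize", 25000);
        Map<String, Object> fromCommandLine = Map.of("popSize", 10000);

        Map<String, Object> settings = merge(defaults, fromFile, fromCommandLine);
        System.out.println("maxNumberOfRuns = " + settings.get("maxNumberOfRuns"));
        System.out.println("popSize = " + settings.get("popSize"));
    }
}
```

In this sketch maxNumberOfRuns ends up with the value from the file, while popSize ends up with the command-line value because that layer is applied last.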

This shouldn't take effect until a valid config file is passed with non-default arguments. It works by reflection, attempting to coerce each passed argument to the type of the declared parameter in each case - which should be safe!
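As a rough illustration of the reflection approach described above (not the code in this branch), the sketch below looks up each named field on a target object and coerces the supplied value to the field's declared type. DemoModel and ReflectiveConfig are hypothetical stand-ins, the map stands in for an already-parsed YAML section, and failing on unknown keys or malformed booleans is a design choice of this sketch rather than confirmed behaviour.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a model class; field names mirror the example config.
class DemoModel {
    boolean alignFertility = true;
    boolean alignCohabitation = true;
    int popSize = 50000;
    Long randomSeed = 1L;
}

class ReflectiveConfig {

    // Coerce a parsed config value to the declared type of the target field.
    static Object coerce(Object value, Class<?> type) {
        String text = String.valueOf(value);
        if (type == boolean.class || type == Boolean.class) {
            // Strict on purpose: Boolean.parseBoolean would map typos to false.
            if (text.equalsIgnoreCase("true")) return true;
            if (text.equalsIgnoreCase("false")) return false;
            throw new IllegalArgumentException("Not a boolean: " + text);
        }
        if (type == int.class || type == Integer.class) return Integer.parseInt(text);
        if (type == long.class || type == Long.class) return Long.parseLong(text);
        if (type == double.class || type == Double.class) return Double.parseDouble(text);
        if (type == String.class) return text;
        throw new IllegalArgumentException("Unsupported field type: " + type.getName());
    }

    // Set each named field on the target; unknown keys fail loudly.
    static void apply(Object target, Map<String, Object> args) {
        for (Map.Entry<String, Object> entry : args.entrySet()) {
            try {
                Field field = target.getClass().getDeclaredField(entry.getKey());
                field.setAccessible(true);
                field.set(target, coerce(entry.getValue(), field.getType()));
            } catch (NoSuchFieldException | IllegalAccessException e) {
                throw new IllegalArgumentException("Cannot set parameter: " + entry.getKey(), e);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> modelArgs = new LinkedHashMap<>();
        modelArgs.put("alignFertility", false);
        modelArgs.put("popSize", "25000"); // a string value is coerced to int
        DemoModel model = new DemoModel();
        apply(model, modelArgs);
        System.out.println("alignFertility = " + model.alignFertility);
        System.out.println("popSize = " + model.popSize);
    }
}
```

Fields that are not named in the map keep their compiled-in values, which matches the idea that nothing changes until non-default arguments are supplied.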

This should work exactly like changing the parameters in Java code before compiling. But I think it offers some advantages:

Do let me know if this seems rational/useful at all. Have been playing with this setup and it's a very helpful default to be able to turn on/off statistics and other csv outputs quickly.

pbronka commented 11 months ago

Thanks @andrewbaxter439 - looks great.

A fresh, customised run could be started by cloning repo, compiling code, copying in input/config files and running SimPathsStart/SimPathsMultiRun all in one script.

This sounds a lot like a Docker container? Or do you have a different solution in mind?

andrewbaxter439 commented 11 months ago

This sounds a lot like a Docker container? Or do you have a different solution in mind?

Yes, a Docker container could be one use. To my mind, a Dockerfile that copies in the customised 'input' folder and config file and sets the seed from an environment variable could allow several parallel containers with different starting points. I also have in mind the potential for running across a cluster of machines, and AFAIK initialising them with a script and a custom start seed would be the most robust way of doing this. I don't have direct experience of this yet, but I figure this could be a fool-proof way of building in that capability.
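One way the per-container seed idea could look on the Java side is sketched below, assuming the seed is passed in through an environment variable. The variable name SIMPATHS_RANDOM_SEED and the class SeedFromEnvironment are hypothetical; nothing in this thread confirms that SimPaths reads such a variable.

```java
import java.util.Map;

class SeedFromEnvironment {

    // Hypothetical variable name; not defined by SimPaths.
    static final String SEED_VARIABLE = "SIMPATHS_RANDOM_SEED";

    // Use the seed from the environment when present, otherwise the fallback
    // (for example the randomSeed value read from config.yml).
    static long resolveSeed(Map<String, String> environment, long fallback) {
        String raw = environment.get(SEED_VARIABLE);
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    SEED_VARIABLE + " must be an integer, got: " + raw, e);
        }
    }

    public static void main(String[] args) {
        long seed = resolveSeed(System.getenv(), 12345L);
        System.out.println("Using random seed " + seed);
    }
}
```

Each container or cluster node would then be started with a different value for the variable, while the image, input folder and config file stay identical.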