econ-ark / HARK

Heterogeneous Agents Resources & toolKit
Apache License 2.0

tools for writing out parameters to a Dolang configuration file #446

Closed sbenthall closed 4 months ago

sbenthall commented 4 years ago

Simulations built with HARK currently start with long sections of Python code setting parameters.

This custom code is written in a few different styles; it would be cleaner to have these parameters in a configuration file. This is what Dolang does, and it is a pattern HARK would do well to adopt.

This also would be a step in the direction of automating the creation of tables for showing the mapping between notation and Python variables.

llorracc commented 4 years ago

As with other things, this is something that is best handled by crafting a single "template" example and then adapting/improving the template by thinking about the extent to which the template could be used for various different projects.

I think good candidates would be documentation examples; you might try ConsPortfolioModelDoc, for example.

sbenthall commented 4 years ago

Notes from the sprint meeting this morning:

I'll work on a demo of ways to work with this after some other directory cleanup (i.e. #440 )

sbenthall commented 4 years ago

I've been doing research into how other scientific simulation libraries handle this problem. What I've found is that there is no standardized way of doing it yet, but there are some patterns that seem to hold across the libraries.

These are the libraries I included in my survey, with some notes about how each manages parameters in its examples.

PySB

Systems Biology Modeling http://pysb.org/ --- Parameters coded into examples. No visible inheritance.

Mesa

Agent Based Modeling https://pypi.org/project/Mesa/ https://forum.comses.net/t/mesa-an-agent-based-modeling-framework-in-python-3/7039

--- Individual examples have their own requirements.txt --- models have default parameters in the __init__ method of the model class
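
The Mesa pattern noted above can be sketched like this (a hypothetical model class, not actual Mesa code):

```python
# Sketch of the Mesa-style pattern: default parameter values live as
# keyword arguments of the model class's __init__ (hypothetical model,
# not actual Mesa code).
class MoneyModel:
    def __init__(self, num_agents=100, initial_wealth=1.0):
        self.num_agents = num_agents
        self.initial_wealth = initial_wealth

# Defaults apply unless explicitly overridden at construction time.
default_model = MoneyModel()
custom_model = MoneyModel(num_agents=500)
```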

ActivitySim

Metropolitan Travel Activity https://activitysim.github.io/

-- Many, many configuration options -- Everything provided in .csv or .yaml files, e.g.: https://github.com/ActivitySim/activitysim/tree/master/example/configs

SimuPy

Dynamic systems https://github.com/simupy/simupy https://readthedocs.org/projects/simupy-personal/downloads/pdf/latest/ --- Parameters coded into each example file --- No reuse -- library is immature

Nengo

Brain simulations https://www.frontiersin.org/articles/10.3389/fninf.2013.00048/full --- examples are all notebooks in the docs directory https://github.com/nengo/nengo/tree/master/docs --- parameters are all just entered as arguments in (very lightweight) modeling interface, e.g.: https://github.com/s72sue/std_neural_nets/blob/master/hopfield_network.ipynb

nilearn

Neuro imaging http://nilearn.github.io/auto_examples/index.html#tutorial-examples --- datasets are loaded by a data loading handler --- many examples, with few parameters, which are hardcoded as method arguments E.g. https://github.com/nilearn/nilearn/blob/master/examples/04_manipulating_images/plot_roi_extraction.py

Special mention:

Yggdrasil

Plant simulations https://academic.oup.com/insilicoplants/article/1/1/diz001/5479575 https://github.com/cropsinsilico/yggdrasil Software for combining models across programming languages to accommodate different layers of abstraction.

llorracc commented 4 years ago

Sounds like you did an admirably comprehensive job of looking at other libraries.

I guess your next step should be to propose lessons, in the form of suggestions for how we should do this systematically?

On Fri, Dec 13, 2019 at 4:53 PM Sebastian Benthall notifications@github.com wrote:

I've been doing research into how other scientific simulation libraries handle this problem. What I've found is that there is no standardized way of doing it yet, but there's some patterns that seem to hold across the libraries.

  • None of these libraries has anything as tightly integrated as REMARKs currently are for publications. The examples provided with the core libraries vary in how 'complete' they are as useful demos or exploratory tools, but I don't see any submodules or linking across repositories.
  • There are almost never parameters hard-coded into the library itself. Most of the time, these are coded into the Python of an example notebook or Python file on a case-by-case basis. There are a couple of exceptions to this:
    • ActivitySim has many, many parameters for its simulations; it stores these in .yaml and .csv files.
    • Mesa has substantive model classes that are initialized at the start of particular simulations or experiments. These have their default parameters loaded as default arguments to the class initializer, and sometimes stored in static variables of the class itself.


sbenthall commented 4 years ago

The main lessons from this work are:

  • If there are a large number of parameters, it makes sense to put them in a serialized configuration file, like a .yaml file.
  • If there are substantive models, it makes sense for default parameter values to be loaded by the model's class when it initializes.

I will make a PR with a demonstration of how this could work with the HARK core and a template example.

One thing that occurred to me after I did the survey of simulation libraries, but which I think is important, is this:

  • The libraries I looked at are mainly about defining a model's content by giving it parameters, and delivering simulated output.
  • Some HARK parameters are actually about how the model is executed, which is quite a bit different from model content. Maybe these should be treated differently. This would imply a comparison with a different set of Python libraries that emphasize model-fitting, such as scikit-learn and PyMC3.

This may be a more complex issue, better dealt with in a separate task. But I wanted to flag for future work the possibility of:

  • Distinguishing, when defining parameters, between those that determine the model's substantive content (like CRRA and DiscFac) and those that guide how it works operationally (perhaps like CubicBool).

I've noticed that Dolo configuration files do separate meaningful categories of parameters from each other, which I think helps add clarity.

albop commented 4 years ago

So, back to business after a long teaching span. Indeed, in dolo the choice was made to completely separate the model part, which lives in the YAML file, from the solution instructions, which are in the Python code. However, the separation is not that strict, and one of the reasons there are no command options in the YAML file is their lack, so far, of API stability; I want the YAML files to stay stable. A certain degree of separation is probably a good idea. I'd suggest checking out the TOML language. It looks nice and simple. I didn't adopt it because it isn't great for inputting equations (you need quotes everywhere).

llorracc commented 4 years ago

(Replaced this comment with markdown version below)

llorracc commented 4 years ago

Pablo,

So, your suggestion would be that we adopt toml as our standard for defining input files of all kinds (esp. parameter files)?

For example, right now we have a giant ConsumerParameters.py file which we use with commands like

from ConsumerParameters import init_perf_foresight as PerfForesightDict

PFexample = PerfForesightConsumerType(**PerfForesightDict)

(and the relevant part of ConsumerParameters.py is excerpted below).

We’ve had some discussions about alternative ways of doing this including:

  1. Having default parameter values defined directly in the specification of the class itself
  2. Breaking things up so that rather than having a giant ConsumerParameters.py file for all of our consumption/saving classes, there would be a standalone file (maybe toml?) for each class, like PerfForesightCRRA.toml
  3. A combination: Define default parameter values when the class is defined, but write those parameter values out into a file (say, a toml file) so that they are easy to inspect (but make it clear to the user that the toml file is generated content).

One complexity is that we build our models by inheritance, so that for example ConsIndShockType model inherits the characteristics of PerfForesightType, and we would want it to inherit default parameter values too, which would argue for options 1. or 3. above.

My inclination is for 3, but am curious about your thoughts.

 ConsumerParameters.py:

CRRA = 2.0                          # Coefficient of relative risk aversion
Rfree = 1.03                        # Interest factor on assets
DiscFac = 0.96                      # Intertemporal discount factor
LivPrb = [0.98]                     # Survival probability
PermGroFac = [1.01]                 # Permanent income growth factor
BoroCnstArt = None                  # Artificial borrowing constraint
MaxKinks = 400                      # Maximum number of grid points to allow in cFunc (should be large)
AgentCount = 10000                  # Number of agents of this type (only matters for simulation)
aNrmInitMean = 0.0                  # Mean of log initial assets (only matters for simulation)
aNrmInitStd  = 1.0                  # Standard deviation of log initial assets (only for simulation)
pLvlInitMean = 0.0                  # Mean of log initial permanent income (only matters for simulation)
pLvlInitStd  = 0.0                  # Standard deviation of log initial permanent income (only matters for simulation)
PermGroFacAgg = 1.0                 # Aggregate permanent income growth factor (only matters for simulation)
T_age = None                        # Age after which simulated agents are automatically killed
T_cycle = 1                         # Number of periods in the cycle for this agent type

# Make a dictionary to specify a perfect foresight consumer type
init_perfect_foresight = { 'CRRA': CRRA,
                           'Rfree': Rfree,
                           'DiscFac': DiscFac,
                           'LivPrb': LivPrb,
                           'PermGroFac': PermGroFac,
                           'BoroCnstArt': BoroCnstArt,
                           #'MaxKinks': MaxKinks,
                           'AgentCount': AgentCount,
                           'aNrmInitMean' : aNrmInitMean,
                           'aNrmInitStd' : aNrmInitStd,
                           'pLvlInitMean' : pLvlInitMean,
                           'pLvlInitStd' : pLvlInitStd,
                           'PermGroFacAgg' : PermGroFacAgg,
                           'T_age' : T_age,
                           'T_cycle' : T_cycle
                          }
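
As a sketch of how option 3 above might work (all names are hypothetical; JSON stands in here for a YAML writer, which would be analogous):

```python
import json

# Sketch of option 3 (hypothetical names): default parameter values are
# defined on the class itself, and a generated file is written out so the
# defaults are easy to inspect.
class PerfForesightType:
    defaults = {"CRRA": 2.0, "DiscFac": 0.96, "Rfree": 1.03}

    def __init__(self, **overrides):
        # Start from the class defaults, then apply any user overrides.
        self.params = {**self.defaults, **overrides}

    @classmethod
    def write_defaults(cls, path):
        # Write the defaults out to a file; the file is generated content,
        # not the source of truth.
        with open(path, "w") as f:
            json.dump(cls.defaults, f, indent=2)

agent = PerfForesightType(CRRA=3.0)
```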
sbenthall commented 4 years ago

PR #462 is intended to demonstrate an incremental step in the right direction here.

In master, the library's default parameters are all hard-coded into ConsumerParameters.py.

In this PR, ConsumerParameters.py is still there, but when it is imported it loads all the parameters from a ConsumerParameters.yaml file.

With the exception of a few small changes, this PR could in principle be merged with no change to the API for downstream uses.
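
Such a ConsumerParameters.yaml might look like the following hypothetical fragment (mirroring a few values from the ConsumerParameters.py excerpt quoted earlier in the thread; the actual layout in the PR may differ):

```yaml
# Hypothetical fragment of a ConsumerParameters.yaml
perfect_foresight:
  CRRA: 2.0           # Coefficient of relative risk aversion
  Rfree: 1.03         # Interest factor on assets
  DiscFac: 0.96       # Intertemporal discount factor
  LivPrb: [0.98]      # Survival probability
  PermGroFac: [1.01]  # Permanent income growth factor
```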

llorracc commented 4 years ago

@sbenthall, sounds like you've made a nice prototype (though I haven't had time to look at it yet).

I'd be interested in your thinking about the pros and cons of my idea from the prior discussion, of having default values embedded in the definition of the class, then written out to a yaml file. As I see it:

pro: There's one place to look both for how the parameter is used and what its default numerical value is.
con: The values of the parameters are scattered through the class definition instead of concentrated in one place.

PS. Did you look into why Pablo suggested toml instead of yaml?

sbenthall commented 4 years ago

@llorracc Ok, I'll be honest.

I don't like the idea of having the classes write the parameters out to a yaml file, with that yaml file stored in the version control, for your "countercounterpoint" reason. I think it's confusing.

I think it would accomplish the same thing, but be less confusing, if each class instance had a method that clearly reported to the user what its parameters are.

These parameters might even be displayed in the __repr__ of the class. https://www.pythonforbeginners.com/basics/__str__-vs-__repr
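
A minimal sketch of that idea (hypothetical class, not existing HARK code):

```python
# Sketch of a class whose repr reports its parameters (hypothetical names;
# not existing HARK code).
class AgentType:
    def __init__(self, **params):
        self.params = params

    def describe_parameters(self):
        # Report parameters one per line for easy inspection.
        return "\n".join(f"{k} = {v!r}" for k, v in sorted(self.params.items()))

    def __repr__(self):
        args = ", ".join(f"{k}={v!r}" for k, v in sorted(self.params.items()))
        return f"{type(self).__name__}({args})"

agent = AgentType(CRRA=2.0, DiscFac=0.96)
# repr(agent) → "AgentType(CRRA=2.0, DiscFac=0.96)"
```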

I think these conversations are very tricky because they often depend on quite unscientific intuitions about what's "easier to use", which is a very noisy human variable. Most of the time when I have an opinion on this, it's based on my understanding of software engineering conventions. But there's always room to disagree.

I've now looked at TOML, as Pablo recommends. It looks quite similar to YAML. I think it's less widely used than YAML. My impression is that it would be idiosyncratic to adopt it. If it isn't as good as YAML for including equations, I think that's a dealbreaker for depending on it in the long run.

https://gist.github.com/oconnor663/9aeb4ed56394cb013a20

llorracc commented 4 years ago

Thinking through my feelings on this, I guess partly they boil down to an aversion to a proliferation of different files that people have to get loaded correctly and in the right locations. This may well be a bad instinct on my part, to the extent that it is not conditioned on experience with people getting everything via a pip install or whatever. My fondness for embedding the default parameter values in the definition of the class is that if we do it that way, then whenever a person has a definition of the class, they are guaranteed to have a definition of the default parameter values. Our current setup requires them also to have ConsumerParameters.py in the right place, and your extension requires them also to have a yaml file in the right place. To the extent that worrying about people having "the right files in the right place" is anachronistic on my part (because pip install or git pull will guarantee that), I'm on board with your approach.

The one other point in favor of the 'defaults within the class definition' approach is inheritance. Suppose we want ConsIndShockType to inherit all the default parameters of PerfForesightCRRAType, and only be required to specify values for parameters that are NOT default parameters for PerfForesightConsumerType. I don't see how we do that with your setup of standalone YAML files for each type, whereas it is inherent in my approach of classes inheriting from parent classes and only having to define the variables that are novel.
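
The inheritance idea can be sketched as follows (hypothetical names; one possible mechanism, collecting class-level defaults by walking the class hierarchy):

```python
# Sketch of default-parameter inheritance (hypothetical names): each class
# declares only the defaults that are new or changed relative to its parent;
# the effective defaults are collected by walking the class hierarchy.
class PerfForesightType:
    defaults = {"CRRA": 2.0, "DiscFac": 0.96, "Rfree": 1.03}

    def __init__(self, **overrides):
        merged = {}
        # Walk from the most basic class down to the concrete one, so that
        # subclasses override their parents.
        for klass in reversed(type(self).__mro__):
            merged.update(getattr(klass, "defaults", {}))
        merged.update(overrides)
        self.params = merged

class ConsIndShockType(PerfForesightType):
    # Only the novel parameters need to be declared here; CRRA, DiscFac,
    # and Rfree are inherited from PerfForesightType.
    defaults = {"BoroCnstArt": 0.0}
```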

I think it would accomplish the same thing, but be less confusing, if each class instance had a method that clearly reported to the user what its parameters are.

This sounds good; but to be clear, you are proposing a new standard here, which is not yet implemented for any of our existing classes?


sbenthall commented 4 years ago

As a small step: Standardize the names of the parameter dictionaries in ConsumerParameters.py with the names of the classes that use them. Then fix the downstream parameters.

sbenthall commented 4 years ago

Referring to that proposed "small step"--a problem with naming the dictionaries in ConsumerParameters.py with the class that ingests them is that many of these dictionaries are reused in multiple places at the moment.

For example, here, init_idiosyncratic_shock is imported, updated, and then given at runtime as arguments to the initializer of RepAgentConsumerType: https://github.com/econ-ark/HARK/blob/master/HARK/ConsumptionSaving/ConsRepAgentModel.py#L340-L350

This is a good example of why it would be better for classes to have default parameters that get inherited by their subclasses. Indeed, RepAgentConsumerType is a subclass of IndShockConsumerType, and if the variables were being passed through by inheritance, then only the changed values would need to be defined at runtime.

For this reason, I'm working on #466, which gives each class the parameters as overrideable defaults.

sbenthall commented 4 years ago

With #442 merged, now it's easier to see why the current way of handling parameters is problematic.

Because of the old way of handling parameters, there are downstream dependencies on a parameter file that shouldn't be in the HARK module: https://github.com/econ-ark/HARK/issues/440#issuecomment-562360408 https://github.com/econ-ark/HARK/blob/master/HARK/SolvingMicroDSOPs/Calibration/EstimationParameters.py

SolvingMicroDSOPs is now a REMARK. But it is still depending on HARK for its Calibration file. This is not right.

Whatever solution we find for parameter management within HARK will, in the best case, also inform how REMARKs work as well.

sbenthall commented 4 years ago

I think at yesterday's meeting we came to some conclusions about where to go with this. It's actually several different features, which will allow for efficient and flexible configuration.

For (c) and (d) there are questions about how specifically the YAML will be formatted. But I think it's fair to say that if a value is not specified in an (input) YAML file, it will be filled with the default value.

The idea behind (c) and (d) is to have model portability. This is quite a big lift. I'd like to scope this ticket at (a), (b), and a preliminary version of (c). The design decisions for (c) and (d) are going to require a lot more discussion.
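
The default-filling rule described above amounts to a dictionary merge; a minimal sketch (hypothetical names, with JSON standing in for YAML to keep the example dependency-free; the merge logic would be identical):

```python
import json

# Sketch (hypothetical): values missing from an input configuration file
# are filled in from the library defaults.
DEFAULTS = {"CRRA": 2.0, "DiscFac": 0.96, "Rfree": 1.03}

def load_config(text):
    user_values = json.loads(text)
    # Any key absent from the user's file keeps its default value.
    return {**DEFAULTS, **user_values}

config = load_config('{"CRRA": 3.5}')
# config == {'CRRA': 3.5, 'DiscFac': 0.96, 'Rfree': 1.03}
```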

sbenthall commented 4 years ago

The next step in this issue is to allow the configuration of a HARK model from a YAML file.

The best thing to do would be to use an existing YAML format for model configuration: Dolang! So this is related to #763

sbenthall commented 4 years ago

Since #763 covers the case of having Dolang YAML input into HARK, I'm changing the scope of this ticket to be outputting HARK models to Dolang YAML.

This will depend on having an internal, functionalized version of the transition functions in HARK. So this depends on #761

sbenthall commented 3 years ago

This also depends on having an organized representation of the parameters of a model, or #660