sbenthall closed this issue 4 months ago
As with other things, this is something that is best handled by crafting a single "template" example and then adapting/improving the template by thinking about the extent to which the template could be used for various different projects.
I think good candidates would be documentation examples; you might try ConsPortfolioModelDoc, for example.
Notes from the sprint meeting this morning:
I'll work on a demo of ways to work with this after some other directory cleanup (i.e. #440).
I've been doing research into how other scientific simulation libraries handle this problem. What I've found is that there is no standardized way of doing it yet, but some patterns seem to hold across the libraries.
These are the libraries I included in my survey, with some notes about how each manages parameters in its examples.

PySB (systems biology modeling): http://pysb.org/
- Parameters coded into examples. No visible inheritance.

Mesa (agent-based modeling): https://pypi.org/project/Mesa/ and https://forum.comses.net/t/mesa-an-agent-based-modeling-framework-in-python-3/7039
- Individual examples have their own requirements.txt.
- Models have default parameters in the __init__ method of the model class.

ActivitySim (metropolitan travel activity): https://activitysim.github.io/
- Many, many configuration options.
- Everything provided in .csv or .yaml files, e.g.: https://github.com/ActivitySim/activitysim/tree/master/example/configs

SimuPy (dynamic systems): https://github.com/simupy/simupy and https://readthedocs.org/projects/simupy-personal/downloads/pdf/latest/
- Parameters coded into each example file.
- No reuse; the library is immature.

Nengo (brain simulations): https://www.frontiersin.org/articles/10.3389/fninf.2013.00048/full
- Examples are all notebooks in the docs directory: https://github.com/nengo/nengo/tree/master/docs
- Parameters are all entered as arguments to a (very lightweight) modeling interface, e.g.: https://github.com/s72sue/std_neural_nets/blob/master/hopfield_network.ipynb

nilearn (neuroimaging): http://nilearn.github.io/auto_examples/index.html#tutorial-examples
- Datasets are loaded by a data-loading handler.
- Many examples with few parameters, which are hardcoded as method arguments, e.g.: https://github.com/nilearn/nilearn/blob/master/examples/04_manipulating_images/plot_roi_extraction.py

Special mention: Yggdrasil (plant simulations): https://academic.oup.com/insilicoplants/article/1/1/diz001/5479575 and https://github.com/cropsinsilico/yggdrasil
- Software for combining models across programming languages to accommodate different layers of abstraction.
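The Mesa pattern noted above, with default parameters living in the model class's initializer, can be sketched as follows. The class and parameter names here are hypothetical illustrations, not taken from Mesa itself:

```python
class WealthModel:
    """Toy model illustrating the Mesa-style pattern: defaults live
    in the __init__ signature, so callers override only what they need."""

    def __init__(self, n_agents=100, growth_rate=1.01, seed=None):
        self.n_agents = n_agents
        self.growth_rate = growth_rate
        self.seed = seed

# Default run uses every built-in value:
baseline = WealthModel()

# An experiment overrides a single parameter and inherits the rest:
experiment = WealthModel(growth_rate=1.05)
```

The appeal of this pattern is that the defaults sit right next to the code that uses them, at the cost of scattering them through the class definition.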
Sounds like you did an admirably comprehensive job of looking at other libraries.
I guess your next step should be to propose lessons, in the form of suggestions for how we should do this systematically?
On Fri, Dec 13, 2019 at 4:53 PM Sebastian Benthall notifications@github.com wrote:
- None of these libraries has anything as tightly integrated as REMARKs currently are for publications. The examples provided with the core libraries vary in how 'complete' they are as useful demos or exploratory tools, but I don't see any submodules or linking across repositories.
- There are almost never parameters hard-coded into the library itself. Most of the time, these are coded into the python of an example notebook or python file on a case-by-case basis. There are a couple exceptions to this:
- ActivitySim has many, many parameters for its simulations; it stores these in .yaml and .csv files
- Mesa has substantive model classes that are initialized at the start of particular simulations or experiments. These have their default parameters loaded as default arguments to the class initializer and sometimes stored in static variables of the class itself.
The main lessons from this work are:
- If there are a large number of parameters, it makes sense to put them in a serial configuration file, like a .yaml.
- If there are substantive models, it makes sense for default parameter values to be loaded by the model's class when it initializes.
I will make a PR with a demonstration of how this could work with the HARK core and a template example.
One thing that occurred to me after I did the survey of simulation libraries, but which I think is important, is this:
- The libraries I looked at are mainly about defining a model's content by giving it parameters, and delivering simulated output.
- Some HARK parameters are actually more about how the model is executed, which is quite a bit different from model content. Maybe this should be treated differently. This would imply a comparison with a different set of Python libraries that emphasize model-fitting more, such as scikit-learn and PyMC3.
This may be a more complex issue, better dealt with in a separate task. But I wanted to flag for future work the possibility of:
- Distinguishing, when defining parameters, between those that are for the model's substantive content (like CRRA and DiscFac) and parameters that guide how it works operationally, perhaps like CubicBool.
I've noticed that Dolo configuration files do separate meaningful categories of parameters from each other, which I think helps add clarity.
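That separation could be made concrete by grouping parameters by role in the configuration structure. A hypothetical sketch follows: the parameter names are HARK's, but the grouping itself is my own assumption, not implemented anywhere:

```python
# Hypothetical grouping: substantive (economic) parameters vs.
# operational (solution/simulation) parameters.
parameters = {
    "model": {           # substantive content of the model
        "CRRA": 2.0,     # coefficient of relative risk aversion
        "DiscFac": 0.96, # intertemporal discount factor
    },
    "solution": {        # how the model is solved
        "CubicBool": False,  # use cubic spline interpolation?
    },
    "simulation": {      # how the model is simulated
        "AgentCount": 10000,
    },
}

# Substantive parameters alone could then be passed to a model
# constructor, while operational ones configure the solver:
model_params = parameters["model"]
```

A Dolo-style configuration file could mirror this same nesting, so that each category of parameter has an obvious home.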
So, back to business after a long teaching span. Indeed, in dolo the choice was made to completely separate the model part, which is in the yaml file, from the solution instructions, which are in the python code. However, the separation is not that strict, and one of the reasons there are no command options in the yaml file is their lack of api stability so far. I want the yaml files to stay. A certain degree of separation is probably a good idea. I'd suggest checking out the toml language. It looks nice and simple. I didn't adopt it because it isn't great for inputting equations (you need quotes everywhere).
Pablo,
So, your suggestion would be that we adopt toml as our standard for defining input files of all kinds (esp. parameter files)?
For example, right now we have a giant ConsumerParameters.py file which we use with commands like
from ConsumerParameters import init_perfect_foresight as PerfForesightDict
PFexample = PerfForesightConsumerType(**PerfForesightDict)
(and the relevant part of ConsumerParameters.py is excerpted below).
We’ve had some discussions about alternative ways of doing this including:
One complexity is that we build our models by inheritance, so that for example ConsIndShockType model inherits the characteristics of PerfForesightType, and we would want it to inherit default parameter values too, which would argue for options 1. or 3. above.
My inclination is for 3, but am curious about your thoughts.
ConsumerParameters.py:
CRRA = 2.0          # Coefficient of relative risk aversion
Rfree = 1.03        # Interest factor on assets
DiscFac = 0.96      # Intertemporal discount factor
LivPrb = [0.98]     # Survival probability
PermGroFac = [1.01] # Permanent income growth factor
BoroCnstArt = None  # Artificial borrowing constraint
MaxKinks = 400      # Maximum number of grid points to allow in cFunc (should be large)
AgentCount = 10000  # Number of agents of this type (only matters for simulation)
aNrmInitMean = 0.0  # Mean of log initial assets (only matters for simulation)
aNrmInitStd = 1.0   # Standard deviation of log initial assets (only for simulation)
pLvlInitMean = 0.0  # Mean of log initial permanent income (only matters for simulation)
pLvlInitStd = 0.0   # Standard deviation of log initial permanent income (only matters for simulation)
PermGroFacAgg = 1.0 # Aggregate permanent income growth factor (only matters for simulation)
T_age = None        # Age after which simulated agents are automatically killed
T_cycle = 1         # Number of periods in the cycle for this agent type
# Make a dictionary to specify a perfect foresight consumer type
init_perfect_foresight = { 'CRRA': CRRA,
'Rfree': Rfree,
'DiscFac': DiscFac,
'LivPrb': LivPrb,
'PermGroFac': PermGroFac,
'BoroCnstArt': BoroCnstArt,
#'MaxKinks': MaxKinks,
'AgentCount': AgentCount,
'aNrmInitMean' : aNrmInitMean,
'aNrmInitStd' : aNrmInitStd,
'pLvlInitMean' : pLvlInitMean,
'pLvlInitStd' : pLvlInitStd,
'PermGroFacAgg' : PermGroFacAgg,
'T_age' : T_age,
'T_cycle' : T_cycle
}
In master, the library's default parameters are all hard-coded into ConsumerParameters.py.
In this PR, ConsumerParameters.py is still there, but when it is imported it loads all the parameters from a ConsumerParameters.yaml file.
With the exception of a few small changes, this PR could in principle be merged with no change to the API for downstream uses.
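The load-at-import pattern the PR describes can be sketched like this. To keep the sketch self-contained it uses a tiny hand-rolled parser for flat `key: value` lines in place of `yaml.safe_load` (which is what a real implementation would use), and a StringIO stands in for the ConsumerParameters.yaml file:

```python
import io

def load_flat_params(stream):
    """Minimal stand-in for yaml.safe_load, handling only flat
    'key: value' pairs with numeric or null values."""
    params = {}
    for line in stream:
        line = line.split("#")[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        # Convert the two value types used in this sketch.
        params[key.strip()] = None if value == "null" else float(value)
    return params

# In ConsumerParameters.py this would be open("ConsumerParameters.yaml"),
# executed once at import time.
config = io.StringIO("""
CRRA: 2.0      # coefficient of relative risk aversion
Rfree: 1.03    # interest factor on assets
T_age: null    # age after which agents are killed
""")

init_perfect_foresight = load_flat_params(config)
```

Because the module still exposes the same dictionary name, downstream code such as `from ConsumerParameters import init_perfect_foresight` keeps working unchanged.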
@sbenthall, sounds like you've made a nice prototype (though I haven't had time to look at it yet).
I'd be interested in your thinking about the pros and cons of my idea from the prior discussion, of having default values embedded in the definition of the class, then written out to a yaml file. As I see it:
- Pro: there's one place to look both for how the parameter is used and what its default numerical value is.
- Con: the values of the parameters are scattered through the class definition instead of concentrated in one place.
PS. Did you look into why Pablo suggested toml instead of yaml?
@llorracc Ok, I'll be honest.
I don't like the idea of having the classes write the parameters out to a yaml file, with that yaml file stored in the version control, for your "countercounterpoint" reason. I think it's confusing.
I think it would accomplish the same thing, but be less confusing, if each class instance had a method that clearly reported to the user what its parameters are.
These parameters might even be displayed in the __repr__ of the class.
https://www.pythonforbeginners.com/basics/__str__-vs-__repr
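As a sketch of that suggestion (the class name here is hypothetical; this is not HARK's actual AgentType API), parameters could be surfaced both by a reporting method and in `__repr__`:

```python
class Agent:
    """Toy class showing parameters reported via a method and __repr__."""

    def __init__(self, **parameters):
        self.parameters = parameters

    def describe_parameters(self):
        """Clearly report the instance's parameters to the user."""
        return "\n".join(f"{k} = {v}" for k, v in sorted(self.parameters.items()))

    def __repr__(self):
        # Show the parameters whenever the object is echoed in a REPL.
        args = ", ".join(f"{k}={v!r}" for k, v in sorted(self.parameters.items()))
        return f"{type(self).__name__}({args})"

a = Agent(CRRA=2.0, DiscFac=0.96)
# repr(a) → "Agent(CRRA=2.0, DiscFac=0.96)"
```

This keeps the single-source-of-truth property (the live object reports its own values) without committing a generated yaml file to version control.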
I think these conversations are very tricky because they often depend on quite unscientific intuitions about what's "easier to use", which is a very noisy human variable. Most of the time when I have an opinion on this, it's based on my understanding of software engineering conventions. But there's always room to disagree.
I've now looked at TOML, as Pablo recommends. It looks quite similar to YAML. I think it's less widely used than YAML. My impression is that it would be idiosyncratic to adopt it. If it isn't as good as YAML for including equations, I think that's a dealbreaker for depending on it in the long run.
https://gist.github.com/oconnor663/9aeb4ed56394cb013a20
Thinking through my feelings on this, I guess they partly boil down to an aversion to a proliferation of different files that people have to get loaded correctly and in the right locations. This may well be a bad instinct on my part, to the extent that it is not conditioned on experience with people getting everything via a pip install or whatever. My fondness for embedding the default parameter values in the definition of the class is that if we do it that way, then whenever a person has a definition of the class they are guaranteed to have a definition of the default parameter values. Our current setup requires them also to have ConsumerParameters.py in the right place, and your extension requires them also to have a yaml file in the right place. To the extent that worrying about people having "the right files in the right place" is anachronistic on my part (because pip install or git pull will guarantee that), I'm on board with your approach.
The one other point in favor of the 'defaults within the class definition' approach is inheritance. Suppose we want ConsIndShockType to inherit all the default parameters of PerfForesightConsumerType, and only be required to specify values for parameters that are NOT defaults for PerfForesightConsumerType. I don't see how we do that with your setup of standalone YAML files for each type, whereas it is inherent in my approach of classes inheriting from parent classes and only having to define the variables that are novel.
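The inheritance behavior described here can be sketched by giving each class a class-level defaults dict and merging up the method resolution order at initialization. The class and attribute names below are illustrative, not HARK's actual implementation:

```python
class PerfForesightType:
    default_params = {"CRRA": 2.0, "DiscFac": 0.96, "Rfree": 1.03}

    def __init__(self, **overrides):
        # Merge defaults from every class in the MRO, most-derived last,
        # then apply caller-supplied overrides on top.
        params = {}
        for cls in reversed(type(self).__mro__):
            params.update(getattr(cls, "default_params", {}))
        params.update(overrides)
        self.params = params

class ConsIndShockType(PerfForesightType):
    # Only parameters that are new or changed need to be stated here;
    # everything else is inherited from the parent class's defaults.
    default_params = {"CubicBool": False}

agent = ConsIndShockType(DiscFac=0.90)
```

With this arrangement the subclass automatically picks up CRRA and Rfree from its parent, while a runtime override still wins over any default.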
I think it would accomplish the same thing, but be less confusing, if each class instance had a method that clearly reported to the user what its parameters are.
This sounds good; but to be clear, you are proposing a new standard here, which is not yet implemented for any of our existing classes?
As a small step: standardize the names of the parameter dictionaries in ConsumerParameters.py with the names of the classes that use them. Then fix the downstream references to those dictionaries.
Referring to that proposed "small step": a problem with naming the dictionaries in ConsumerParameters.py after the class that ingests them is that many of these dictionaries are reused in multiple places at the moment.
For example, here init_idiosyncratic_shock is imported, updated, and then given at runtime as arguments to the initializer of RepAgentConsumerType:
https://github.com/econ-ark/HARK/blob/master/HARK/ConsumptionSaving/ConsRepAgentModel.py#L340-L350
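The reuse pattern in that snippet, reduced to its essentials: the dictionary and class names match the thread's, but the parameter values and the stub class body are illustrative only:

```python
# Sketch of the current reuse pattern: a shared dictionary is copied,
# a few entries are overridden, and the result is passed as keyword
# arguments to a different class's initializer.
init_idiosyncratic_shock = {"CRRA": 2.0, "DiscFac": 0.96, "Rfree": 1.03}

temp_dict = init_idiosyncratic_shock.copy()  # avoid mutating the shared dict
temp_dict.update({"DiscFac": 0.90})          # runtime overrides

class RepAgentConsumerType:                  # stand-in for the HARK class
    def __init__(self, **kwds):
        self.params = kwds

agent = RepAgentConsumerType(**temp_dict)
```

The fragility is visible even in this toy version: the override logic lives at the call site rather than with the class, so every consumer of the dictionary has to repeat it.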
This is a good example of why it would be better for classes to have default parameters that get inherited by their subclasses. Indeed, RepAgentConsumerType is a subclass of IndShockConsumerType, and if the variables were being passed through by inheritance, then only the changed values would need to be defined at runtime.
For this reason, I'm working on #466, which gives each class the parameters as overrideable defaults.
With #442 merged, now it's easier to see why the current way of handling parameters is problematic.
Because of the old way of handling parameters, there are downstream dependencies on a parameter file that shouldn't be in the HARK module: https://github.com/econ-ark/HARK/issues/440#issuecomment-562360408 https://github.com/econ-ark/HARK/blob/master/HARK/SolvingMicroDSOPs/Calibration/EstimationParameters.py
SolvingMicroDSOPs is now a REMARK, but it still depends on HARK for its Calibration file. This is not right.
Whatever solution we find for parameter management within HARK will, in the best case, inform how REMARKs work as well.
I think at yesterday's meeting we came to some conclusions about where to go with this. It's actually several different features, which will allow for efficient and flexible configuration.
For (c) and (d) there are questions about how specifically the YAML will be formatted. But I think it's fair to say that if a value is not specified in an (input) YAML file, it will be filled with the default value.
The idea behind (c) and (d) is to have model portability. This is quite a big lift. I'd like to scope this ticket at (a) (b) and the preliminary version of (c). The design decisions for (c) and (d) are going to require a lot more discussion.
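That fill-with-defaults rule amounts to a one-line merge once both the defaults and the user's (partial) configuration are dicts. A sketch with hypothetical values:

```python
# Library-wide default values (illustrative).
DEFAULTS = {"CRRA": 2.0, "DiscFac": 0.96, "Rfree": 1.03}

# Suppose the input YAML file specified only CRRA:
user_config = {"CRRA": 3.0}

# Later keys win, so any value not specified by the user
# falls back to its default.
config = {**DEFAULTS, **user_config}
```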
The next step in this issue is to allow the configuration of a HARK model from a YAML file.
The best thing to do would be to use an existing YAML format for model configuration: Dolang! So this is related to #763
Since #763 covers the case of having dolang YAML input into HARK, I'm changing the scope of this ticket to be outputting HARK models to dolang YAML.
This will depend on having an internal, functionalized version of the transition functions in HARK. So this depends on #761
This also depends on having an organized representation of the parameters of a model, or #660
Simulations built with HARK currently start with long sections of Python code setting parameters.
This custom code is written in a few different styles; it would be cleaner to have these parameters in a configuration file. This is what Dolang does, and it is a pattern HARK would do well to adopt.
This also would be a step in the direction of automating the creation of tables for showing the mapping between notation and Python variables.