ActivitySim / populationsim

An Open Platform for Population Synthesis
https://activitysim.github.io/populationsim

Recommended way to configure model year? #60

Open lmz opened 6 years ago

lmz commented 6 years ago

Hi there - I've been playing with PopulationSim for our use (https://github.com/BayAreaMetro/populationsim) and so far, I'm testing it for 2010 by specifying the 2010 control files (for example, https://github.com/BayAreaMetro/populationsim/blob/master/bay_area/households/configs/settings.yaml#L61)

But our standard practice will be to run it for multiple years, and I don't like the obvious solutions:
1) having duplicate configs that look very similar, with just the year changed
2) having a single config but copying/moving files around at runtime

I'd prefer to have something like this in settings.yaml:

model_year: 2010

and then

  - tablename: MAZ_control_data
    filename: '%model_year%_mazData.csv'

How do you recommend folks handle this? Thank you!
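Absent built-in support, the proposed placeholder syntax could be approximated by a substitution pass after the YAML is loaded. This is a minimal sketch assuming Python 3 and PyYAML; `expand_placeholders` and `load_settings` are hypothetical helpers, not PopulationSim functions, and `input_table_list` is assumed to be the list holding the tablename/filename entries. The placeholder is quoted because a plain YAML scalar cannot begin with `%`.

```python
import yaml

# Hypothetical settings fragment using the proposed %model_year% placeholder.
# Quoted, because an unquoted YAML scalar may not start with '%'.
SETTINGS = """
model_year: 2010
input_table_list:
  - tablename: MAZ_control_data
    filename: '%model_year%_mazData.csv'
"""


def expand_placeholders(node, values):
    """Recursively replace %key% tokens in every string of a loaded YAML tree."""
    if isinstance(node, dict):
        return {k: expand_placeholders(v, values) for k, v in node.items()}
    if isinstance(node, list):
        return [expand_placeholders(v, values) for v in node]
    if isinstance(node, str):
        for key, value in values.items():
            node = node.replace('%' + key + '%', str(value))
        return node
    return node  # numbers, booleans, None are returned unchanged


def load_settings(text):
    raw = yaml.safe_load(text)
    # Only top-level scalar settings (e.g. model_year) are offered as substitution values.
    values = {k: v for k, v in raw.items() if not isinstance(v, (dict, list))}
    return expand_placeholders(raw, values)


settings = load_settings(SETTINGS)
print(settings['input_table_list'][0]['filename'])  # 2010_mazData.csv
```

Wiring this into PopulationSim would mean changing how its settings are read, which this sketch does not attempt.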

toliwaga commented 6 years ago

I agree that this would be a handy feature. It probably makes sense to use anchors and tags to implement this feature.

Fortunately you don't have to wait. You can define a join method and register it as a tag handler with yaml globally, and it will be available to you in all your yaml files. Try the following:

Add this to the import section at the top of run_populationsim.py

import yaml

Put the following before any other executable code in run_populationsim.py (i.e. before the handle_standard_args() call) to install the yaml tag handler

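## define custom tag handler
def join(loader, node):
    # build the list of items (aliases such as *MODEL_YEAR are resolved) and concatenate them
    seq = loader.construct_sequence(node)
    return ''.join([str(i) for i in seq])

# registers on PyYAML's default loader(s); yaml.safe_load will not see this tag
yaml.add_constructor('!join', join)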

Now, in settings.yaml, you can define an anchor for the model year and use the !join tag to concatenate it into other strings:

current_model_year: &MODEL_YEAR '2010'

TEST_JOIN:
  - tablename: !join [*MODEL_YEAR, _mazData.csv]
    other_stuff: stuff

You can test this by adding the following near the beginning of run_populationsim.py (after the tag handler is installed):

print("TEST_JOIN:", setting('TEST_JOIN'))
exit()

And it should print out:

TEST_JOIN: [{'tablename': '2010_mazData.csv', 'other_stuff': 'stuff'}]
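To try the !join tag without running PopulationSim at all, the same handler can be exercised on its own. This is a sketch assuming Python 3.7+ and PyYAML; instead of the global yaml.add_constructor call, it registers the tag on a hypothetical SafeLoader subclass (`SettingsLoader`), so only loads that pass that loader explicitly are affected. PopulationSim's own settings reader will not use `SettingsLoader`, so inside run_populationsim.py the global registration suggested above is still the approach to use.

```python
import yaml


class SettingsLoader(yaml.SafeLoader):
    """Hypothetical loader subclass; tags added here do not leak into other loaders."""


def join(loader, node):
    # Construct the sequence items (resolving aliases like *MODEL_YEAR) and concatenate them.
    seq = loader.construct_sequence(node)
    return ''.join(str(i) for i in seq)


# add_constructor is a classmethod, so this only affects SettingsLoader.
SettingsLoader.add_constructor('!join', join)

SETTINGS = """
current_model_year: &MODEL_YEAR '2010'

TEST_JOIN:
  - tablename: !join [*MODEL_YEAR, _mazData.csv]
    other_stuff: stuff
"""

settings = yaml.load(SETTINGS, Loader=SettingsLoader)
print("TEST_JOIN:", settings['TEST_JOIN'])
# TEST_JOIN: [{'tablename': '2010_mazData.csv', 'other_stuff': 'stuff'}]
```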

You could avoid defining a custom tag by simply doing something like:

current_model_year: &MODEL_YEAR '2010'

TEST_JOIN:
  - tablename: !!python/object/apply:string.join [[*MODEL_YEAR, _mazData.csv], '']
    other_stuff: stuff

but this introduces yet another Python 3 compatibility issue, since the string.join function no longer exists in Python 3. Plus, it isn't very readable.
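The Python 3 problem can be reproduced directly. A small sketch, assuming PyYAML 5.1 or later (needed for yaml.unsafe_load): yaml.safe_load rejects python/object/apply tags outright, while yaml.unsafe_load accepts the tag but then fails because Python 3's string module has no join function. Both cases surface as a ConstructorError; the helper name `try_load` is hypothetical.

```python
import yaml

# The python/object/apply variant of the settings fragment.
DOC = """
current_model_year: &MODEL_YEAR '2010'
TEST_JOIN:
  - tablename: !!python/object/apply:string.join [[*MODEL_YEAR, _mazData.csv], '']
"""


def try_load(load):
    """Return 'ok' if DOC loads, otherwise the ConstructorError's problem text."""
    try:
        load(DOC)
        return "ok"
    except yaml.constructor.ConstructorError as err:
        return err.problem


# safe_load has no constructor for python/* tags.
print(try_load(yaml.safe_load))
# unsafe_load resolves the tag, but string.join does not exist in Python 3.
print(try_load(yaml.unsafe_load))
```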

bettinardi commented 6 years ago

Thanks for the quick feedback, Jeff. I also wanted to mention that ODOT will need to think about this question. Our next step in the contract is to implement PopulationSim in our Statewide Integrated Model (SWIM), which runs a synthetic population each year for ~30 years (2010-2040) into the future in an automated way. So ODOT could think about this as a more flexible run structure, with the input and output folders specified more flexibly and dynamically.

If we work out a flexible location and naming process for the "data" and "outputs" folders, then one should be able to set up a single yaml config that works through a series of changing input and output locations...
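Along the lines of the single-config idea, one option is to keep one settings template and fill in the year per run. A minimal sketch, assuming Python 3 and PyYAML; the template keys (`data_dir`, `output_dir`, `input_table_list`), the `data/<year>` and `output/<year>` folder layout, and `settings_for_year` are all hypothetical, and handing each result to PopulationSim is not shown. It substitutes on the raw text with the standard library's string.Template before parsing, so the ${year} placeholders never reach the YAML parser.

```python
import string

import yaml

# Hypothetical single config template shared by every model year.
TEMPLATE = """
data_dir: data/${year}
output_dir: output/${year}
input_table_list:
  - tablename: MAZ_control_data
    filename: '${year}_mazData.csv'
"""


def settings_for_year(year):
    # Fill in ${year} on the raw text first, then parse the result as YAML.
    text = string.Template(TEMPLATE).substitute(year=year)
    return yaml.safe_load(text)


# One settings dict per year over a 2010-2040 horizon like SWIM's.
runs = {year: settings_for_year(year) for year in range(2010, 2041)}
print(runs[2040]['output_dir'])  # output/2040
```

string.Template.substitute raises KeyError for any other $name in the template, which catches typos but means a literal dollar sign in the settings must be written as $$.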

toliwaga commented 6 years ago

Yes - the next contract step will be a good opportunity to come up with a standard way of dealing with this common situation.