UDST / bayarea_urbansim

UrbanSim implementation for the San Francisco Bay Area
14 stars 26 forks source link

UAL extensions to add units, tenure, conditional household relocation #79

Closed smmaurer closed 7 years ago

smmaurer commented 8 years ago

I'm opening this pull request so we can discuss the first batch of UAL extensions to bayarea_urbansim. These are described in the README here: https://github.com/ual/bayarea_urbansim/tree/ual-development

smmaurer commented 8 years ago

These were Fletcher's comments by email on July 8:


Yeah this looks great (sorry it took me so long to take a look). I love the ability to modularize and turn it on and off to see all of the impacts - I was hoping it would work like that!

3 quick comments (or you can open up a PR?):

Looks like there's an issue with tabs v spaces (did you see the recent Silicon Valley on this?) - some of it looks funny on github - might be worth a pep8

I might be being paranoid but I think pd.merge [https://github.com/ual/bayarea_urbansim/blob/ual-development/baus/ual.py#L296] doesn't return the index you're hoping for here. It might work in this case but it just returns a consecutive index (I guess this would always work right now but might not if the households index becomes non-consecutive)

You have an initialization step [https://github.com/ual/bayarea_urbansim/blob/ual-development/baus/ual.py#L100] where you initialize a few other orca things - can you explain to me why the initialization is necessary? Why not just initialize when the file is loaded like the rest of the orca things?

smmaurer commented 8 years ago

And here are my responses:

  1. Aha, I was wondering what was going on with the tabs and spaces.. Fixed a setting in my text editor and it's much better now.
  2. Oh dear, you're right. Testing it, pd.merge() was retaining the index in this case, but the documentation says it shouldn't. Maybe it works because it's an explicit left-join? But i fixed it to do this properly with reset_index() and set_index(), which makes the code clearer as well.
  3. Yeah, it's not necessary to do that within an orca step. But i wanted to be able to explicitly control which things were being registered in different simulation runs -- partly just as an organizational strategy to keep track of which registrations are associated with which functionality, but also to avoid the risk of accidental namespace clashes. I know there's no performance advantage, but is it crazy to do it this way?
fscottfoti commented 8 years ago

This all looks great Sam - really nice. As for 3, I lean towards just registering as per usual, I think if a user doesn't want to pollute their "orca space" than they can just not import. The orca environment is global so it's not like you can control a set of models for one run and a different set for another, so import or not seems like an ok way to control the env. Anyway, this isn't a strong preference, just a slight one.

mkreilly commented 7 years ago

Hi @smmaurer it's finally time to get moving on this now that our plan work seems to be winding down. should we start by looking at this pull request? or is their more recent work? thanks

smmaurer commented 7 years ago

Hi, that's great! Yes, this is the best place to start, comparing our ual-development branch to the master. I don't think we've done any direct work on this since August.

Our branch adds representations of units and tenure plus some new model steps, but the approach we used is a bit different from the rest of the codebase. We wanted to be able to compare versions of the model that use different data schemas, so units and tenure are NOT in datasources.py or variables.py. Instead, we set up sequences of "initialization steps" that can be run -- or not -- at the beginning of a simulation to set up the necessary tables and columns. All our substantive code is in a new file called ual.py.

Along the same lines, we tried to explicitly document the data characteristics that are expected (or produced) by each model step. This makes it easier to verify that particular sequences of steps are internally consistent.

So, these differences will make it harder to merge our branch into the master, but hopefully will make it easier in the long run to work with this sort of variation. I'm sure we've done a certain amount of reinventing the wheel, so I'd love to chat at some point about how you guys have dealt with this.

fscottfoti commented 7 years ago

In terms of data I need to run this changes, I guess I need the unit table? Does that join to 2015_09_01_bayarea_v3.h5? Where does the unit_id on households come from? Is there anything else?

fscottfoti commented 7 years ago

It looks like you have lots of interesting household variables too - do you have a whole new household table?

smmaurer commented 7 years ago

Ok, I tracked down the data requirements, sorry for the delay! I think all of it is from you folks, actually.

The units table is generated dynamically at initialization, from information in the buildings table. And a unit_id is assigned to households at the same time. All the household variables we're using in the model are from the standard synthetic population (as far as I can recall), but there might be references to regressions we ran separately using other PUMS variables.

Here's an annotated setup script that lists the required libraries and data files: ual_baus_install_template.sh

fscottfoti commented 7 years ago

closing in favor of #85. I needed to work in a branch of UDST, so that PR includes all these commits and a few others to fix conflicts and the like