ActivitySim / activitysim

An Open Platform for Activity-Based Travel Modeling
https://activitysim.github.io
BSD 3-Clause "New" or "Revised" License

"other than modeled person" formulation #22

fscottfoti closed 9 years ago

fscottfoti commented 9 years ago

@jiffyclub I think this one is for you. I'm working my way through hundreds of variable specifications, many of which are straightforward. I now see about 20 that all have the same formulation that has me a bit stumped. Something along the lines of a [retiree] in the household that is NOT the current person. What do you think is the best way (pandas and/or simulation framework) to formulate a variable like this from the perspective of the chooser?

jiffyclub commented 9 years ago

That is a tough one. If you had a list of the person types for the household that didn't include the current person you could do "retired_type in household". Or if you had a Series mapping from person ID to person type for the whole household you could do "retired_type in household.drop(current_id)".
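A minimal sketch of the second idea, assuming a toy household stored as a pandas Series indexed by person ID. The IDs, the type labels, and the helper name `other_member_has_type` are hypothetical and not part of ActivitySim. One caveat: with pandas, `x in series` tests the index labels rather than the values, so the sketch compares values explicitly.

```python
import pandas as pd

# Hypothetical household: person ID -> person type (labels are made up).
household = pd.Series(
    ["retired", "worker", "retired"],
    index=pd.Index([101, 102, 103], name="person_id"),
)


def other_member_has_type(household, current_id, ptype):
    """True if anyone other than current_id has the given person type."""
    others = household.drop(current_id)
    # Compare values explicitly: `ptype in others` would test the index labels.
    return bool((others == ptype).any())


print(other_member_has_type(household, 101, "retired"))  # person 103 is also retired
print(other_member_has_type(household, 102, "worker"))   # nobody else is a worker
```

This answers the question for one person at a time, so it would have to be repeated for every chooser.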

fscottfoti commented 9 years ago

Right - so this is what I'm doing without worrying about the perspective of the person:

def presence_of(ptype, persons, households):
    # household-level flag: True if any person in the household has this type
    return (persons.household_id[persons.ptype_cat == ptype]
            .value_counts() > 0).reindex(households.index).fillna(False)

And I could also pass in a choosers Series giving the index of the persons I care about, drop the people in that series from the persons table, and do the above. But that would still give the wrong answer: when more than one chooser comes from the same household, I would need a copy of the household for each of them, because the variable might be different from the perspective of each person.

I think it might be an involved answer unfortunately and we'll just have to take the performance hit to do the copies, reindexes, and appropriate drops. Might want to put on the back burner for now and come back to it unless you see a quicker answer.

jiffyclub commented 9 years ago

It's something you have to do household-by-household and person-by-person unless you've pre-loaded the persons table with information about the rest of the household.

fscottfoti commented 9 years ago

Per our discussion yesterday, I realized I already have a single function which definitely only uses person type as the relevant variable.

https://github.com/synthicity/activitysim/blob/254da40fe1acd175819533ffcb528a6e5d72053b/activitysim/defaults/tables/households.py#L166

The current version checks for the presence of that type including the chooser. The intended version is meant to check for presence of that type excluding the chooser. So like you said, one way to do this would be to create a list of person types excluding the chooser for each person. Alternatively, it's a bit sneaky, but I wonder if an even easier way is to look for 2 or greater of that person type in the household if the person is that type, or one or greater if the person is not that person type.
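The counting idea can be sketched as follows. This is only an illustration on a made-up persons table: the column names follow the thread, while the values and the variable names `n_in_hh`, `threshold`, and `other_retiree` are assumptions.

```python
import pandas as pd

# Hypothetical persons table; column names follow the thread, values are made up.
persons = pd.DataFrame(
    {
        "household_id": [1, 1, 2, 2, 3],
        "ptype_cat": ["retired", "retired", "retired", "worker", "worker"],
    },
    index=pd.Index([11, 12, 21, 22, 31], name="person_id"),
)

is_type = persons.ptype_cat == "retired"

# Number of retirees in each person's household, aligned to the persons index.
n_in_hh = is_type.astype(int).groupby(persons.household_id).transform("sum")

# Retirees need two or more in the household; everyone else needs one or more.
threshold = 1 + is_type.astype(int)
other_retiree = n_in_hh >= threshold

print(other_retiree)
```

The result is indexed by person rather than by household, so two members of the same household can get different answers.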

jiffyclub commented 9 years ago

Okay, I came up with this, but I haven't actually tried it yet:

def presence_of(ptype, persons, households, at_home=False):
    # person-level mask: True where the person is of the given type
    is_type = persons.ptype_cat == ptype
    if at_home:
        # if at_home, they need to be of given type AND at home
        is_type = is_type & (persons.cdap_activity == "H")

    # matching persons per household, mapped back onto every person
    counts = persons.household_id[is_type].value_counts()
    n = persons.household_id.map(counts).fillna(0)

    # a matching person needs 2+ matches in the household (someone besides
    # themselves); everyone else needs 1+
    return (n > 1).where(is_type, other=(n > 0))

This is different than the current presence_of because it puts things on the person level instead of households. You can only put "other than" info on the person table.

fscottfoti commented 9 years ago

That's some real Pandas-fu. Should probably test it first but that looks great.

jiffyclub commented 9 years ago

One question is how general this needs to be? With the at_home parameter and hard-coded column names this is specific to the person-type variables.
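One possible generalization, sketched under the assumption that the caller builds the boolean condition and passes it in alongside the grouping column. The function name `presence_of_other` and the sample data are hypothetical and not part of ActivitySim.

```python
import pandas as pd


def presence_of_other(mask, group_ids):
    """For each row, is there another row in the same group where mask is True?

    mask: boolean Series, one entry per person.
    group_ids: Series on the same index giving each person's group
        (for example a household ID).
    """
    hits = mask.astype(int)
    # Matches per group, broadcast back to every row in that group.
    per_group = hits.groupby(group_ids).transform("sum")
    # Subtract the row's own contribution so that only other members count.
    return (per_group - hits) > 0


# Hypothetical data; column names follow the thread, values are made up.
persons = pd.DataFrame(
    {
        "household_id": [1, 1, 2, 2],
        "ptype_cat": ["retired", "retired", "retired", "worker"],
        "cdap_activity": ["H", "M", "H", "H"],
    },
    index=pd.Index([11, 12, 21, 22], name="person_id"),
)

retired = persons.ptype_cat == "retired"
at_home = persons.cdap_activity == "H"

any_other_retired = presence_of_other(retired, persons.household_id)
other_retired_at_home = presence_of_other(retired & at_home, persons.household_id)
```

Because the condition arrives as a mask, the at-home case becomes a matter of combining masks rather than a dedicated parameter, and no column names are fixed inside the function.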

fscottfoti commented 9 years ago

I don't know. So far it's the only one I've encountered that looks like this, though there could be others. If it's easy to generalize, that would be good. If it's not easy, it might not be worth it yet.