ActivitySim / activitysim

An Open Platform for Activity-Based Travel Modeling
https://activitysim.github.io
BSD 3-Clause "New" or "Revised" License
190 stars 99 forks source link

Revisit Trip Destination Alternatives Preprocessor #868

Closed jpn-- closed 4 months ago

jpn-- commented 4 months ago

Describe the bug The alternatives pre-processor in #865 is not working correctly. It looks like presampling is aggregating the values as if they are a size term.

To Reproduce

Expected behavior d_microAccTime should take on value of only 0, 5, or 999. But it is getting values that are sometimes a combination of these (e.g. 2997, 2003, 1009, etc).

dhensle commented 4 months ago

I think the issue is that the original expression itself was not correct.

The original implementation found the max MicroAccessTime by TAZ and then reindexed on the alternatives. However, the groupby in the original expression set the index as TAZ where as the alternatives had a destination index of maz. (This was confusing in the original spec because the trip destination option DEST_COLUMN_NAME was set to dest_taz.)

I do not see any reason why we need to actually do the aggregation to TAZ. We can just join directly by MAZ, which is indeed what is done for the origin MicroAccessTime. (I imagine it was implemented like this in the first place due to the above stated dest_taz mislabel.) @JoeJimFlood am I missing something here?

I have updated the PR into the sandag abm3 model with corrections to these config files: https://github.com/ActivitySim/sandag-abm3-example/pull/13/commits/048f8d981565054ea611570d3439f95d515b92cc. This will change the results slightly because of the fixed expression.

Also, upon further inspection, I do not see any reason why we would want to keep the size term columns in the alternatives table. I have reverted that change, as seen here https://github.com/dhensle/activitysim/commit/17826edb7741787d4e5a0d28f7712c3b39170133 and opened https://github.com/ActivitySim/activitysim/pull/869.

jpn-- commented 4 months ago

I think this is trickier than it at first appears. The original d_MicroAccessTime expression was embedded in the sampling spec, which (because of presampling) operates on TAZs not MAZs. So, I think it was clunky but maybe mostly correct where it was. But the alternatives preprocessor operates before the presampling, so you are correct, it does work on MAZs, and as written in the example it didn't work right.

For sampling with presampling, the MAZ alternatives get aggregated to TAZs here:

https://github.com/ActivitySim/activitysim/blob/1cb48c7acc2c377f4822e2521a19503842aaa495/activitysim/abm/models/trip_destination.py#L559-L563

... which as noted happens after the alternatives preprocessor. This is how we're getting nonconforming d_MicroAccessTime values.

As the comment in the code says, there is (or, was) no data in the thing getting rolled up, just the indexes. But now there is. If there's no data, it doesn't matter what kind of aggregation is done, whomever wrote this line many years ago just put "sum" and it was fine. But now that there may be data appearing, we will need to have some mechanism to allow the user to identify what kind of aggregation to use for each relevant variable (e.g. sum, min, max, etc).

At this point, it might be worth reconsidering #862, which is much more limited in functionality than this PR, but does automatically switch between land_use or land_use_taz based on pre-sampling, and allows (requires) the user to figure out whatever aggregation is needed in other annotation components, possibly custom ones, that happen before and separate from trip_destination.

dhensle commented 4 months ago

Ahh, I see. I missed the aggregation, as you point out. I think the simple fix here is to just have separate preprocessors for the sample part and the simulate part. I have pushed updates to this effect to both https://github.com/ActivitySim/sandag-abm3-example/pull/13 and https://github.com/ActivitySim/activitysim/pull/869

Let me know what you think.