ActivitySim / activitysim

An Open Platform for Activity-Based Travel Modeling
https://activitysim.github.io
BSD 3-Clause "New" or "Revised" License
189 stars 96 forks source link

Misleading that Skims object setters and getters are not symmetrical #85

Closed toliwaga closed 7 years ago

toliwaga commented 8 years ago

it is very misleading, confusing, and error prone that the Skims object __getitem__ and __setitem__ methods are NOT symmetrical.

__setitem__(self, key, value) adds a Skim object to the hash of skims

def __getitem__(self, key) does NOT return the corresponding Skim object, but calls lookup using the implicit left_key and right_key df values and returns a Series

The fact that __getitem__ is a synonym for lookup at first view appears convenient for the readability spec expressions where skims['DISTANCE'] returns NOT the Skim object but the contextualized skim lookup, but this apparent gain in readability comes at the expense of concealing what is really going on behind the scenes.

This is especially error prone as both Skim and pandas Series support two-arg get() methods, so that a confusion over whether one is dealing with a Skim or a Series is potentially difficult to detect. The dangers of this asymmetry is underlined (sic) by the fact that the skims.py injectables distance_skim, sovam_skim, etc made this mistake for a while without its being detected, and this code was apparently written by the authors of the Skims class.

toliwaga commented 8 years ago

The way things stand, you can write expressions in the csv spec files that access skims like this (from trip_mode_choice.csv):

Walk - Time up to 2 miles,"@skims['DISTWALK'].clip(upper=walkThresh)*2*60/walkSpeed"

The skims object is a wrapper for Skim objects, but the expression skims['DISTWALK'] isn't returning a Skim object. Instead, it is returning a series evaluated in the context of the choosers dataframe with the walk distance for the chooser row based on the orig key from the 'TAZ' column and the dest key from the 'destination' column. the Skims object knows to use those keys because mode_choice called skims.set_keys('TAZ','destination) and the Skims.__getitem__(key)adds a call to lookup after retrieving the skim:

def __getitem__(self, key):
   return self.lookup(self.skims[key])

Otherwise, the spec file would ahve to do something like

Walk - Time up to 2 miles,"@skims.lookup('DISTWALK').clip(upper=walkThresh)*2*60/walkSpeed"

Apparently, it was deemed more natural to use the square bracket notation in this context, but this comes at the expense of python code in activitysim having to remember that the square bracket operator in python doesn't do what you might expect, when it appears on the RHS (right hand side) of an expression.

In python, to get the skim out of the dict that the Skims object whats, you have to say:

skim = skims.get_skim('DISTANCE')

even though the skim was assigned like this:

skims['DISTANCE'] = skim

It could be argued that the putative benefit of being able to say skims['DISTWALK'] in the csv flies is no great enough, compared to saying skims.lookup('DISTWALK') and the confusion, as evidenced by the recent bug in skims.py where the distance_skim injectable erroneously used the sqaure brackets attempting to access the wrapped 'DISTANCE' skim. Even worse, that erroneous usage didn't throw an exception because Skim and Series objects have some functions with identical calling signatures.

So it this mistake is made again, we might not discover it immediately.

bstabler commented 7 years ago

This is related to the work going on here

toliwaga commented 7 years ago

made the setter explicit to remove ambiguity. SkimStackWrapper and SkimStackWrapper now implement getitem to permit skim dereferencing in expression file (e.g. skims['DISTWALK'])

We may want to replace implicit df column refs, but for now this ambiguity is eliminated and the way it works is (I think) a little more straightforward.