cenpy-devs / cenpy

Explore and download data from Census APIs

Typo in TigerConnection query giving TypeError #98

Open ajkluber opened 4 years ago

ajkluber commented 4 years ago

Thanks for writing an awesome package! I have been getting to know it better over the last couple of weeks and think it will be very useful for my own projects. At the moment, I want to use your TigerConnection to get a census tract GeoDataFrame from a longitude and latitude, following this example. In the process I think I found a small bug.

In particular, passing a string for layer to .query on TigerConnection gives a TypeError. Here is a short code snippet, with a more complete example below.

tiger = cenpy.tiger.TigerConnection('tigerWMS_ACS2015')
res1 = tiger.query(**q, layer=8)    # this one works
res2 = tiger.query(**q, layer='(ESRILayer) Census Tracts')    # TypeError

I think this can be fixed by changing .index to .name on the result of this call to _fuzzy_match, which is how the other calls to _fuzzy_match use the return value. https://github.com/cenpy-devs/cenpy/blob/3d76a09de02d7fa9efff0c03f58af3429e2ce15a/cenpy/tiger.py#L267-L269
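To illustrate why .name rather than .index is the value that can index a list, here is a minimal pandas-only sketch. It does not call cenpy or reproduce _fuzzy_match itself: the `matches` table and its scores are made up, and the only thing taken from this report is that the result prints as a Series with the labels 'target', 'score', and 'score2'. The assumption is that the match comes back as one row of a DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for a table of candidate matches. The column names
# come from the output printed in this report; the values are invented.
layer_names = [
    "(ESRILayer) Counties",
    "(ESRILayer) Census Tracts",
    "(ESRILayer) States",
]
matches = pd.DataFrame(
    {
        "target": layer_names,
        "score": [40, 100, 35],
        "score2": [40, 100, 35],
    }
)

# Selecting a single row of a DataFrame yields a Series.
best = matches.loc[matches["score"].idxmax()]

print(best.index)  # labels of the Series, i.e. the DataFrame's column names
print(best.name)   # label of the selected row, i.e. its position in layer_names

layers = list(layer_names)

# Indexing a list with the row label works because it is an integer.
picked = layers[best.name]

# Indexing a list with a pandas Index raises the TypeError from this report.
error_message = None
try:
    layers[best.index]
except TypeError as err:
    error_message = str(err)
print(error_message)
```

If the return value of _fuzzy_match really is such a row, then .index can never be a valid list position, while .name is the integer that self.layers expects.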

From what I can tell, this doesn't affect any of the main features of the higher-level products, because .query is never called on TigerConnection, only on ESRILayer. Maybe this is why it hasn't shown up before.

Code to reproduce error


import cenpy

print(cenpy.__version__)

q = {'inSR': '4326', 
     'geometry': '-87.98543926966337%2C43.09830338412488', 
     'returnGeometry': 'true', 
     'geometryType': 'esriGeometryPoint',
     'spatialRel': 'esriSpatialRelIntersects',
     'outFields': 'STATE,COUNTY,TRACT,GEOID',
     'returnTrueCurves': 'false',
     'returnIdsOnly': 'false',
     'returnCountOnly': 'false',
     'returnZ': 'false',
     'returnM': 'false',
     'returnDistinctValues': 'false',
     'returnExtentsOnly': 'false',
     'f': 'json'
    }

tiger = cenpy.tiger.TigerConnection('tigerWMS_ACS2015')

# specify layer with int
res1 = tiger.query(**q, layer=8)
print(type(res1))

# specify layer with string
print(cenpy.products._fuzzy_match('(ESRILayer) Census Tracts', [ f.__repr__() for f in tiger.layers ]).index)
print(cenpy.products._fuzzy_match('(ESRILayer) Census Tracts', [ f.__repr__() for f in tiger.layers ]).name)
res2 = tiger.query(**q, layer='(ESRILayer) Census Tracts')

Gives the following output...

1.0.0post2
<class 'geopandas.geodataframe.GeoDataFrame'>
Index(['target', 'score', 'score2'], dtype='object')
8
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-56-dd881d9cbda5> in <module>
     28 print(cenpy.products._fuzzy_match('(ESRILayer) Census Tracts', [ f.__repr__() for f in tiger.layers ]).index)
     29 print(cenpy.products._fuzzy_match('(ESRILayer) Census Tracts', [ f.__repr__() for f in tiger.layers ]).name)
---> 30 res2 = tiger.query(**q, layer='(ESRILayer) Census Tracts')

~\AppData\Local\Continuum\anaconda3\envs\cenpy\lib\site-packages\cenpy\tiger.py in query(self, **kwargs)
    255         if layer_result is None:
    256             raise Exception('No layer selected.')
--> 257         return self.layers[layer_result].query(**kwargs)

TypeError: list indices must be integers or slices, not Index

dfolch commented 3 years ago

I can confirm this bug report. Here is another example:

cxn = cenpy.tiger.TigerConnection('PUMA_TAD_TAZ_UGA_ZCTA')
az_pumas = cxn.query(layer='(ESRILayer) Public Use Microdata Areas',  where='STATE = 04')

Output

TypeError: list indices must be integers or slices, not Index

The solution proposed by @ajkluber (replacing .index with .name at line 269) works for my use case. I am not sure whether it is the general solution, though.


Workaround for the time being:

cxn = cenpy.tiger.TigerConnection('PUMA_TAD_TAZ_UGA_ZCTA')
layer = '(ESRILayer) Public Use Microdata Areas'
index = [f.__repr__() for f in cxn.layers].index(layer)
az_pumas = cxn.layers[index].query(where='STATE = 04')
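The same lookup can be wrapped in a small helper that reports the available names when the string does not match exactly. This is only a sketch: `layer_position` and `FakeLayer` are hypothetical names, not part of cenpy, and `FakeLayer` exists only so the example runs without a network connection. It assumes, as in the workaround above, that each layer's repr is the string passed as `layer`.

```python
class FakeLayer:
    """Hypothetical stand-in for a layer object; only its repr matters here."""

    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return f"(ESRILayer) {self._name}"


def layer_position(layers, label):
    """Return the list position of the layer whose repr equals `label`."""
    reprs = [repr(layer) for layer in layers]
    try:
        return reprs.index(label)
    except ValueError:
        # list.index raises ValueError on a miss; re-raise with the choices.
        raise KeyError(f"No layer named {label!r}; available: {reprs}") from None


layers = [FakeLayer("States"), FakeLayer("Public Use Microdata Areas")]
position = layer_position(layers, "(ESRILayer) Public Use Microdata Areas")
print(position)
```

With a real connection, the equivalent call would presumably be cxn.layers[layer_position(cxn.layers, layer)].query(...), which has not been tested here.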