cenpy-devs / cenpy

Explore and download data from Census APIs

Is it possible to get county level with ACS from_state? #90

Open MaxGhenis opened 4 years ago

MaxGhenis commented 4 years ago

Looking at the code it seems like it should be, but I get the following error:

acs.from_state('CA', level='county')

AttributeError                            Traceback (most recent call last)
in
----> 1 acs.from_state('CA', level='county')

~/miniconda3/lib/python3.7/site-packages/cenpy/products.py in from_state(self, state, variables, level, **kwargs)
    631 .replace('place', 'county')
    632 def from_state(self, state, variables=None, level='tract', **kwargs):
--> 633 return self._from_name(state, variables, level, 'States', **kwargs)
    634 from_state.__doc__ = _Product\
    635 .from_place.__doc__\

~/miniconda3/lib/python3.7/site-packages/cenpy/products.py in _from_name(self, place, variables, level, layername, return_geometry, cache_name, strict_within, return_bounds, geometry_precision)
    605 strict_within=strict_within,
    606 return_bounds=return_bounds,
--> 607 geometry_precision=geometry_precision)
    608 variables['GEOID'] = variables.GEO_ID.str.split('US').apply(lambda x: x[1])
    609 return_table = geoms[['GEOID', 'geometry']]\

~/miniconda3/lib/python3.7/site-packages/cenpy/products.py in _from_name(self, place, variables, level, layername, strict_within, return_bounds, geometry_precision, cache_name, replace_missing, return_geometry)
    312 variables=variables, level=level,
    313 strict_within=False, return_bounds=False,
--> 314 replace_missing=replace_missing)
    315 if strict_within:
    316 geoms = geopandas.sjoin(geoms, env[['geometry']],

~/miniconda3/lib/python3.7/site-packages/cenpy/products.py in _from_bbox(self, bounding_box, variables, level, return_geometry, geometry_precision, strict_within, return_bounds, replace_missing)
    234 state, county = ix
    235 if level in ('county','state'):
--> 236 elements = chunk.COUNTY.unique()
    237 else:
    238 elements = chunk.TRACT.unique()

~/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5177 if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5178 return self[name]
-> 5179 return object.__getattribute__(self, name)
   5180
   5181 def __setattr__(self, name, value):

AttributeError: 'GeoDataFrame' object has no attribute 'COUNTY'

ljwolf commented 4 years ago

Hey, not currently. The way this is implemented is by manually defining the levels that work across the product API.

You should've hit a more informative error, and I can look into why that's not happening. And if that level can be validated, I'm happy to add it!
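As an illustration of what a more informative error could look like, here is a minimal sketch. The names `SUPPORTED_LEVELS` and `check_level` are hypothetical and not part of cenpy; the indices simply mirror the ACS `_layer_lookup` discussed in this thread.

```python
# Hypothetical mapping of supported levels to TIGER layer indices (not cenpy's API).
SUPPORTED_LEVELS = {"county": 84, "tract": 8}


def check_level(level, supported=SUPPORTED_LEVELS):
    """Return the layer index for `level`, or raise a readable error."""
    if level not in supported:
        raise ValueError(
            "level {!r} is not supported; choose one of {}".format(
                level, sorted(supported)
            )
        )
    return supported[level]
```

Calling a check like this before any geometry is fetched would surface an unsupported level as a `ValueError` naming the valid choices, rather than as an `AttributeError` deep inside pandas.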

ljwolf commented 4 years ago

Further, I'll need to figure out why that doesn't work, given the fact that 'county' was explicitly enabled in the _layer_lookup. I get correct data with acs.from_state('AZ', level='county') for 2017. What year are you doing?

ljwolf commented 4 years ago

Confirmed an issue with 2018 changing the column identifiers:

import cenpy
acs2017 = cenpy.products.ACS(year=2017)
acs2018 = cenpy.products.ACS(year=2018)

acs2017.from_state('AZ', level='county') # returns the counties in AZ
acs2018.from_state('AZ', level='county') # fails with the AttributeError described above

ronnie-llamado commented 3 years ago

It looks like this is due to differences between the tigerWMS_ACS2017 and tigerWMS_ACS2018 map services, coupled with a hard-coded _layer_lookup. In short, cenpy queries the wrong TIGER layer for some years.

Map Service Layer Indexes By Year

Map Service        "Census Tract" Index   "Counties" Index
tigerWMS_ACS2017   8                      84*
tigerWMS_ACS2018   8                      86*
tigerWMS_ACS2019   8                      86

Current Layer Lookup

cenpy.products.ACS always queries the layer at index 84 for counties, but in the 2018 and 2019 map services that layer is at index 86.
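An alternative to hard-coding indices would be to resolve a layer by its name from the map service's own layer listing. The sketch below assumes the listing is available as a list of dicts with `id` and `name` keys, which is the shape ArcGIS REST map services typically report. `find_layer_id` is a hypothetical helper, and the sample listings are abbreviated stand-ins built from the indices in the table above, with assumed layer names.

```python
def find_layer_id(layers, name):
    """Return the id of the single layer whose name matches `name` (case-insensitive)."""
    wanted = name.strip().lower()
    matches = [
        layer["id"] for layer in layers if layer["name"].strip().lower() == wanted
    ]
    if len(matches) != 1:
        raise LookupError(
            "expected exactly one layer named {!r}, found {}".format(name, len(matches))
        )
    return matches[0]


# Hypothetical, abbreviated layer listings; the layer names are assumptions.
layers_2017 = [{"id": 8, "name": "Census Tracts"}, {"id": 84, "name": "Counties"}]
layers_2018 = [{"id": 8, "name": "Census Tracts"}, {"id": 86, "name": "Counties"}]

print(find_layer_id(layers_2017, "Counties"))  # 84
print(find_layer_id(layers_2018, "Counties"))  # 86
```

A name-based lookup would not need updating when a new vintage shifts the indices, at the cost of depending on layer names staying stable across years.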

class ACS(_Product):
    """The American Community Survey (5-year vintages) from the Census Bureau"""

    _layer_lookup = {"county": 84, "tract": 8}

Proposed Solution

cenpy.products.ACS needs to become year-aware in order to account for the change in indexes.

This change is confined to ACS and leaves the base class untouched, so it does not interfere with Decennial2010, which is already year-aware by default.

class ACS(_Product):
    """The American Community Survey (5-year vintages) from the Census Bureau"""

    _supported_layer_lookup = {
        2017: {'county': 84, 'tract': 8},
        2018: {'county': 86, 'tract': 8},
        2019: {'county': 86, 'tract': 8},
    }

    def __init__(self, year="latest"):
        self._cache = dict()
        if year == "latest":
            year = 2019
        if year < 2017:
            raise NotImplementedError(
                "The requested year {} is too early. "
                "Only 2017 and onwards is supported.".format(year)
            )
        self._year = year
        self._api = APIConnection("ACSDT{}Y{}".format(5, year))
        self._api.set_mapservice("tigerWMS_ACS{}".format(year))

    @property
    def _layer_lookup(self):
        # Resolve the layer indices for the year this instance was created with.
        return self._supported_layer_lookup[self._year]
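One gap in the proposal is that the `year < 2017` check lets later years through, so a year missing from the table (2020, say) would only fail later with a `KeyError` on lookup. The stand-alone sketch below shows one way to reject any year that is not in the table. `YearAwareLookup` is a hypothetical stand-in for the proposed class: it omits `APIConnection` and the map service setup so it runs without cenpy or network access.

```python
class YearAwareLookup:
    """Stand-in for the proposed ACS class; no cenpy import, no network access."""

    # Layer indices per year, copied from the proposal above.
    _supported_layer_lookup = {
        2017: {"county": 84, "tract": 8},
        2018: {"county": 86, "tract": 8},
        2019: {"county": 86, "tract": 8},
    }

    def __init__(self, year="latest"):
        if year == "latest":
            # Use the newest year that has a known layer mapping.
            year = max(self._supported_layer_lookup)
        if year not in self._supported_layer_lookup:
            raise NotImplementedError(
                "The requested year {} is not supported. Supported years: {}.".format(
                    year, sorted(self._supported_layer_lookup)
                )
            )
        self._year = year

    @property
    def _layer_lookup(self):
        # Layer indices for the year chosen at construction time.
        return self._supported_layer_lookup[self._year]


print(YearAwareLookup(2017)._layer_lookup["county"])  # 84
print(YearAwareLookup()._layer_lookup["county"])      # 86
```

Deriving "latest" from the table keeps the default and the supported years in one place, so adding a new vintage would be a one-line change.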