NYCPlanning / db-acs

American Community Survey data processing for Population Fact Finder

python package for calculating pff #32

Open SPTKL opened 3 years ago

SPTKL commented 3 years ago

Ideas:

  1. One metadata file covering all variables, with all the information we need, e.g.
    {
    'pff_variable' : 'pop25t29',
    'acs_variable' : [
            "B01001_035",
            "B01001_011"
        ],
    'domain':'demographic',
    'base_variable':'pop_1',
    'rounding':'2'
    }
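  2. We should use an existing Python package to pull pff data instead of using requests; this would make things a lot easier, e.g.

```python
import os
import pandas as pd
from census import Census

# `variables` is the list of ACS variable codes to pull, e.g. ['B01001_011', 'B01001_035']
c = Census(os.environ['API_KEY'])
geo = {'for': 'block group:*', 'in': 'state:36 county:081'}
df = pd.DataFrame(c.acs5.get(('NAME', ','.join(variables)), geo, year=2018))
```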

3. We would need to create one master spatial lookup table or object, e.g. with both the census geoid and the boroct:
```python
{
    'geotype':'NTA2010',
    'pff_geoname':'BK01',
    'pff_geoid':'BK01',
    'acs_geoid':''
},
{
    'geotype':'CT2010',
    'pff_geoname':'QN43',
    'pff_geoid':'4157102',
    'acs_geoid':'36081157102'
}
```

User experience

```python
from pff import Pff
pff = Pff(api_key='XXXXXXXXX')

# If we just do an NTA-level calculation, in the background it should pull
# tract-level data and then aggregate it to the NTA level.
pop25t29 = pff.calculate(pff_variable='pop25t29', geotype='NTA', year='2018')
pop25t29.head()
```

The result should show the following fields: geotype, geoname, geoid, dataset, variable, c, e, m, p, z
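
The background step from the usage example (pull tract-level data, then roll it up to NTAs) could look roughly like the sketch below. It is only an illustration under assumptions: `aggregate_to_nta`, the tract-to-NTA mapping, the NTA codes, and all numbers are hypothetical, and it assumes `e` is the estimate, `m` the margin of error, and `c` the coefficient of variation, which the thread does not spell out. Margins of error are combined with the usual root-sum-of-squares approximation for sums of ACS estimates. Plain Python is used to keep the sketch dependency-free; a real implementation would more likely work on DataFrames.

```python
import math
from collections import defaultdict

Z_90 = 1.645  # ACS margins of error are published at the 90% confidence level

# Metadata entry in the shape proposed in idea 1 (trimmed to what is used here).
METADATA = {
    'pff_variable': 'pop25t29',
    'acs_variable': ['B01001_035', 'B01001_011'],
}

# Hypothetical tract -> NTA lookup; geoids and NTA codes are illustrative only.
TRACT_TO_NTA = {
    '36081000100': 'QN01',
    '36081000200': 'QN01',
    '36081000300': 'QN02',
}

# Hypothetical tract-level pull: {acs_geoid: {acs_variable: (estimate, moe)}}.
TRACT_DATA = {
    '36081000100': {'B01001_011': (100, 10), 'B01001_035': (120, 10)},
    '36081000200': {'B01001_011': (80, 10), 'B01001_035': (100, 10)},
    '36081000300': {'B01001_011': (150, 30), 'B01001_035': (50, 40)},
}


def aggregate_to_nta(tract_data, tract_to_nta, meta):
    """Sum the ACS variables behind one pff variable and roll tracts up to NTAs."""
    sums = defaultdict(lambda: {'e': 0.0, 'm_sq': 0.0})
    for geoid, values in tract_data.items():
        nta = tract_to_nta[geoid]
        for var in meta['acs_variable']:
            estimate, moe = values[var]
            sums[nta]['e'] += estimate        # estimates add up directly
            sums[nta]['m_sq'] += moe ** 2     # MOEs combine as sqrt(sum of squares)

    rows = []
    for nta, agg in sorted(sums.items()):
        m = math.sqrt(agg['m_sq'])
        # Coefficient of variation in percent; undefined when the estimate is 0.
        c = (m / Z_90) / agg['e'] * 100 if agg['e'] else None
        rows.append({
            'geotype': 'NTA2010',
            'geoid': nta,
            'variable': meta['pff_variable'],
            'e': agg['e'],
            'm': m,
            'c': c,
        })
    return rows


for row in aggregate_to_nta(TRACT_DATA, TRACT_TO_NTA, METADATA):
    print(row)
```

The percent fields (`p`, `z`) are left out because they would also need the `base_variable` from the metadata file.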