Population Grid Data - Githubissues

Mujingrui commented 3 years ago

Hi @szymon-datalions

Thank you for reading the message. I am a little confused about how you got the population centroid data based on the 2010 U.S. census block-level data. Thank you for your help. Many thanks :)

SimonMolinsky commented 3 years ago

Hi @Mujingrui ,

I got this data from here. This is a tedious task because there is only a drop-down list of single states, so to get data for the whole USA you must open the list many times, and each selection redirects you to a flat file. Based on those files I prepared a shapefile for analysis (read the txt files with pandas, convert them to a spatial dataframe with GeoPandas, and store the transformed data as a shapefile).
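The txt-to-shapefile conversion described above can be sketched roughly as follows. This is a minimal sketch, not the exact script used: the column names (POPULATION, LATITUDE, LONGITUDE and the identifier codes), the sample rows, the NAD83 CRS, and the helper name centroids_from_txt are all assumptions made for illustration.

```python
import io

import geopandas as gpd
import pandas as pd

# Two made-up rows in the ASSUMED layout of a per-state flat file.
sample_txt = io.StringIO(
    "STATEFP,COUNTYFP,TRACTCE,BLKGRPCE,POPULATION,LATITUDE,LONGITUDE\n"
    "09,001,010101,1,1200,41.05,-73.60\n"
    "09,001,010101,2,800,41.06,-73.61\n"
)


def centroids_from_txt(source):
    """Read a comma-separated centroid file and return a point GeoDataFrame.

    Hypothetical helper: `source` is a path or a file-like object.
    """
    # Keep identifier codes as strings so leading zeros survive.
    id_columns = {"STATEFP": str, "COUNTYFP": str, "TRACTCE": str, "BLKGRPCE": str}
    df = pd.read_csv(source, dtype=id_columns)
    # Build point geometries from the longitude (x) and latitude (y) columns.
    points = gpd.points_from_xy(df["LONGITUDE"], df["LATITUDE"])
    # EPSG:4269 (NAD83) is an assumption about the source coordinates.
    return gpd.GeoDataFrame(df, geometry=points, crs="EPSG:4269")


centroids = centroids_from_txt(sample_txt)
# centroids.to_file("population_centroids.shp")  # uncomment to store a shapefile
```

For the whole country, the same helper could be called once per downloaded state file and the results combined with pd.concat.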

Here's a screenshot of the web page block with the state selection list for population centroids:

[Image: censusblocks]

Mujingrui commented 3 years ago

Hi, @szymon-datalions

Thank you for your nice reply! I have downloaded the U.S. 2010 census block group level data, and I found that the northeastern data differs a little from your cancer_population_base, as shown in the attachments. I suspect you allocated the population-at-risk data to a grid of cells based on the 2010 census blocks. I tried to construct a similar dataset, but it did not work. Would you mind telling me your method or any software you used? Thank you for your time and reply!

Best,

[Attachments: Rplot, Rplot01]

SimonMolinsky commented 3 years ago

@Mujingrui thanks for the thought-provoking questions!

I've checked the whole data preparation process, and at some point I started using files from here. These are shapefiles with population estimates per block. With those I've:

A. Calculated population size per centroid of each polygon:


import os

import geopandas as gpd
import pandas as pd

base_path = 'directory_with_census_block_files'

dirs = os.listdir(base_path)

geodataframes_files = []

# Select census block files (shapefiles inside directories ending with 'pophu')
for di in dirs:
    if di.endswith('pophu'):
        new_dir = os.path.join(base_path, di)
        for f in os.listdir(new_dir):
            if f.endswith('.shp'):
                geodataframes_files.append(os.path.join(new_dir, f))

# Read each file, keeping only the two columns that matter
frames = []
for f in geodataframes_files:
    gdf = gpd.read_file(f)
    frames.append(gdf[['POP10', 'geometry']])

# Combine all areas (DataFrame.append was removed in pandas 2.0)
core = pd.concat(frames, ignore_index=True)

# Get centroids and use them as the geometry instead of the polygons
# (GeoPandas warns if the CRS is geographic, because such centroids are approximate)
generated_pop_blocks = gpd.GeoDataFrame(core[['POP10']], geometry=core.centroid)

generated_pop_blocks.to_file('population_centroids.shp', encoding='utf-8')

B. Created a hexbin map over the area of the north-eastern US in QGIS, then aggregated the population over each hex in GeoPandas:


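import geopandas as gpd

pts = gpd.read_file('repro_base_points.shp')
hexes = gpd.read_file('hexbin_raw.shp')

# Assign each point to the hex that contains it
# ('predicate' replaced the deprecated 'op' argument in GeoPandas 0.10)
ndf = gpd.sjoin(pts, hexes, how='left', predicate='within')

# Sum the population per hex
grouped_pts = ndf[['id', 'POP10']].groupby('id', as_index=False).sum()

# Attach the sums to the hex polygons
df = hexes.merge(grouped_pts, how='outer', on='id')

# Keep only the hexes that contain at least one point
final = df[~df['POP10'].isna()].copy()

final.to_file('hexes_POP10.shp')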

C. Converted the hex grid to centroids (also in QGIS). That is the step where the population centroids used for the analysis are created.
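Step C was done in QGIS, but for readers who prefer to stay in Python, a roughly equivalent operation in GeoPandas might look like the sketch below. It is an illustration only: the two square cells and their values are made up and stand in for the real hexes.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two made-up square cells standing in for the aggregated hex polygons.
hexes = gpd.GeoDataFrame(
    {"id": [1, 2], "POP10": [150, 40]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(2, 0), (4, 0), (4, 2), (2, 2)]),
    ],
)

# Swap each polygon for its centroid while keeping the attribute columns.
hex_centroids = hexes.set_geometry(hexes.geometry.centroid)
```

With real data, the centroids are best computed in a projected CRS, since GeoPandas warns that centroids in a geographic CRS are approximate.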

Mujingrui commented 3 years ago

Hi, @szymon-datalions

Thank you for your very nice reply! I have created the hexbin map with Python and QGIS following your suggestion. I found that if the NA population values are set to 0 instead of being dropped by final = df[~df['POP10'].isna()].copy(), the map is clearer, especially for Canada, since there are no people living in Northern Ontario.

Red bins mark areas where nobody lives. Thank you for your help!
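The variant described above, keeping empty hexes with a population of 0 instead of dropping them, could be sketched like this. The toy tables and their values are made up; plain pandas DataFrames are used here for brevity, and the same fillna call applies to a GeoDataFrame.

```python
import pandas as pd

# Toy stand-ins for the hex grid and the per-hex population sums (made-up values).
hexes = pd.DataFrame({"id": [1, 2, 3]})
grouped_pts = pd.DataFrame({"id": [1, 3], "POP10": [150, 40]})

# Hex 2 has no matching points, so its POP10 is NaN after the merge.
df = hexes.merge(grouped_pts, how="left", on="id")

# Keep every hex and treat "no points fell here" as zero population.
final = df.copy()
final["POP10"] = final["POP10"].fillna(0).astype(int)
```

Whether a zero is appropriate depends on the analysis: it treats a hex without any census points as uninhabited, which is an assumption outside the area covered by the census data.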

Best,

SimonMolinsky commented 3 years ago

Terrific :)

Thanks,

(PS. I'm closing this issue for now).

SimonMolinsky / pyinterpolate-paper

Population Grid Data #4