ClimateInequality / PrjCEC

China Environment Children

Data Search: China Census County-demographic Data and Corresponding Shape Files #4

Status: Open. FanWangEcon opened this issue 1 year ago.

FanWangEcon commented 1 year ago

@szkaifeng

Obtain China county-level population data showing the joint distribution of gender and age, for multiple years if possible.

Note:

Potential data sources:

FanWangEcon commented 1 year ago

By May 22nd

@szkaifeng

Find

Visualize

Generate several figures (maps) from the 2020 demographic data:

Given the 2020 vs 2010 demographic data, matching counties where possible, hopefully
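Matching counties across the 2010 and 2020 censuses could start from a simple code-based join. The following is a minimal Python sketch, assuming both years' data are keyed by a county administrative code (an assumption; codes change with boundary reforms, so the unmatched lists would still need manual review). The codes and names are placeholders.

```python
# Hypothetical sketch: match counties between two census years by county code.
# Assumes each year's data is a dict {county_code: county_name}; codes that
# changed because of boundary or administrative reforms show up as unmatched.

def match_counties(year_a, year_b):
    """Return (matched, only_in_a, only_in_b) as sorted lists of county codes."""
    codes_a, codes_b = set(year_a), set(year_b)
    matched = sorted(codes_a & codes_b)
    only_in_a = sorted(codes_a - codes_b)
    only_in_b = sorted(codes_b - codes_a)
    return matched, only_in_a, only_in_b


if __name__ == "__main__":
    # Placeholder records for illustration only, not real census extracts.
    counties_2010 = {"100001": "County A", "100002": "County B", "100003": "County C"}
    counties_2020 = {"100001": "County A", "100003": "County C", "100004": "County D"}
    matched, only_2010, only_2020 = match_counties(counties_2010, counties_2020)
    print("matched:", matched)      # ['100001', '100003']
    print("2010 only:", only_2010)  # ['100002']
    print("2020 only:", only_2020)  # ['100004']
```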

szkaifeng commented 1 year ago

Published book [中国2000年人口普查分县资料 (China 2000 Population Census County-Level Data)]: https://bbs.pinggu.org/thread-433655-1-1.html; https://data.casearth.cn/sdo/detail/5c19a5670600cf2a3c557af9; https://github.com/leiii/census/tree/main/data/census (2010, 2020)

FanWangEcon commented 1 year ago

By June 12th

We have identified the 2020 and 2010 census data, with county-specific breakdowns of gender $\times$ age shares.
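One way to make "gender $\times$ age shares" precise (notation introduced here for illustration, not taken from the census tables): let $N_{c,g,a}$ be the population of county $c$ with gender $g$ in age group $a$. The share of that cell within county $c$ is then

$$s_{c,g,a} = \frac{N_{c,g,a}}{\sum_{g'} \sum_{a'} N_{c,g',a'}},$$

so that $\sum_{g} \sum_{a} s_{c,g,a} = 1$ within each county.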

Explore 1990 and 2000 Census Data

@marcomlaghi

Explore Earlier Era Census Data

@marcomlaghi

Shapefiles

@marcomlaghi and @szkaifeng

The shapefile differs for each census. We do not need harmonized shapefiles, but we do need shapefiles showing county boundaries in each census year.

The information we need from these shapefiles is essentially which $0.25\,\text{km} \times 0.25\,\text{km}$ square (or whatever the smallest unit of the geographic climate data is) corresponds to which county in which year, so that we can link the climate data with the population data.
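A minimal Python sketch of this grid-to-county lookup, assuming county boundaries are available as simple polygons (lists of vertices) in the same coordinate units as the climate grid. The county codes, polygons, and grid extent are hypothetical; real shapefiles would be read with a GIS library and may contain multipolygons and holes, which this sketch ignores. Each cell is assigned to the county containing its centre.

```python
# Minimal sketch: assign each cell of a regular climate grid to the county whose
# polygon contains the cell centre (ray-casting point-in-polygon test).
# Hypothetical assumptions: counties are simple polygons (no holes, no
# multipolygons) in the same units as the grid (e.g. km in a projected CRS).

def point_in_polygon(x, y, polygon):
    """Return True if (x, y) lies inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a rightward horizontal ray from (x, y) with this edge.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_cells(counties, x_min, y_min, n_cols, n_rows, cell_size=0.25):
    """Map each (row, col) cell to the code of the county containing its centre.

    counties: dict {county_code: polygon}. Cells outside every county are omitted.
    """
    mapping = {}
    for row in range(n_rows):
        for col in range(n_cols):
            cx = x_min + (col + 0.5) * cell_size
            cy = y_min + (row + 0.5) * cell_size
            for code, polygon in counties.items():
                if point_in_polygon(cx, cy, polygon):
                    mapping[(row, col)] = code
                    break
    return mapping


if __name__ == "__main__":
    # Two hypothetical 0.5 x 0.5 square counties side by side.
    counties = {
        "A": [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (0.0, 0.5)],
        "B": [(0.5, 0.0), (1.0, 0.0), (1.0, 0.5), (0.5, 0.5)],
    }
    cells = assign_cells(counties, x_min=0.0, y_min=0.0, n_cols=4, n_rows=2)
    for cell, code in sorted(cells.items()):
        print(cell, code)
```

Assigning by cell centre is the simplest rule. Cells that straddle a county border could instead be split by area overlap, which would require polygon intersection rather than a point test.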

szkaifeng commented 1 year ago

Note: IPUMS IHGIS does have 1982, 1990, and 2000 county-level shapefiles, with population by age and sex.

marcomlaghi commented 1 year ago

1990 county data from SEDAC: https://sedac.ciesin.columbia.edu/data/set/cddc-china-population-census-and-agriculture

szkaifeng commented 1 year ago

A decent public harmonized shapefile for censuses 1 to 6: https://www.scidb.cn/en/detail?dataSetId=849628989872930816. Follow-up work could manually add the age-group-by-gender counts for each county to this shapefile.

Haoran Wu, Liang Gao, Dongdong Song, et al. A dataset of district/county-level population distribution of China’s six national censuses [DS/OL]. V2. Science Data Bank, 2022 [2023-06-22]. CSTR:31253.11.sciencedb.j00001.00273 (https://cstr.cn/31253.11.sciencedb.j00001.00273); DOI:10.11922/sciencedb.j00001.00273 (https://doi.org/10.11922/sciencedb.j00001.00273).

marcomlaghi commented 1 year ago

Here is a link to the label dictionary as well: https://citas.csde.washington.edu/data/chinaA/datasets.htm

marcomlaghi commented 1 year ago

2000 Census Data

County shapefile: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VKGEBX&version=1.0

Age-gender census counts: https://dataverse.harvard.edu/dataverse/chinacensus

The data originate from China Data Online (CDO), but these Dataverse versions are preferred because they can be downloaded province by province, whereas downloading directly from CDO appears to require going prefecture by prefecture.

Next step: after checking that IPUMS does not already provide this data, I will try to combine the datasets into the following layout:

Location ID (county code); gender; grouped age groups; single-year age (~700,000 rows)
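A minimal Python sketch of this long layout, assuming the census counts arrive as one age vector per county and gender, and that age groups are 5-year bins with an open top bin. Both the input shape and the binning are assumptions; the census's own groupings may differ. The county code and counts are placeholders.

```python
# Hypothetical sketch of the planned long layout: one row per
# county x gender x single-year age, plus a grouped-age label.
# Assumptions: input is {(county_code, gender): [count at age 0, age 1, ...]}
# and groups are 5-year bins with an open top bin.

def age_group(age, width=5, top=100):
    """Label a single-year age with its grouped-age bin, e.g. 7 -> '5-9'."""
    if age >= top:
        return f"{top}+"
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


def to_long(counts):
    """Flatten per-county, per-gender age vectors into
    (county_code, gender, age_group, age, count) rows."""
    rows = []
    for (county_code, gender), by_age in sorted(counts.items()):
        for age, n in enumerate(by_age):
            rows.append((county_code, gender, age_group(age), age, n))
    return rows


if __name__ == "__main__":
    # Placeholder counts for one hypothetical county, ages 0-2 only.
    counts = {
        ("100001", "male"): [10, 12, 9],
        ("100001", "female"): [11, 13, 8],
    }
    for row in to_long(counts):
        print(row)
```

With one row per county, gender, and single-year age, the row count scales as (number of counties) × 2 × (number of single-year ages).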

marcomlaghi commented 1 year ago

Documented the 2000 Census data processing as follows:

I used the metadata to create text files, then inserted commas to turn them into CSV files listing each province's counties, with county code, English name, and Chinese name. After cleaning, I merged these 31 new files with the census data for each of the 31 provinces (from the Dataverse / China Data Online 2000 data), using the counties' English names as the key. This let me check that names were neither repeated nor misattributed; the data also often append an extra name to common county names to distinguish them further. Finally, I combined the 31 merged files into a single file.
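The merge described above can be sketched in Python as follows. This is an illustration under assumed inputs, not the script actually used: the field names (`county_code`, `name_en`, `name_zh`) and the placeholder records are hypothetical. Merging within each province first is plausibly needed because the same English county name can occur in more than one province; the duplicate-name and missing-match checks mirror the verification step described above.

```python
# Sketch of the province-by-province merge described above. Assumed inputs
# (hypothetical field names): for each province, a crosswalk of rows with
# "county_code", "name_en", "name_zh", and census rows that carry "name_en".

def merge_province(crosswalk, census):
    """Attach county code and Chinese name to each census row by English name.

    Raises ValueError on a repeated English name in the crosswalk or on a census
    row with no crosswalk match, so ambiguous or misattributed names surface
    instead of being merged silently.
    """
    by_name = {}
    for row in crosswalk:
        name = row["name_en"].strip()
        if name in by_name:
            raise ValueError(f"duplicate English name in crosswalk: {name}")
        by_name[name] = row
    merged = []
    for row in census:
        name = row["name_en"].strip()
        if name not in by_name:
            raise ValueError(f"no crosswalk entry for: {name}")
        match = by_name[name]
        merged.append({**row, "county_code": match["county_code"], "name_zh": match["name_zh"]})
    return merged


def merge_all(provinces):
    """Merge each (crosswalk, census) pair separately, then stack into one list."""
    combined = []
    for crosswalk, census in provinces:
        combined.extend(merge_province(crosswalk, census))
    return combined


if __name__ == "__main__":
    # One hypothetical province with placeholder names and codes.
    crosswalk = [
        {"county_code": "100001", "name_en": "County A", "name_zh": "甲县"},
        {"county_code": "100002", "name_en": "County B", "name_zh": "乙县"},
    ]
    census = [{"name_en": "County A", "pop": 1000}, {"name_en": "County B ", "pop": 2000}]
    for row in merge_all([(crosswalk, census)]):
        print(row)
```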

Files saved on Dropbox: ./marco_laghi/CensusUpload2000