Closed bucketteOfIvy closed 1 year ago
I just committed and pushed the updated datasets and generating files to my fork of this repository, so now seems like a good time to give an update on the direction I've taken so far.
The last update for this specific issue is that the 1980 county-level data, which is not included in this commit, does not have a clear pathway to interpolation. Population-weighted interpolation from 1980 geographies to 2010 geographies is made challenging by the lack of nationwide census tract coverage. Areal interpolation is likely inappropriate because some new counties may be cities that split off from their original county, and such cities would be expected to house a share of the population disproportionate to their land area. I plan to poke around for other options.
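To make the areal interpolation concern concrete, here is a minimal sketch with entirely made-up numbers: a hypothetical 1980 county that later split into a remainder county and an independent city. The zone names, shares, and population total are assumptions for illustration only, not values from the data.

```python
# Hypothetical 1980 county that later split into a remainder county
# and an independent city. All numbers are invented for illustration.
county_pop_1980 = 100_000

# Assumed share of the 1980 county's land area in each 2010 geography.
area_share = {"remainder_county": 0.95, "independent_city": 0.05}
# Assumed share of the 1980 county's population in each 2010 geography.
pop_share = {"remainder_county": 0.60, "independent_city": 0.40}


def interpolate(total, shares):
    """Allocate a source-zone count to target zones using the given shares."""
    return {zone: total * share for zone, share in shares.items()}


# Areal interpolation allocates by land area alone.
areal = interpolate(county_pop_1980, area_share)
# Population-weighted interpolation allocates by where people live.
pop_weighted = interpolate(county_pop_1980, pop_share)

for zone in area_share:
    print(f"{zone}: areal={areal[zone]:.0f}, pop_weighted={pop_weighted[zone]:.0f}")
```

Under these assumed shares, areal interpolation would assign the city only 5% of the county's 1980 population even though 40% of residents live there, which is the kind of error I'd expect for split-off cities.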
The variables present in the DS01 historic data do not match the variables listed in the DS01 data tables documentation. This seems to be because Social Explorer's "Historic Census Data on 2010 Census Tracts" datasets do not include the counts that the documentation calls for, likely because the historic censuses did not aggregate their data directly into those categories. Because our historic DS01 data files are based on Social Explorer's data, we are missing those categories as well.
However, there does seem to be a workaround for some of the data. The historical censuses seem to have released disaggregated tract-level race, ethnicity, age, and educational attainment data from which most of the missing data can be reconstructed. I'm currently planning to download this data from IPUMS NHGIS and then crosswalk it to 2010 census tracts using weights from the Longitudinal Tract Database, but I have a few open questions about data comparability that I wanted to track here and that will need to be answered before these changes are merged. Namely:
The noHSP variable will exclude GEDs for the 1980 population but not for 1990 onward. Are these sufficiently different that the 1980 Census education variable should be renamed or treated differently, or is it sufficient to just note this discrepancy in the documentation?
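For reference, the crosswalk step described above could look roughly like the sketch below. It assumes a crosswalk expressed as (source tract, 2010 tract, weight) rows, where each weight is the fraction of the source tract assigned to the 2010 tract. The tract IDs, variable names, and counts are hypothetical placeholders, not the actual Longitudinal Tract Database or NHGIS column names.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (source tract, 2010 tract, weight).
# Weights for each source tract are assumed to sum to 1.
crosswalk = [
    ("A", "X", 1.0),
    ("B", "X", 0.25),
    ("B", "Y", 0.75),
]

# Hypothetical source-year counts keyed by source tract.
counts = {
    "A": {"total_pop": 1000, "no_hs": 200},
    "B": {"total_pop": 400, "no_hs": 100},
}


def crosswalk_counts(counts, crosswalk):
    """Reallocate source-tract counts to 2010 tracts using crosswalk weights."""
    out = defaultdict(lambda: defaultdict(float))
    for source, target, weight in crosswalk:
        for variable, value in counts.get(source, {}).items():
            out[target][variable] += value * weight
    return {target: dict(values) for target, values in out.items()}


tracts_2010 = crosswalk_counts(counts, crosswalk)

# Derive rates only after the counts have been reallocated.
for tract, values in sorted(tracts_2010.items()):
    share = values["no_hs"] / values["total_pop"]
    print(tract, values, f"no_hs share = {share:.3f}")
```

The sketch weights raw counts and computes shares afterwards, since applying the weights directly to percentages would not give a properly weighted result.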