ResearchSoftwareInstitute / greendatatranslator

Green Team Data Translator Software Engineering and Development
BSD 3-Clause "New" or "Revised" License

US Census/ACS SocioEnvironmental Exposures Data: Plans for Data Download and Processing #130

Open karafecho opened 6 years ago

karafecho commented 6 years ago

Steve Appold to download and process US Census/ACS data on select variables: total population size; number of people >=25 years; median household income; proportion of households headed by non-Hispanic whites; proportion of people >=25 years with a HS diploma or less as their highest level of schooling; proportion of people without a car; proportion of people without health insurance; and proportion of people speaking a language other than English at home. Steve will then work with Kara and Hao to integrate the data with the clinical data.

karafecho commented 6 years ago

Notes on dataset:

  1. Values of "0" and "1" represent true estimates.
  2. Missing data reflect block-group sizes <50 persons and/or US Census Bureau regulations regarding public release of household income data.
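The distinction in these notes matters when the file is read programmatically: a literal 0 or 1 is a real estimate, while a blank cell is a suppressed value. A minimal sketch of a parser that keeps the two apart is below. It assumes missing values appear as empty cells, which the notes do not confirm, and `parse_estimate` is a hypothetical helper name, not part of any existing Green Team code.

```python
from typing import Optional


def parse_estimate(raw: str) -> Optional[float]:
    """Convert one raw cell to a float, or None if the cell is blank.

    Assumption: suppressed block groups (<50 persons, or income data withheld
    by the Census Bureau) show up as empty cells in the delivered file.
    """
    text = raw.strip()
    if text == "":
        # Suppressed or unavailable: keep it as missing, never coerce to 0.
        return None
    # "0" and "1" are true estimates and pass through unchanged.
    return float(text)


# Hypothetical row of proportions for one block group.
row = ["0", "1", "", "0.37"]
values = [parse_estimate(cell) for cell in row]
```

Returning `None` rather than 0 keeps suppressed block groups out of any downstream averages or bins.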

@lstillwe : Looks like I added notes on the ACS data to GitHub, not the spreadsheet.

karafecho commented 6 years ago

Plans for the ACS data: (1) work toward nationwide data on the same initial set of variables that were generated for the NC data; (2) explore additional variables that may be of relevance to the Translator project; (3) incorporate standard error calculations for the stand-alone Translator Socioenvironmental Exposures Service, but not the Translator DDCR Service, as the clinical binning process negates the importance of standard errors; and (4) create an API for the Translator Socioenvironmental Exposures Service. ACS data can be found in the GreenTeam_CedarGroveKFBS Google folder.
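For item (3), one possible approach to the standard error calculation is sketched below. It relies on two pieces of general Census Bureau guidance for ACS data: published margins of error are at the 90 percent confidence level (so SE = MOE / 1.645), and the standard error of a derived proportion p = X / Y is approximated by sqrt(SE(X)^2 - p^2 * SE(Y)^2) / Y, with the sign switched to plus when the term under the root is negative. The function names are hypothetical, and this is not necessarily the method used for the Green Team data.

```python
import math

Z_90 = 1.645  # z-score for the 90 percent confidence level used in ACS MOEs


def moe_to_se(moe: float) -> float:
    """Convert a published ACS margin of error to a standard error."""
    return moe / Z_90


def proportion_se(num: float, den: float, se_num: float, se_den: float) -> float:
    """Approximate the standard error of the proportion num / den.

    num must be a subset of den (e.g. people without a car / total population).
    """
    if den <= 0:
        raise ValueError("denominator estimate must be positive")
    p = num / den
    radicand = se_num**2 - (p**2) * (se_den**2)
    if radicand < 0:
        # Fall back to the ratio form when the proportion form is undefined.
        radicand = se_num**2 + (p**2) * (se_den**2)
    return math.sqrt(radicand) / den


# Hypothetical block group: 50 of 200 people, with standard errors 10 and 20.
example_se = proportion_se(50, 200, 10, 20)
```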

We will not expand the Socioenvironmental Exposures Service to include property tax data, samples of which can be found in the [CedarGroveKFBS Google folder](https://drive.google.com/drive/folders/1mSituPpX8O907ZDDEVeGruhla79Zq2nQ?usp=sharingl). The data are not appropriate for the Translator project for several reasons: (1) obtaining the data on a county-by-county basis seems like too much work, at least for the feasibility phase of the Translator project; (2) the data are inconsistent across counties; (3) there are many missing data points; and (4) usability of the vast majority of available data fields is questionable.

xu-hao commented 6 years ago

Dependencies:

See @karafecho's post

karafecho commented 5 years ago

7/30/2018 update:

Focus moving forward will be on ACS data and public school data. Specifically, we're prioritizing nationwide data on the select variables that we pulled for North Carolina. The standard errors for the nationwide data are not a high priority (at least prior to the September hackathon), but we are interested in obtaining those data by the end of the feasibility phase of the project (December). We are also using background literature on socioeconomic factors and asthma to guide the selection of additional ACS variables. Lisa, in consultation with Kara, will be developing an ACS API. Ann and Kara will develop an interesting use case on public school exposures and asthma. The use case will be developed here.

karafecho commented 5 years ago

8/20/18 update:

Steve Appold confirmed that he will be provisioning the nationwide ACS data (sans standard errors) by the end of the week.

karafecho commented 5 years ago

Steve Appold provisioned nationwide ACS data with standard errors on 10/20/18. Lisa is adding the new data to the API.