BasisResearch / cities

Home of Basis development for the 2023 TOP Sprint
MIT License

Data: Fill in the few missing counties' data (manually or scraping) #66

Open riadas opened 10 months ago

riadas commented 10 months ago

This applies to the existing data variables on the frontend (Population, GDP, Industry Composition, Urbanicity, Ethnic Composition). Maybe the three spending variables too, but that seems harder and less necessary.

emackev commented 10 months ago

See https://github.com/BasisResearch/cities/pull/59. Each variable should have a dataset-specific cleaning script.

emackev commented 10 months ago

@Niklewa, we learned from the user advocates that the reason lots of counties aren't showing up in the state of Virginia is that there are a lot of independent cities that are treated as counties. Do you remember which dataset(s) excluded the Virginia counties? I think this could be a very simple case of how they are labeled in the dataset (the standard is to treat them as counties, even though technically they are cities).
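A minimal sketch of how the labeling could be checked from the FIPS code alone. It assumes the Census numbering convention in which Virginia's independent cities carry county codes of 510 and above (for example 51510 for Alexandria), while its counties use codes below 200. The function name is hypothetical and not part of the repo.

```python
def is_virginia_independent_city(fips) -> bool:
    """Return True if a 5-digit FIPS code looks like a Virginia independent city.

    Assumption: Virginia (state FIPS 51) independent cities have a
    3-digit county part of 510 or higher.
    """
    code = str(fips).zfill(5)  # restore leading zeros lost by integer parsing
    return code.startswith("51") and int(code[2:]) >= 510


for example in ("51001", "51510", 51760, "06037"):
    print(example, is_virginia_independent_city(example))
```

If a dataset drops every row for which this returns True, that would point to independent cities being filtered out rather than to scattered missing values.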

Niklewa commented 10 months ago

@emackev, I have investigated this issue. The raw datasets for GDP and population each contain 81 FIPS codes for Virginia. However, the one used for ethnic composition contains 188 unique FIPS codes. For context:

"The Commonwealth of Virginia is divided into 95 counties, along with 38 independent cities that are considered county-equivalents for census purposes."
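The comparison above can be sketched roughly as follows. This is only an illustration on toy data: the column name `GeoFIPS`, the helper names, and the sample rows are assumptions, not the repo's actual cleaning code or data.

```python
import pandas as pd

VIRGINIA_STATE_FIPS = "51"
# 95 counties + 38 independent cities, per the quote above
EXPECTED_VA_COUNTY_EQUIVALENTS = 95 + 38


def virginia_fips(df: pd.DataFrame, fips_col: str = "GeoFIPS") -> set:
    """Return the unique 5-digit Virginia FIPS codes found in one dataset.

    Assumes the column holds integers or strings (not floats).
    """
    codes = df[fips_col].astype(str).str.zfill(5)
    return set(codes[codes.str.startswith(VIRGINIA_STATE_FIPS)])


def missing_between(reference: set, other: set) -> list:
    """List codes present in `reference` but absent from `other`."""
    return sorted(reference - other)


# Toy stand-ins for two raw datasets (hypothetical rows).
gdp = pd.DataFrame({"GeoFIPS": [51001, 51003, 1001]})
ethnic = pd.DataFrame({"GeoFIPS": [51001, 51003, 51510, 51760, 1001]})

missing = missing_between(virginia_fips(ethnic), virginia_fips(gdp))
print(missing)  # ['51510', '51760']
print(len(virginia_fips(gdp)), "of", EXPECTED_VA_COUNTY_EQUIVALENTS, "expected")
```

Running the same comparison on the real raw files would show exactly which county-equivalents each dataset lacks, which is the list that needs filling in manually or by scraping.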

emackev commented 10 months ago

Thanks for checking this!!