covidatlas / coronadatascraper

COVID-19 Coronavirus data scraped from government and curated data sources.
https://coronadatascraper.com
BSD 2-Clause "Simplified" License

Move to HASC and FIPS codes, use GADM? #286

Closed. hyperknot closed this issue 4 years ago

hyperknot commented 4 years ago

So HASC codes seem to be the best way to refer to international regions. They look like ES.CL for the Castile and León (CL) region within Spain (ES). For the US we have FIPS codes.
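To make the format concrete, here is a minimal sketch of splitting a HASC code into its country and subdivision parts. The function name `split_hasc` and the validation pattern (dot-separated groups of two uppercase letters) are assumptions for illustration, not something defined by this project.

```python
import re

# Assumed shape: a two-letter country code followed by zero or more
# dot-separated two-letter subdivision parts, e.g. "ES" or "ES.CL".
HASC_PATTERN = re.compile(r"[A-Z]{2}(\.[A-Z]{2})*")


def split_hasc(code):
    """Split a HASC code into (country, [subdivision parts])."""
    if not HASC_PATTERN.fullmatch(code):
        raise ValueError(f"not a HASC-shaped code: {code!r}")
    country, *subdivisions = code.split(".")
    return country, subdivisions


print(split_hasc("ES.CL"))  # ('ES', ['CL'])
```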

I've looked more and more into this one and I see that GADM really is the highest quality geospatial dataset out there, and that HASC codes are clearly the best for our use case.

There are some critical questions:

  1. Can we use GADM? It has a restrictive license, but we can ask for permission to use it. Important point: we have to make sure the permission covers all of our users, including commercial entities such as newspapers. @HerbCaudill can you ask the owner?

  2. I can process the shapefiles into small GeoJSONs; that's not a problem.

  3. Getting the population data via HASC codes will need scraping though. This project happens to have amazing experts for scraping; can someone write it? We'd need a JSON of HASC > population for all pages from this website: http://www.statoids.com/

For example: http://www.statoids.com/ufr.html would have

{
  "FR.AR": 7773595,
  "FR.BF": 2902010,
  ...
}

It can be a one-off script, returning a JSON, that's it really. Only for non-US regions.
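A rough sketch of what the output step of such a one-off script could look like. It assumes the rows have already been obtained (with permission) as tab-separated `HASC<TAB>population` lines, possibly with thousands separators; that input layout and the function name `build_population_map` are hypothetical and say nothing about how the Statoids pages are actually structured.

```python
import json


def build_population_map(lines):
    """Turn 'HASC<TAB>population' lines into a {HASC: population} dict."""
    populations = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        hasc, raw_population = line.split("\t")
        # Drop thousands separators such as "7,773,595" before converting.
        populations[hasc] = int(raw_population.replace(",", ""))
    return populations


# Hypothetical input rows, using the two values from the example above.
sample = ["FR.AR\t7,773,595", "FR.BF\t2,902,010"]
print(json.dumps(build_population_map(sample), indent=2))
```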

  4. For the US, we can use the official Census datasets. I can process those into small GeoJSONs as well. For the US we can settle on FIPS codes.

@lazd @camjc @qgolsteyn @ryanblock @HerbCaudill

lazd commented 4 years ago

@hyperknot this is way out of my wheelhouse, I will defer to whatever you determine is the best way to do it. Thanks for doing the hard work!

HerbCaudill commented 4 years ago

@hyperknot We have permission for GADM; we'll just need to add the conditions below to our license.

On Mon, Mar 23, 2020 at 9:58 AM Herb Caudill herb@devresults.com wrote:

Hi, Robert. I'm volunteering with an open-source effort to scrape data on Coronavirus cases from publicly available sources and make it available to researchers, journalists, etc.

https://coronadatascraper.com   https://github.com/lazd/coronadatascraper/  

One of the outputs of this effort is a GeoJSON file. So far we've cobbled the subnational boundaries together from various sources. We were hoping we might be able to use GADM data for this purpose, but weren't sure if it would be permitted under GADM's non-commercial license: this effort is of course non-commercial, but we're hoping that the outputs will be reused by many other organizations, possibly including media and other for-profit entities.

Let me know what you think - 

Thanks Herb

On Mon, Mar 23, 2020 at 8:13 PM Robert J. Hijmans r.hijmans@gmail.com wrote:

Hi Herb, I do not object if you add license information stating that the admin boundaries are from GADM and cannot be used for purposes other than mapping corona / covid-19. Does that work for you? Robert

HerbCaudill commented 4 years ago

I've written to Statoids to see if they'll donate the data to the project - it would save us a lot of effort, and scraping their website would probably be a violation of their terms of use anyway.

hyperknot commented 4 years ago

I'm working on an OSM + Wikidata based solution, hopefully I can publish it very soon and then we can migrate over.

hyperknot commented 4 years ago

I will finish the new system today, based on ISO codes, using OpenStreetMap. We'll have two internal codes: iso1:ES and iso2:ES-CL.

About FIPS, can someone explain it to me a bit more? What would be the best internal system? Would fips:06085 be good?

Is there an existing system for cities?
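For illustration, a minimal sketch of how prefixed IDs like these could be parsed. The function name `parse_region_id` and the returned dictionary layout are assumptions, not the project's actual code. It relies on two facts: ISO 3166-2 codes join the country and subdivision with a hyphen, and a five-digit county FIPS code is a two-digit state code followed by a three-digit county code.

```python
def parse_region_id(region_id):
    """Parse 'iso1:ES', 'iso2:ES-CL' or 'fips:06085' into its parts."""
    scheme, _, value = region_id.partition(":")
    if scheme == "iso1":
        return {"scheme": scheme, "country": value}
    if scheme == "iso2":
        country, _, subdivision = value.partition("-")
        return {"scheme": scheme, "country": country, "subdivision": subdivision}
    if scheme == "fips":
        if len(value) != 5 or not value.isdigit():
            raise ValueError(f"expected a 5-digit county FIPS code: {value!r}")
        # Keep the parts as strings so leading zeros are preserved.
        return {"scheme": scheme, "state": value[:2], "county": value[2:]}
    raise ValueError(f"unknown scheme in {region_id!r}")


print(parse_region_id("fips:06085"))
# {'scheme': 'fips', 'state': '06', 'county': '085'}
```

Storing FIPS values as strings rather than integers matters here: as a number, 06085 would lose its leading zero and no longer split cleanly into state and county.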

hyperknot commented 4 years ago

I managed to get ISO-code-based extracts from OpenStreetMap, and I've just finished rewriting country levels: https://github.com/hyperknot/country-levels

Tomorrow I'll update my PR with this system.

DavidGeeraerts commented 4 years ago

@hyperknot FIPS is heavily used in GIS in the US. You can get the codes in spreadsheet format here. US county codes can easily be extracted from here. Adding FIPS codes would help move this dataset toward a science-grade dataset. Thanks for your efforts.

hyperknot commented 4 years ago

@DavidGeeraerts thanks. Do you know of any similar system at the city level? Actually, I think if we can move international areas to ISO1 and ISO2 and US counties to FIPS, that'd already be a huge improvement.

hyperknot commented 4 years ago

I see that file actually has perfect code systems for cities and other subdivision levels.

DavidGeeraerts commented 4 years ago

Yeah, the "all-geocodes-v2016" has the city codes.

hyperknot commented 4 years ago

Thanks, I found it. Right now I'm concentrating on international ISO1 and ISO2 codes, as well as US counties. But US cities can be added in the future as well.

hyperknot commented 4 years ago

Moving to iso1, iso2, fips codes in #527