COVID-Weather / covid-environmental-factors

Analysis of COVID19 infection rates and their environmental conditions
GNU General Public License v3.0

Using spatial tiles for accurate province/region-mean environmental variables #2

Open hdrake opened 4 years ago

hdrake commented 4 years ago

The easiest option to implement with our existing code base would be regionmask, which I have some experience using to take averages over various continents and ocean basins.

hdrake commented 4 years ago

Here is a basic example for computing the state-wide average specific humidity: https://github.com/COVID-Weather/covid-environmental-factors/blob/dd3fe7befbe8a2502662bb78a5db4f35e8001d00/notebooks/T_preprocess_state-level_UStest.ipynb

(Note: doing this properly requires weighting by grid-cell area!)
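A minimal sketch of what area weighting could look like, assuming a regular latitude-longitude grid where cell area is proportional to the cosine of latitude. The function name `area_weighted_mean`, the toy grid, and the all-True region mask are hypothetical and are not taken from the linked notebook; in practice the boolean mask would come from regionmask.

```python
import numpy as np

def area_weighted_mean(field, lat, mask):
    """Mean of a (lat, lon) field over cells where mask is True, weighted by cos(lat)."""
    field = np.asarray(field, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Assumption: regular lat-lon grid, so cell area scales with cos(latitude).
    weights = np.cos(np.deg2rad(np.asarray(lat, dtype=float)))[:, np.newaxis]
    weights = np.broadcast_to(weights, field.shape)
    # Skip cells outside the region and cells with missing data.
    valid = mask & np.isfinite(field)
    if not valid.any():
        return np.nan
    return np.sum(field[valid] * weights[valid]) / np.sum(weights[valid])

# Hypothetical 3x2 grid of specific humidity (arbitrary units).
lat = np.array([0.0, 30.0, 60.0])
q = np.array([[10.0, 10.0],
              [20.0, 20.0],
              [40.0, 40.0]])
region = np.ones_like(q, dtype=bool)  # stand-in for a regionmask-derived mask

print("area-weighted:", area_weighted_mean(q, lat, region))
print("unweighted:   ", q.mean())
```

The unweighted mean overcounts the high-latitude rows, which cover less area per grid cell; the difference grows for regions that span a wide range of latitudes.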

hdrake commented 4 years ago

This module https://github.com/mathause/regionmask/blob/master/regionmask/defined_regions/natural_earth.py will probably be useful for defining our own regions based on shapefiles from http://naturalearthdata.com/ (conveniently, this is the same place regionmask gets the shapefiles for its readily available regions).

hdrake commented 4 years ago

This is implemented in https://github.com/COVID-Weather/covid-environmental-factors/pull/15 (thanks @mara-freilich!) at the Region/Country level (admin0) and in https://github.com/COVID-Weather/covid-environmental-factors/pull/17 at the State/Province level (admin1), thanks to some small changes to the regionmask Python package!

Ultimately we also want this at the county level (admin2), at least for the U.S. and possibly also for Italy and China. Unfortunately, this data is not available in the Natural Earth dataset, which regionmask is already set up to use. It will take a bit more work to process arbitrary shapefiles... Leaving this issue open until that is resolved.
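For illustration only, here is a rough sketch of the core step that handling arbitrary shapefiles would involve: turning a polygon outline into a boolean mask on the data grid. It assumes the polygon vertices have already been read from a shapefile (reading is not shown), and it uses a plain even-odd ray-casting test in NumPy rather than regionmask itself. The function name `polygon_mask` and the square "county" are hypothetical; holes, multi-part polygons, and dateline crossings are not handled.

```python
import numpy as np

def polygon_mask(lon, lat, polygon):
    """Boolean (lat, lon) mask of grid-cell centres inside a simple polygon.

    polygon is a sequence of (lon, lat) vertices; the even-odd rule is used.
    """
    lon2d, lat2d = np.meshgrid(np.asarray(lon, dtype=float),
                               np.asarray(lat, dtype=float))
    inside = np.zeros(lon2d.shape, dtype=bool)
    verts = np.asarray(polygon, dtype=float)
    x0, y0 = verts[-1]  # start with the edge that closes the polygon
    for x1, y1 in verts:
        if y0 != y1:  # horizontal edges never cross a horizontal ray
            # Edge straddles the horizontal line through the grid point.
            straddles = (y0 > lat2d) != (y1 > lat2d)
            # Longitude where the edge crosses that horizontal line.
            x_cross = x0 + (lat2d - y0) * (x1 - x0) / (y1 - y0)
            # Toggle for each crossing to the right of the grid point.
            inside ^= straddles & (lon2d < x_cross)
        x0, y0 = x1, y1
    return inside

# Hypothetical square "county" and a coarse 3x3 grid of cell centres.
county = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
grid_lon = np.array([-5.0, 5.0, 15.0])
grid_lat = np.array([-5.0, 5.0, 15.0])
county_mask = polygon_mask(grid_lon, grid_lat, county)
print(county_mask)
```

A mask like this selects only cells whose centres fall inside the outline, so counties smaller than a grid cell may come out empty; fractional-overlap weighting would be one way around that.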

hdrake commented 4 years ago

@jproctor91 mentioned in https://github.com/COVID-Weather/covid-environmental-factors/issues/8#issuecomment-604797217 that the GPL dataset has China & Italy shapefiles and epidemiological data.