First attempt at norm=permillion

davidbau / covid-19-chart

Chart of current COVID-19 time series data. Enables a variety of county-, state-, and nation-level comparisons and data exploration.

https://covid19chart.org/


First attempt at norm=permillion #33

Closed. dmaymudes closed this pull request 4 years ago.

dmaymudes commented 4 years ago

Please don't accidentally accept this pull request. It doesn't actually work yet, because the region label in the series passed to apply_norm isn't in the same format as the population labels I have.

Let me know what you suggest. I could change the population.js file to use labels like "Switzerland" and "MA" rather than "SWI" and "MA, US", or I could find a way to plumb the more canonical labels through.
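The first option could be done once, when the data loads, by rewriting the keys of the population table through a small alias map. Below is a minimal sketch. It assumes the population data is a plain object keyed by label. `ALIASES`, `canonicalLabel`, and `relabelPopulation` are hypothetical names. Only the "SWI" → "Switzerland" and "MA, US" → "MA" cases come from the comment above.

```javascript
// Hypothetical alias table: population.js-style label -> chart label.
// Only the "SWI" entry comes from the discussion; other entries would be added as found.
const ALIASES = {
  SWI: "Switzerland",
};

// Map one population.js label to the label the chart series uses.
// Assumption: US state labels look like "MA, US" and the chart wants "MA".
function canonicalLabel(label) {
  if (Object.prototype.hasOwnProperty.call(ALIASES, label)) {
    return ALIASES[label];
  }
  const usState = /^([A-Z]{2}), US$/.exec(label);
  if (usState) {
    return usState[1]; // "MA, US" -> "MA"
  }
  return label; // anything else passes through unchanged
}

// Build a new population table keyed by canonical labels.
function relabelPopulation(population) {
  const out = {};
  for (const [label, count] of Object.entries(population)) {
    out[canonicalLabel(label)] = count;
  }
  return out;
}
```

For example, `relabelPopulation({ SWI: 1, "MA, US": 2 })` (placeholder numbers) yields an object keyed by "Switzerland" and "MA". County labels such as "Barnstable County, MA, US" don't match the state pattern, so they pass through unchanged.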

I also see that timeseries-byLocation.json has "Barnstable County, MA, US", whereas the data source the current version of the chart uses has "Barnstable, MA", so further cleanup may be required.
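A county label like "Barnstable County, MA, US" could be turned into the "Barnstable, MA" form with one regular expression. The sketch below is hypothetical: `normalizeCountyLabel` is an illustrative name. It assumes county labels follow exactly the "<Name> County, <ST>, US" shape seen above.

```javascript
// Hypothetical helper: "Barnstable County, MA, US" -> "Barnstable, MA".
// Assumption: county labels follow "<Name> County, <ST>, US". Labels that
// don't fit that pattern are returned unchanged.
function normalizeCountyLabel(label) {
  const match = /^(.+?) County, ([A-Z]{2}), US$/.exec(label);
  return match ? `${match[1]}, ${match[2]}` : label;
}
```

Labels that don't fit the pattern are left alone rather than guessed at. Any county-equivalents named differently in the data would need their own rules.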

I guess nothing terrible would happen if you did accept it, because it doesn't change anything unless norm=permillion is set.
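The per-million arithmetic itself is small: divide each count by the population and scale by one million. Below is a sketch, not the project's actual apply_norm, whose signature may differ. `normalizePerMillion` is a hypothetical name, and returning `null` for a missing population is an assumed convention.

```javascript
// Hypothetical helper: convert raw counts into counts per million residents.
// Returns null when no usable population is available, so callers can tell
// "can't normalize" apart from a real result.
function normalizePerMillion(counts, population) {
  if (!Number.isFinite(population) || population <= 0) {
    return null;
  }
  // Multiply before dividing so whole-number results come out exact.
  return counts.map((count) => (count * 1e6) / population);
}
```

For example, 5 cases in a place of 500,000 people is 10 per million.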

davidbau commented 4 years ago

Very cool! So the main issue is that we need to normalize location names between JHU CSSE and CDS data sources. This is something we should do for other reasons, too, because we can get much higher-quality data if we merge the historical CSSE and CDS time series data.

Captured the things that need to be done in issue #34.

dmaymudes commented 4 years ago

I normalized the names in the population file so that, at least for countries, US states, and US counties, they match what the graph data uses. Setting norm=permillion appears to work for the cases I tested.

The timeseries-byLocation.json file now has more entries than it used to: 9187 entries have populations and 824 don't.
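Counts like these can be reproduced with a simple coverage check. Listing the missing labels, rather than only counting them, shows which names still need cleanup. This is a hypothetical sketch. `populationCoverage` is an illustrative name. The inputs are assumed to be an array of location names (for example, the locations in timeseries-byLocation.json) and a population object keyed by the same names.

```javascript
// Hypothetical helper: how many location labels have a population entry?
// Assumption: labels is an array of names and population is a plain object
// keyed by name.
function populationCoverage(labels, population) {
  let withPop = 0;
  const missing = [];
  for (const label of labels) {
    if (Object.prototype.hasOwnProperty.call(population, label)) {
      withPop += 1;
    } else {
      missing.push(label); // keep the name so it can be fixed later
    }
  }
  return { withPop, withoutPop: missing.length, missing };
}
```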

It would no longer be a terrible idea to accept this pull request, but let me know if you want further changes.

(Or accept it and then make them yourself, if you'd rather.)

I see that in #5 the referenced graph is in "percent" rather than "per million"; let me know if you think that's better.
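The two units differ only by a constant factor. One percent is 1/100 and one per million is 1/1,000,000, so 1% equals 10,000 per million. The function names below are illustrative.

```javascript
// 1% = 1/100 = 10,000/1,000,000, so the units differ by a factor of 10,000.
function perMillionToPercent(perMillion) {
  return perMillion / 10000;
}

function percentToPerMillion(percent) {
  return percent * 10000;
}
```

For example, 250 per million is 0.025%. At the levels typical of per-capita case counts, per million avoids long strings of leading zeros.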

davidbau commented 4 years ago

Sorry for the slow reply. A few requests:

1. Let's call it "norm=pop" instead of "permillion", i.e., normalized by population. Per million does seem like the right unit.
2. County-level populations: visit http://localhost:8000/#/?domain=NY&norm=permillion&advanced=1. I think the file contains all the needed county-level population numbers, but for some reason they're not getting lined up.
3. Omit lines when we don't have population data instead of drawing the wrong line. If for whatever reason something doesn't line up, it would be better to filter out the data series so it's not shown at all than to show something with the wrong units.
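The "omit rather than mislabel" rule above could look like a single pass over the series. Each series is normalized only when its population is known; otherwise it is dropped. Below is a hypothetical sketch. `normalizeOrOmit` is an illustrative name, and the `{ label, counts }` series shape is an assumption, not the chart's actual data structure.

```javascript
// Hypothetical sketch: normalize each series to per-million and drop any
// series without a usable population, so nothing is drawn with wrong units.
// Assumption: each series looks like { label, counts: [numbers] }.
function normalizeOrOmit(seriesList, population) {
  const result = [];
  for (const series of seriesList) {
    const pop = population[series.label];
    if (!Number.isFinite(pop) || pop <= 0) {
      continue; // no usable population: leave this line off the chart
    }
    result.push({
      label: series.label,
      counts: series.counts.map((count) => (count * 1e6) / pop),
    });
  }
  return result;
}
```

Filtering at this step keeps the rest of the drawing code unchanged: it just receives fewer series.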