GeoDaCenter / covid

COVID Atlas alpha code
https://geodacenter.github.io/covid/
GNU General Public License v3.0

Daily county 1P3A update #9

Closed Makosak closed 4 years ago

Makosak commented 4 years ago

We need a volunteer lead to start updating the county case file we have from 1P3A and help determine a more efficient protocol/workflow process. Things to consider:

Once we have a protocol in place, we will be better able to take advantage of volunteers and potential RAs to help update this on a regular (daily) basis.

qinyun-lin commented 4 years ago

A small note for cross-checking: the cumulative number of confirmed cases today should be greater than or equal to yesterday's, at both the county and the state level.
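The check described above could be sketched roughly as follows. This is only an illustration: the function name `find_decreases`, the dict-of-counts layout, and the sample numbers are assumptions, not part of the actual 1P3A file or the project's code.

```python
def find_decreases(yesterday, today):
    """Return the keys whose cumulative count dropped between two days.

    Both arguments map an identifier (hypothetically a county FIPS code or a
    state abbreviation) to the cumulative confirmed-case count for that day.
    Keys missing from `today` are skipped rather than flagged.
    """
    decreases = {}
    for key, previous in yesterday.items():
        current = today.get(key)
        if current is not None and current < previous:
            decreases[key] = (previous, current)
    return decreases


# Made-up counts, for illustration only.
yesterday = {"17031": 120, "17043": 15, "55025": 40}
today = {"17031": 150, "17043": 12, "55025": 40}

flagged = find_decreases(yesterday, today)
print(flagged)  # counties to hand to a human editor
```

Anything returned by the function is a candidate data error (or a correction by the source) rather than proof of one, so it would still need a manual look.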

Sihan-Mao commented 4 years ago

Cron jobs might be a good way to trigger this workflow where the application is hosted. @Makosak Will 1point3acres provide real-time data in the future rather than lagging 2 days?

Sihan-Mao commented 4 years ago

Or a cloud solution like Google Cloud Composer, instead of running it on-premises.

Makosak commented 4 years ago

It looks like 1P3A will not be updating beyond 2 days; we haven't heard back from them, and I imagine the 2-day window gives them extra time to validate.

The UW-Madison team we've connected with has software engineers and grad students who can also help with this, as well as with cross-validating against another county-level data source they're pulling.

Priority for automation: identify discrepancies for human editors to resolve.

qinyun-lin commented 4 years ago

Proposed protocol for data checking:
(1) Check whether the state data are consistent with the state health departments.
(2) (Automatically) check whether the county case counts sum to the state case count, and list the states where the numbers don't match.
(3) For the states listed in step 2, compare the county case counts between 1P3A (where you can find each data source) and the state health departments. We can also check against other county-level data sources (we need to add possible sources here that don't use 1P3A; otherwise the check is not useful?).

Preparation: Assign each person a set of states. Everyone gets familiar with the states they are in charge of in terms of (a) when the state updates its information each day and (b) how the data are reported.

Feel free to make suggestions/comments!
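Step 2 of the protocol above is the part that can be automated most directly. A minimal sketch, assuming the county data arrive as `(state, county, cases)` tuples and the state totals as a dict; both layouts, the function name `states_with_mismatch`, and the numbers are hypothetical.

```python
from collections import defaultdict


def states_with_mismatch(county_rows, state_totals):
    """List states whose county counts do not sum to the reported state total.

    county_rows:  iterable of (state, county, cases) tuples (assumed layout).
    state_totals: dict mapping state -> reported state-level case count.
    Returns a dict mapping state -> (sum of county counts, reported total).
    """
    county_sums = defaultdict(int)
    for state, _county, cases in county_rows:
        county_sums[state] += cases

    mismatches = {}
    for state, reported in state_totals.items():
        summed = county_sums.get(state, 0)
        if summed != reported:
            mismatches[state] = (summed, reported)
    return mismatches


# Made-up numbers, for illustration only.
county_rows = [
    ("IL", "Cook", 100),
    ("IL", "DuPage", 20),
    ("WI", "Dane", 30),
    ("WI", "Milwaukee", 50),
]
state_totals = {"IL": 125, "WI": 80}

to_review = states_with_mismatch(county_rows, state_totals)
print(to_review)  # states to pass on to step 3
```

A mismatch is not necessarily an error, since some sources report cases with no county assigned, so the output is best read as the work list for step 3.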

qinyun-lin commented 4 years ago

Meeting on 3/24: Before going into deep validation, we will start from cross-checking three datasets:

JohnWSteill commented 4 years ago

Working in county_validation directory, I tossed up some function scaffolding with comments.

Qinyun, would you like to specify the format of validation_out.csv in a bit more detail?

qinyun-lin commented 4 years ago

Sure! How about something like this:

Suggestions are welcome!


JohnWSteill commented 4 years ago

Sounds great. Easy to change.

Makosak commented 4 years ago

@Sihan-Mao could you assist @linqinyu with comparing some merge conflict issues with a few of the data sources (while John works on the formal solution)? She can explain further! :)

qinyun-lin commented 4 years ago

@Sihan-Mao Thanks for helping out! Here is something I think we can start working on: compare how much the Johns Hopkins, NYTimes, and USAFacts datasets differ for one particular day, in terms of total confirmed cases and death counts. Let's say we focus on 03/28 for now.

These three datasets should have FIPS codes that can be used as a unique identifier for merging. But it would be a good idea to check that the code really is unique within each dataset before merging.

Some preliminary results I am looking for are something like: how many rows/counties can be matched between these datasets, and how many rows/counties have different numbers among those matched ones.

Let's just see how different these three datasets are for now. Let me know if you have any questions! We can also communicate on slack.
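The comparison described above could start from something like the sketch below. It assumes each dataset has already been loaded into a list of dicts with a `fips` key and a `cases` key for the chosen day; those field names, the function names, and the sample rows are hypothetical and do not reflect the real column names in the three sources.

```python
from collections import Counter


def duplicate_keys(rows, key="fips"):
    """Return the key values that appear more than once in `rows`."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def compare_sources(sources, field="cases"):
    """Match rows across sources on FIPS and report disagreements.

    sources: dict mapping source name -> list of row dicts.
    Returns (matched, differing): the FIPS codes present in every source,
    and the subset of those where the sources report different values.
    """
    indexed = {}
    for name, rows in sources.items():
        dups = duplicate_keys(rows)
        if dups:
            # Merging on a non-unique key would silently drop rows here.
            raise ValueError(f"{name}: FIPS is not unique: {dups}")
        indexed[name] = {row["fips"]: row[field] for row in rows}

    matched = set.intersection(*(set(ix) for ix in indexed.values()))
    differing = sorted(
        fips for fips in matched
        if len({ix[fips] for ix in indexed.values()}) > 1
    )
    return sorted(matched), differing


# Made-up rows; FIPS kept as strings so leading zeros survive.
sources = {
    "jhu": [{"fips": "17031", "cases": 100}, {"fips": "17043", "cases": 20}],
    "nyt": [{"fips": "17031", "cases": 100}, {"fips": "17043", "cases": 22},
            {"fips": "36061", "cases": 500}],
    "usafacts": [{"fips": "17031", "cases": 100}, {"fips": "17043", "cases": 20}],
}

matched, differing = compare_sources(sources)
print(len(matched), "counties matched;", len(differing), "differ:", differing)
```

Running the same function with `field="deaths"` (again a hypothetical column name) would cover the death counts.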

lixun910 commented 4 years ago

+1. Also check whether the county-level counts add up to and match the state-level data.

On Mar 30, 2020, at 9:39 AM, Qinyun Lin notifications@github.com wrote:

@Sihan-Mao Thanks for helping out! Here is something I think we can start working on: compare how much the Johns Hopkins, NYTimes, and USAFacts datasets differ for one particular day, in terms of total confirmed cases and death counts. Let's say we focus on 03/28 for now.

For USAFacts, you can download the CSV file here: https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/?utm_source=MailChimp&utm_campaign=census-covid2
For the New York Times, you can download the CSV file here: https://github.com/nytimes/covid-19-data
For Johns Hopkins, download the data here: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports

These three datasets should have FIPS codes that can be used as a unique identifier for merging. But it would be a good idea to check that the code really is unique within each dataset before merging.


Sihan-Mao commented 4 years ago

Sure np! Will start working on that. My email is sihan.mao@gmail.com. @linqinyu Can you send me an invite for slack? Thanks a lot!

qinyun-lin commented 4 years ago

Cool! I just sent you an invitation by email.


Makosak commented 4 years ago

Should I close this? I know the team is almost there, amazing work again!

qinyun-lin commented 4 years ago

Sure! I guess we can always open a new issue regarding this if we need!

