ActiveConclusion / COVID19_mobility

COVID-19 Mobility Data Aggregator. Scraper of Google, Apple, Waze and TomTom COVID-19 Mobility Reports🚶🚘🚉
MIT License

Scraper of Mobility reports v 2.0 #6

Closed ActiveConclusion closed 4 years ago

ActiveConclusion commented 4 years ago

Google recently published its mobility report as time-series data in CSV format, which you can download from their website. That means there's no need for a PDF file parser anymore, so I plan to change the concept of this repository. Here is what I propose to implement:

  1. Archive the PDF parser as part of the great history of this repository.
  2. Automatically download to this repository all available files (including PDF) from the Google and Apple sites. The Google reports pose no problems, but the Apple website parser needs to be rewritten because, unfortunately, my ad-hoc solution does not work.
  3. Make one summary file from Google and Apple reports with the following structure:

| country | sub_region_1 | sub_region_2 | date | retail | grocery_and_pharmacy | parks | transit_stations | workplaces | residential | walking | driving | transit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |

  4. Make a simple visualization app for this data (for example, using the Bokeh library).
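
As an illustration of the summary file in point 3, here is a minimal pandas sketch. It assumes the Google data already has the key columns and category columns named as in the table, and that the Apple data is available as a long table with hypothetical `transportation_type` and `value` columns; `build_summary` and the tiny sample frames are made up for this example and are not the repository's actual code.

```python
import pandas as pd

KEYS = ["country", "sub_region_1", "sub_region_2", "date"]
GOOGLE_COLS = ["retail", "grocery_and_pharmacy", "parks",
               "transit_stations", "workplaces", "residential"]
APPLE_COLS = ["walking", "driving", "transit"]


def build_summary(google: pd.DataFrame, apple_long: pd.DataFrame) -> pd.DataFrame:
    """Combine Google and Apple data into the 13-column summary layout."""
    # One row per (region, date, transportation_type) becomes one row per
    # (region, date) with a column for each transportation type.
    apple_wide = (
        apple_long
        .pivot_table(index=KEYS, columns="transportation_type", values="value")
        .reindex(columns=APPLE_COLS)  # adds missing types as NaN columns
        .reset_index()
    )
    apple_wide.columns.name = None
    # An outer join keeps regions and dates covered by only one source.
    summary = google.merge(apple_wide, on=KEYS, how="outer")
    # Fix the column order; categories absent from the input become NaN.
    summary = summary.reindex(columns=KEYS + GOOGLE_COLS + APPLE_COLS)
    return summary.sort_values(KEYS).reset_index(drop=True)


# Made-up sample rows, only to show the shapes involved.
google = pd.DataFrame({
    "country": ["Italy", "Italy"],
    "sub_region_1": ["Total", "Total"],
    "sub_region_2": ["Total", "Total"],
    "date": ["2020-04-10", "2020-04-11"],
    "retail": [-80.0, -85.0],
})
apple_long = pd.DataFrame({
    "country": ["Italy", "Italy", "Germany"],
    "sub_region_1": ["Total"] * 3,
    "sub_region_2": ["Total"] * 3,
    "date": ["2020-04-10"] * 3,
    "transportation_type": ["walking", "driving", "driving"],
    "value": [40.0, 35.0, 60.0],
})

summary = build_summary(google, apple_long)
print(summary)
```

The outer join is a design choice for this sketch: it leaves gaps as NaN instead of dropping regions that appear in only one report.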

Feel free to offer your suggestions here. Thank you!

ladew222 commented 4 years ago

Great! I will pull this into balefire.info for the USA. FYI, the color shading issue was resolved. I am using D3 and D3Plus for my visuals. The drawback is that my visuals cover only the USA. I can look into doing a global version, since it would mostly involve ignoring or reducing the data merges.

ladew222 commented 4 years ago

I got the Google data into the system. It is pretty interesting. I still need to sit down with it at some point, but there are some pretty telling Pearson coefficients correlating with it. I also see that confirmed cases per 10k are higher when mobility is higher. What I really should do is take the log of the case rate two weeks later and check whether it correlates with mobility. The graphs suggest that is the case. I will put a screenshot below showing 4/11 and one plot of AK. FYI, the university is doing a short article on the tool next week, so I hope to get more info on our data out there.
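
The two-week check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the column names `date`, `mobility` and `cases_per_10k` are hypothetical, the frame is assumed to hold one row per consecutive day for a single region, and `log1p` is used instead of a plain log so that days with zero cases do not break the calculation.

```python
import numpy as np
import pandas as pd


def lagged_log_correlation(df: pd.DataFrame, lag_days: int = 14) -> float:
    """Pearson r between mobility on day t and log cases on day t + lag_days.

    Assumes one row per consecutive day, so shifting by rows equals
    shifting by days.
    """
    s = df.sort_values("date").set_index("date")
    # Row t now holds the log case rate observed lag_days later.
    future_log_cases = np.log1p(s["cases_per_10k"]).shift(-lag_days)
    # Series.corr uses Pearson by default and skips rows with NaN.
    return s["mobility"].corr(future_log_cases)


# Synthetic data in which cases depend on mobility 14 days earlier.
days = pd.date_range("2020-03-01", periods=40, freq="D")
mobility = np.linspace(-50.0, 0.0, 40)
cases = np.zeros(40)
cases[14:] = np.expm1(0.02 * mobility[:-14] + 1.5)

demo = pd.DataFrame({"date": days, "mobility": mobility, "cases_per_10k": cases})
r = lagged_log_correlation(demo, lag_days=14)
print(r)
```

On real data, comparing `r` across several values of `lag_days` would show whether a two-week lag stands out at all.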

[Screenshot: Screen Shot 2020-04-17 at 10 39 23 PM]

ActiveConclusion commented 4 years ago

@ladew222 Cool! I hope to compile a summary file from the Google and Apple reports in the next 2-3 days.

ladew222 commented 4 years ago

Wow cool.

ActiveConclusion commented 4 years ago

I've recently made a couple of updates, so I'll summarize what's been done here:

  1. Everything related to the PDF parser is now in the directory "scraper v 1.0".
  2. The Apple report is now automatically downloaded to the repository every day. Google data has a small problem, though: the CSV download works, but the PDF download is currently disabled because the structure of the Google webpage has changed significantly. I don't think that's a critical problem.
  3. The summary reports from Google and Apple data, which I mentioned above, are now generated automatically. They are available here. A few points should be noted:
    • The matching of subregions from Google data with cities from Apple data needs further improvement. Currently, they are matched exactly as they appear in the original data.
    • The U.S. data is a serious problem because it is quite heterogeneous. So far, the cities are in the "sub_region_1" column. I think it is probably better to remove the detailed county-level breakdown for the US from the summary report.
    • The baseline for Apple data should be adjusted to a longer period that overlaps with the Google baseline period (e.g. January 13 to February 6). This is a rough approach, but I think it would be better than just taking January 13th as the baseline.
  4. Google Sheets are now updated automatically.
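
The baseline adjustment for Apple data mentioned in point 3 could look something like the sketch below. It assumes the Apple values form a daily index series with a `DatetimeIndex`, and it rescales them so that the mean over the chosen window equals 100; the function name `rebaseline` and the sample numbers are invented for illustration, and this remains the rough approach described above, not an exact match for Google's method.

```python
import numpy as np
import pandas as pd


def rebaseline(series: pd.Series,
               start: str = "2020-01-13",
               end: str = "2020-02-06") -> pd.Series:
    """Rescale an index so its mean over [start, end] equals 100."""
    # Label-based slicing on a DatetimeIndex includes both endpoints.
    window_mean = series.loc[start:end].mean()
    return series / window_mean * 100.0


# Made-up daily index that starts at 100 on January 13 and drifts upward.
dates = pd.date_range("2020-01-13", "2020-02-10", freq="D")
apple_index = pd.Series(100.0 + np.arange(len(dates)), index=dates)

rebased = rebaseline(apple_index)
print(rebased.head())
```

Subtracting 100 from the rebased values gives a percent change relative to the new baseline window.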

We also need to think about the design of the data visualization app, which should provide simple answers about the mobility situation in a particular region.

ladew222 commented 4 years ago

Cool. Here is the choropleth of residential mobility as it is now, if you haven't seen it.

[Screenshot: Screen Shot 2020-04-20 at 9 01 17 PM]

ActiveConclusion commented 4 years ago

Wow, looks nice! But I couldn't reproduce this picture in your dashboard. Here is what I got: balefire

Maybe I missed a button or checkbox?

ladew222 commented 4 years ago

My fault. It looks like Google does not have enough data for that metric. Retail and Recreation has the fuller map.

ActiveConclusion commented 4 years ago

Got it, thanks! I suggest adding a breakdown by state; it would let us see the picture across the whole United States.

ladew222 commented 4 years ago

Makes sense. It was on my list and fell off. I will add that in. You can compare states in the plot using the filter on the top and selecting the values down below. Are you thinking about a map by states?

On Apr 21, 2020, at 1:12 PM, ActiveConclusion wrote:

> Got it, thanks! I suggest adding the ability to make a breakdown by states, it will allow us to see the picture throughout the United States.

ActiveConclusion commented 4 years ago

> Are you thinking about a map by states?

Yes

ActiveConclusion commented 4 years ago

Last week's Update Digest:

  1. The problem with downloading Google PDF reports is fixed (I fixed it a week ago, I just didn't write about it here).
  2. Apple has added more regions/cities to their report. The main problem is that cities and subregions are listed without country names, but I have already fixed this (it was a challenging issue for me).
  3. With the addition of new data from Apple, there are now huge problems with merging the reports, the scale of which I have not even assessed yet.

ActiveConclusion commented 4 years ago

Latest updates:

ActiveConclusion commented 4 years ago

I haven't written anything here in a while, but I should have. So, point by point: