cmu-delphi / covidcast-indicators

Back end for producing indicators and loading them into the COVIDcast API.
https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
MIT License

Add more Safegraph indicators #191

Open huisaddison opened 4 years ago

huisaddison commented 4 years ago

SafeGraph offers other datasets, like bar and restaurant... It's a much larger dataset and harder to wrangle, but it would be useful for forecasting, especially the causal stuff being done by Larry et al.

@capnrefsmmat

capnrefsmmat commented 4 years ago

Brief context to aid prioritization: The current indicators hardly seem to change over time, even as cases increase dramatically. We expect data on bar and restaurant opening or use to be more informative, since it seems a lot of spread comes that way. It would also change much more as restrictions are imposed and lifted.

The Facebook survey revision will have items on restaurant/bar patronage, but SafeGraph has it historically.

jingjtang commented 4 years ago

Datasets related:

capnrefsmmat commented 4 years ago

Larry says, about additional SafeGraph data, "it's not urgent but I do need it eventually." Not sure what priority that gives it...

krivard commented 4 years ago

The bar and restaurant signals are part of the neighborhood dataset, which is only available by month (with a by-weekday breakdown, but not a by-day breakdown). Is that useful?

There are individual business addresses available by day (by hour even) in a weekly rollup, but we'd need Larry to tell us which NAICS codes he's interested in.

capnrefsmmat commented 4 years ago

Got it; will ask modeling to make a list of codes they're interested in.

jingjtang commented 4 years ago

core_poi.csv in Core Places: lists all of the points of interest with location name, address, category, and brand association (> 1 GB). It will be updated each month.

Patterns.csv in Monthly Places Patterns (from 01/2018 to present): has the daily number of visits; each CSV file is the dataset for one month. Each row represents a place, with Safegraph_id, location name, address including zip code, etc. Each cell in the visit_by_day column is a list with one entry per day of the month (the number of visits on each day). Weekly Patterns is similar.

neighborhood_patterns: each row is a Census block group.
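
To make the visit_by_day layout concrete, here is a minimal sketch of expanding one Patterns row into per-day counts. It assumes the cell is a JSON-encoded list and that the first entry corresponds to the first day of the month; the function name, the row keys other than visit_by_day, and the sample values are hypothetical, not taken from the SafeGraph files.

```python
import json
from datetime import date, timedelta


def expand_daily_visits(row, month_start):
    """Turn one Patterns row into a list of (date, visits) pairs.

    Assumption: row["visit_by_day"] is a JSON-encoded list such as
    "[3, 0, 5]", with entry i holding the visits on month_start + i days.
    """
    counts = json.loads(row["visit_by_day"])
    return [(month_start + timedelta(days=i), n) for i, n in enumerate(counts)]


# Hypothetical row, shortened to three days for illustration.
row = {"place_id": "example-place", "visit_by_day": "[3, 0, 5]"}
pairs = expand_daily_visits(row, date(2020, 6, 1))
for day, visits in pairs:
    print(day.isoformat(), visits)
```

If the real files store the list in some other encoding, only the json.loads line would need to change.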

capnrefsmmat commented 4 years ago

Larry would like these NAICS codes: bars (722410) and restaurants (722511).

huisaddison commented 4 years ago

Hi all - spoke briefly with @jingjtang about this. Seems like some of the datasets available monthly actually have data on a daily resolution?

Some of my thoughts:

huisaddison commented 4 years ago

I had a short chat with Larry today - his team is interested in "how many people are going to bars (722410) and restaurants (722511) in each state (Rob also wants county) at a daily resolution".

He suggested we also normalize by population, since the existing signals are already normalized by population.
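
A rough sketch of what that request could look like once visits are available per place and day: keep only the two NAICS codes, sum visits by state and day, then divide by population. The record keys (naics_code, state, date, visits), the population table, and the per-100,000 scaling are all assumptions made for illustration, not the pipeline's actual schema.

```python
from collections import defaultdict

# NAICS codes requested in this thread.
TARGET_NAICS = {"722410": "bars", "722511": "restaurants"}


def visits_per_100k(records, population):
    """Sum daily visits by (state, date, kind), scaled per 100,000 residents.

    Assumption: each record is a dict with the hypothetical keys
    "naics_code", "state", "date", and "visits"; population maps a
    state to its resident count.
    """
    totals = defaultdict(int)
    for rec in records:
        kind = TARGET_NAICS.get(rec["naics_code"])
        if kind is None:
            continue  # skip places that are neither bars nor restaurants
        totals[(rec["state"], rec["date"], kind)] += rec["visits"]
    return {
        key: 100_000 * total / population[key[0]]
        for key, total in totals.items()
    }


# Made-up numbers, for illustration only.
records = [
    {"naics_code": "722410", "state": "pa", "date": "2020-06-01", "visits": 4},
    {"naics_code": "722410", "state": "pa", "date": "2020-06-01", "visits": 6},
    {"naics_code": "722511", "state": "pa", "date": "2020-06-01", "visits": 20},
    {"naics_code": "445110", "state": "pa", "date": "2020-06-01", "visits": 99},
]
population = {"pa": 200_000}
signal = visits_per_100k(records, population)
```

A county-level version would use the same grouping with a county key and a county population table.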

Since @jingjtang may already have her hands full with other work, I offered to take care of this, since Larry's team needs at least a WIP version of this very soon. I will attend tomorrow's eng meeting to discuss, since I know a lot of the eng working procedures have changed since I last wrote a pipeline. @jsharpna @krivard .

jingjtang commented 4 years ago

Problems encountered when trying to switch to utility @jsharpna:

  1. No zip_to_msa or zip_to_state
  2. Functions zip_to_county and zip_to_hrr fail with NotImplementedError: Can't perform this operation for loaders without 'get_data()'
  3. Didn't find a function for getting population information; still need a population file in ./static

krivard commented 4 years ago

https://github.com/cmu-delphi/covidcast-indicators/pull/225 merged. Next: