cmu-delphi / covidcast-indicators

Back end for producing indicators and loading them into the COVIDcast API.
https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
MIT License
12 stars 17 forks

Examine Facebook survey for possible new sensors #18

Closed krivard closed 3 years ago

krivard commented 4 years ago

This paper says loss of smell or taste is an excellent predictor; we should make a sensor out of the data we're getting from the Facebook survey on that (B2==13).

capnrefsmmat commented 4 years ago

Other things on the Facebook survey:

We should probably arrange for someone to examine many of these and identify the most useful signals, in collaboration with someone from forecasting who knows what information they need.

krivard commented 4 years ago

For @eujing : an EDA exploring other questions in the survey to see if they are correlated or not with cases.

@krivard to send over the last week of survey data & a copy of the survey questions

(This does not have to be in R; Python is fine. Geo aggregation is already done in R, so you might do those aggregations in R and then export a file for EDA in Python, whichever is easiest.)

capnrefsmmat commented 4 years ago

List of aggregate ideas from current survey wave:

From upcoming revision:

capnrefsmmat commented 4 years ago

Another testing aggregate to add: Percentage of people reporting they wanted to be tested in the past 14 days but were not. This will be a good measure of access to testing, if the sample size is reasonable.
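A minimal sketch of how this aggregate might be computed, including the sample-size guard mentioned above. The field names `tested` and `wanted_test`, the threshold of 100, and the choice of denominator (respondents who were not tested) are all assumptions for illustration, not the survey's actual item codes or the pipeline's rules.

```python
from typing import Iterable, Mapping, Optional


def pct_wanted_test(
    responses: Iterable[Mapping[str, bool]],
    min_sample_size: int = 100,  # hypothetical reporting threshold
) -> Optional[float]:
    """Percentage of untested respondents who say they wanted a test.

    Each response is a mapping with two hypothetical boolean fields:
    'tested' (tested in the past 14 days) and 'wanted_test'.
    Returns None when too few untested respondents are available.
    """
    # Assumed denominator: only people who were not tested.
    not_tested = [r for r in responses if not r["tested"]]
    if len(not_tested) < min_sample_size:
        return None  # sample too small to report
    wanted = sum(1 for r in not_tested if r["wanted_test"])
    return 100.0 * wanted / len(not_tested)
```

Returning `None` rather than a noisy estimate keeps small-sample regions out of the published signal; if the denominator should instead be all respondents, only the `not_tested` filter changes.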

capnrefsmmat commented 4 years ago

Time to get more specific about potential new sensors from the symptom survey. I've written them below with proposed signal names. These names would be prefixed with raw_ or smoothed_ as appropriate, and would have a w at the beginning if weighted (to match our earlier signals). For example, we might have smoothed_wtravel_outside_state.
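The naming convention can be sketched as a small helper. The function name `signal_name` and its parameters are hypothetical; only the `raw_`/`smoothed_` prefixes and the leading `w` for weighted signals come from the description above.

```python
def signal_name(base: str, smoothed: bool, weighted: bool) -> str:
    """Build a full signal name from a draft base name.

    Prefix with 'smoothed_' or 'raw_', then add a 'w' directly
    before the base name when the signal is weighted.
    """
    prefix = "smoothed_" if smoothed else "raw_"
    weight_marker = "w" if weighted else ""
    return f"{prefix}{weight_marker}{base}"


print(signal_name("travel_outside_state", smoothed=True, weighted=True))
# smoothed_wtravel_outside_state
```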

For reference, wave 3 (current) survey text is here; wave 4 (upcoming) survey text is here.

Mobility

Behavior and beliefs

Testing

(All of these are introduced in wave 4.)

@RoniRos Feedback on naming (and signal definitions) would be appreciated. For your convenience, here's a table of all the names in Markdown table syntax; just copy and paste this into a GitHub comment, stick your suggested name in the second column, and you'll get a nicely formatted table of name suggestions.

| Draft name | Suggested name |
| --- | --- |
| travel_outside_state | PUT NAME HERE |
| avoiding_contact | |
| work_outside_home_5d | |
| work_outside_home_1d | |
| wearing_mask | |
| worried | |
| sought_care | |
| tested | |
| tested_positive | |
| ever_tested_positive | |
| wanted_test | |

capnrefsmmat commented 4 years ago

Also, @krivard, there's the question of quality checks for these signals. I can define the signals in the code easily enough, but I'll need help to map, graph, and check them while they're in wip_ form. Will there be someone available for this, or can we pull someone from modeling?

Note that not all signals are expected to have good correlations with cases. For example, I have no idea if sought_care will correlate with cases and how, but it is of independent epidemiological interest as long as it's not completely constant in space and time. If there are trends, they help understand something, even if they're not predictive of cases.
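One rough way to check the "not completely constant in space and time" condition is sketched below. The `(geo, day, value)` row layout and the tolerance are assumptions for illustration; this is not an existing quality check in the pipeline.

```python
from collections import defaultdict


def varies_in_space_and_time(rows, tolerance=1e-9):
    """Return (varies_over_space, varies_over_time) for a signal.

    rows: iterable of (geo, day, value) tuples (hypothetical layout).
    A signal varies over space if, on at least one day, values differ
    across regions; it varies over time if, in at least one region,
    values differ across days.
    """
    by_geo = defaultdict(list)
    by_day = defaultdict(list)
    for geo, day, value in rows:
        by_geo[geo].append(value)
        by_day[day].append(value)

    def spread(values):
        return max(values) - min(values)

    varies_over_time = any(spread(v) > tolerance for v in by_geo.values())
    varies_over_space = any(spread(v) > tolerance for v in by_day.values())
    return varies_over_space, varies_over_time
```

A signal that fails both checks carries no information; one that passes either may still be of epidemiological interest even if it does not correlate with cases.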

RoniRos commented 4 years ago

Looks very good! Below are a few small suggested changes:

| Draft name | Suggested name |
| --- | --- |
| travel_outside_state | travel_outside_state_5d |
| avoiding_contact | |
| work_outside_home_5d | |
| work_outside_home_1d | |
| wearing_mask | |
| worried | worried_become_ill |
| sought_care | |
| tested | tested_14d |
| tested_positive | tested_positive_14d |
| ever_tested_positive | tested_positive_ever |
| wanted_test | wanted_test_14d |

A few additional comments:

Lastly, a more open-ended question: What do you think about making all 'obvious' summary statistics (e.g. all marginal counts of all possible answers, possibly with some threshold-based censoring) available via the API? There will be many dozens of them, so we could give them automatically generated signal names. Or, to avoid overloading the DB, you could make them publicly available in daily-updated CSV files (or are you doing that already?).
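A minimal sketch of what such automatically generated marginal counts could look like. The `count_<question>_<answer>` naming scheme, the default threshold of 10, and the dict-per-response layout are all hypothetical choices for illustration.

```python
from collections import Counter


def marginal_counts(responses, min_count=10):
    """Count every (question, answer) pair and censor small cells.

    responses: iterable of dicts mapping a question code to the
    respondent's answer (hypothetical layout).
    Returns a dict from an auto-generated signal name to its count,
    omitting any cell with fewer than min_count responses.
    """
    counts = Counter()
    for response in responses:
        for question, answer in response.items():
            counts[(question, answer)] += 1
    return {
        f"count_{question}_{answer}".lower(): n
        for (question, answer), n in counts.items()
        if n >= min_count  # threshold-based censoring
    }
```

The same dictionary could be written out as one row per signal in a daily CSV file instead of being loaded into the database.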

krivard commented 4 years ago

@capnrefsmmat The correlations app will work on CSV files, so anything that can be run against cases and deaths shouldn't need any other infrastructural assistance. For statistical checks that don't involve correlating against an existing API signal, we probably should pull in someone from Modeling.

For mapping, if you need someone to post the wip CSVs to the production server, we can definitely help with that. If you don't need them urgently, this could also be a good guinea pig scenario for uploading to the private COVIDcast server and testing out the alternate-endpoint configuration options.

capnrefsmmat commented 4 years ago

Implementation checklist:

I have draft code for the ones checked so far. I suggest that once we have the first "real" wave 4 output tomorrow, I get the raw response data and test the code on it, so that Katie can switch to using code that outputs these indicators in a few days.

One question: For some of these, the weighted indicator name will be pretty goofy, e.g. smoothed_wwearing_mask. Should we stick with that convention or change it?

capnrefsmmat commented 4 years ago

@RoniRos I'd like to add mental health indicators to the list above. Specifically

There are definite signals here that could be of great interest.

As before, feel free to copy/paste for name suggestions:

| Draft name | Suggested name |
| --- | --- |
| anxious_5d | suggestion goes here |
| depressed_5d | |
| isolated_5d | |

RoniRos commented 4 years ago

@capnrefsmmat I thought I had posted a reply, but I don't see it now. Let me try again. Overall this looks good to me. The only change I suggest is the last item, because 'isolated' is an objective status, which also has epidemiological interpretation, whereas the question is about a perception.

| Draft name | Suggested name |
| --- | --- |
| anxious_5d | anxious_5d |
| depressed_5d | depressed_5d |
| isolated_5d | felt_isolated_5d |

Alternatively it could be 'feel_isolated_5d' or 'feeling_isolated_5d'.

capnrefsmmat commented 3 years ago

I'm going to split this into separate issues for the remaining indicators, and close this issue.