Closed: krivard closed this issue 3 years ago
Other things on the Facebook survey:
We should probably arrange for someone to examine many of these and identify the most useful signals, in collaboration with someone from forecasting who knows what information they need.
For @eujing: an EDA exploring the other questions in the survey to see whether they are correlated with cases.
@krivard to send over the last week of survey data & a copy of the survey questions
(Does not have to be in R; it can be in Python. The geo aggregation is already done in R, so you might run those aggregations in R and then export a file for EDA in Python, whatever's easiest.)
List of aggregate ideas from current survey wave:
From upcoming revision:
Another testing aggregate to add: Percentage of people reporting they wanted to be tested in the past 14 days but were not. This will be a good measure of access to testing, if the sample size is reasonable.
Time to get more specific about potential new sensors from the symptom survey. I've written them below with proposed signal names. These names would be prefixed with `raw_` or `smoothed_` as appropriate, and would have a `w` at the beginning if weighted (to match our earlier signals). For example, we might have `smoothed_wtravel_outside_state`.
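As a quick illustration of that naming convention, here's a minimal sketch; the helper function is mine for illustration, not part of the actual pipeline:

```python
# Sketch of the proposed convention: prefix with raw_/smoothed_, and
# insert a "w" immediately after the prefix when the signal is weighted.
def signal_name(base, smoothed=False, weighted=False):
    """Build a full signal name from a base name and its variants."""
    prefix = "smoothed_" if smoothed else "raw_"
    return f"{prefix}{'w' if weighted else ''}{base}"

print(signal_name("travel_outside_state", smoothed=True, weighted=True))
# -> smoothed_wtravel_outside_state
```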
For reference, wave 3 (current) survey text is here; wave 4 (upcoming) survey text is here.
- `travel_outside_state`: Percent answering yes to item C6, "In the past 5 days, have you traveled outside of your state?"
- `avoiding_contact`: Percent answering all or most of the time to item C7, "To what extent are you intentionally avoiding contact with other people?" (Note: This item is removed in wave 4, but present in waves 1-3, so the historical data need only be calculated once.)
- `work_outside_home_5d`: Percent answering yes to item C3, "In the past 5 days, have you gone to work outside of your home?" (Removed in wave 4. Historical data need only be calculated once.)
- `work_outside_home_1d`: Percent selecting "Gone to work or school outside the place where you are currently staying" in item C13a, "In the last 24 hours, have you done any of the following? Please select all that apply." (Added in wave 4. Should only be included in 7-day average form, since the daily counts will have huge weekday effects.)
- `wearing_mask`: Percent answering all or most of the time to item C14, "In the past 5 days, how often did you wear a mask when in public?" (Introduced in wave 4.)
- `worried`: Percent answering very or somewhat worried to item C9, "How do you feel about the possibility that you or someone in your immediate family might become seriously ill from COVID-19 (coronavirus disease)?"
- `sought_care`: Of those who report at least one symptom in item B2c, "Which symptoms are new or unusual for you?", percent who specify a type of medical care sought in item B7, "Have you sought medical care for your recent unusual symptoms?" (Introduced in wave 4. Interesting because it might fluctuate depending on current perceptions, and because if it varies in time and space, it will tell us how to scale the doctor's visits signal.)

The following are all introduced in wave 4:

- `tested`: Percent answering "yes" to item B10, "Have you been tested for coronavirus (COVID-19) in the last 14 days?"
- `tested_positive`: Percent answering "yes" to item B10a, "Did this test find that you had coronavirus (COVID-19)?"
- `ever_tested_positive`: Percent answering "yes" to either item B10a or item B11, "Have you ever tested positive for coronavirus (COVID-19)?"
- `wanted_test`: Percent answering "yes" to item B12, "Have you wanted to be tested for coronavirus (COVID-19) at any time in the last 14 days?" B12 is only presented if the respondent was not tested in the last 14 days, so this measures people who wanted to be tested but were not.

@RoniRos Feedback on naming (and signal definitions) would be appreciated. For your convenience, here's a table of all the names in Markdown table syntax; just copy and paste it into a GitHub comment, put your suggested name in the second column, and you'll get a nicely formatted table of name suggestions.
| Draft name | Suggested name |
| --- | --- |
| travel_outside_state | PUT NAME HERE |
| avoiding_contact | |
| work_outside_home_5d | |
| work_outside_home_1d | |
| wearing_mask | |
| worried | |
| sought_care | |
| tested | |
| tested_positive | |
| ever_tested_positive | |
| wanted_test | |
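For concreteness, each of these aggregates is essentially a per-geography "percent answering X" computation. Here's a minimal sketch with made-up data and a hypothetical record layout; it ignores the survey weighting and smoothing steps:

```python
from collections import defaultdict

# Made-up (geo, answer) pairs for item C6; purely illustrative.
responses = [
    ("42003", "yes"), ("42003", "no"), ("42003", "yes"),
    ("06037", "no"), ("06037", "no"),
]

def percent_yes(rows):
    """Percent of respondents answering yes, per geography."""
    tallies = defaultdict(lambda: [0, 0])  # geo -> [yes count, total]
    for geo, answer in rows:
        tallies[geo][0] += (answer == "yes")
        tallies[geo][1] += 1
    return {geo: 100.0 * yes / total for geo, (yes, total) in tallies.items()}

print(percent_yes(responses))
```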
Also @krivard there's the question of quality checks for these signals. I can define the signals in the code easily enough, but I'll need help to map, graph, and check them out when they're in `wip_` form. Will there be someone available for this, or can we pull someone from modeling?
Note that not all signals are expected to have good correlations with cases. For example, I have no idea whether (or how) `sought_care` will correlate with cases, but it is of independent epidemiological interest as long as it's not completely constant in space and time. If there are trends, they help us understand something, even if they're not predictive of cases.
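A quick scripted sanity check is enough for this kind of correlation question. A sketch with made-up numbers (the helper function is mine, using only the standard library):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up daily values purely for illustration, not real survey data.
sought_care = [4.1, 4.3, 5.0, 5.8, 6.2]  # percent seeking care
cases = [120, 135, 160, 190, 210]        # reported cases, same days

print(f"Pearson r = {pearson(sought_care, cases):.2f}")
```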
Looks very good! Below are a few small suggested changes:

| Draft name | Suggested name |
| --- | --- |
| travel_outside_state | travel_outside_state_5d |
| avoiding_contact | |
| work_outside_home_5d | |
| work_outside_home_1d | |
| wearing_mask | |
| worried | worried_become_ill |
| sought_care | |
| tested | tested_14d |
| tested_positive | tested_positive_14d |
| ever_tested_positive | tested_positive_ever |
| wanted_test | wanted_test_14d |
A few additional comments:
Lastly, a more open-ended question: What do you think about making all 'obvious' summary statistics (e.g. all marginal counts of all possible answers, possibly with some threshold-based censoring) available via the API? There will be many dozens of them, so we could give them automatically generated signal names. Or, to avoid overloading the DB, you can make them publicly available in daily-updated csv files (or are you doing that already?).
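The threshold-based censoring could be as simple as dropping any marginal count below a minimum sample size. A sketch with made-up counts; the threshold of 100 is an arbitrary placeholder, not a real policy:

```python
from collections import Counter

# Tabulate every answer to one item, then suppress small cells.
MIN_N = 100  # placeholder censoring threshold

answers = ["yes"] * 250 + ["no"] * 430 + ["unsure"] * 40
counts = Counter(answers)
censored = {a: n for a, n in counts.items() if n >= MIN_N}
print(censored)  # {'yes': 250, 'no': 430}; 'unsure' (n=40) is suppressed
```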
@capnrefsmmat The correlations app will work on CSV files, so anything that can be run against cases and deaths shouldn't need any other infrastructural assistance. For statistical checks that don't involve correlating against an existing API signal, we probably should pull in someone from Modeling.
For mapping, if you need someone to post the wip CSVs to the production server, we can definitely help with that. If you don't need them urgently, this could also be a good guinea pig scenario for uploading to the private COVIDcast server and testing out the alternate-endpoint configuration options.
Implementation checklist:
I have draft code for the items checked so far. Once we have the first "real" wave 4 output tomorrow, I suggest I get the raw response data and test the code on it, so that Katie can switch to using code that outputs these indicators in a few days.
One question: for some of these, the weighted indicator name will be pretty goofy, e.g. `smoothed_wwearing_mask`. Should we stick with that convention or change it?
@RoniRos I'd like to add mental health indicators to the list above. Specifically:

- `anxious_5d`: Percent of people who report being nervous/anxious most or all of the time in the past 5 days
- `depressed_5d`: Percent of people who report being depressed most or all of the time in the past 5 days
- `isolated_5d`: Percent of people who report feeling isolated from others most or all of the time in the past 5 days

There are definite signals here that could be of great interest.
As before, feel free to copy/paste for name suggestions:
| Draft name | Suggested name |
| --- | --- |
| anxious_5d | suggestion goes here |
| depressed_5d | |
| isolated_5d | |
@capnrefsmmat I thought I had posted a reply, but I don't see it now. Let me try again. Overall this looks good to me. The only change I suggest is to the last item, because 'isolated' is an objective status, which also has an epidemiological interpretation, whereas the question is about a perception.
| Draft name | Suggested name |
| --- | --- |
| anxious_5d | anxious_5d |
| depressed_5d | depressed_5d |
| isolated_5d | felt_isolated_5d |
Alternatively it could be 'feel_isolated_5d' or 'feeling_isolated_5d'.
I'm going to split this into separate issues for the remaining indicators, and close this issue.
This paper says loss of smell or taste is an excellent predictor; we should make a sensor out of the data we're getting from the Facebook survey on that (`B2 == 13`).