Closed: krivard closed this issue 3 years ago
Other things on the Facebook survey:
We should probably arrange for someone to examine many of these and identify the most useful signals, in collaboration with someone from forecasting who knows what information they need.
For @eujing: an EDA exploring the other questions in the survey to see whether they are correlated with cases.
@krivard to send over the last week of survey data & a copy of the survey questions
(Does not have to be in R; it can be in Python. The geo aggregation is already done in R, so you might run those aggregations in R and then export a file for EDA in Python, whatever's easiest.)
List of aggregate ideas from current survey wave:
From upcoming revision:
Another testing aggregate to add: Percentage of people reporting they wanted to be tested in the past 14 days but were not. This will be a good measure of access to testing, if the sample size is reasonable.
Time to get more specific about potential new sensors from the symptom survey. I've written them below with proposed signal names. These names would be prefixed with `raw_` or `smoothed_` as appropriate, and would have a `w` at the beginning if weighted (to match our earlier signals). For example, we might have `smoothed_wtravel_outside_state`.
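As a quick illustration of that naming convention, here's a minimal sketch; the helper function is mine for illustration, not part of the actual pipeline:

```python
# Sketch of the proposed convention: prefix with raw_/smoothed_, and
# insert a "w" immediately after the prefix when the signal is weighted.
def signal_name(base, smoothed=False, weighted=False):
    """Build a full signal name from a base name and its variants."""
    prefix = "smoothed_" if smoothed else "raw_"
    return f"{prefix}{'w' if weighted else ''}{base}"

print(signal_name("travel_outside_state", smoothed=True, weighted=True))
# -> smoothed_wtravel_outside_state
```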
For reference, wave 3 (current) survey text is here; wave 4 (upcoming) survey text is here.
- `travel_outside_state`: Percent answering yes to item C6, "In the past 5 days, have you traveled outside of your state?"
- `avoiding_contact`: Percent answering all or most of the time to item C7, "To what extent are you intentionally avoiding contact with other people?" (Note: This item is removed in wave 4, but present in waves 1-3, so the historical data need only be calculated once.)
- `work_outside_home_5d`: Percent answering yes to item C3, "In the past 5 days, have you gone to work outside of your home?" (Removed in wave 4. Historical data need only be calculated once.)
- `work_outside_home_1d`: Percent selecting "Gone to work or school outside the place where you are currently staying" in item C13a, "In the last 24 hours, have you done any of the following? Please select all that apply." (Added in wave 4. Should only be included in 7-day average form, since the daily counts will have huge weekday effects.)
- `wearing_mask`: Percent answering all or most of the time to item C14, "In the past 5 days, how often did you wear a mask when in public?" (Introduced in wave 4.)
- `worried`: Percent answering very or somewhat worried to item C9, "How do you feel about the possibility that you or someone in your immediate family might become seriously ill from COVID-19 (coronavirus disease)?"
- `sought_care`: Of those who report at least one symptom in item B2c, "Which symptoms are new or unusual for you?", percent who specify a type of medical care sought in item B7, "Have you sought medical care for your recent unusual symptoms?" (Introduced in wave 4. Interesting because it might fluctuate depending on current perceptions, and because if it varies in time and space, it will tell us how to scale the doctor's visits signal.)

The following are all introduced in wave 4:

- `tested`: Percent answering "yes" to item B10, "Have you been tested for coronavirus (COVID-19) in the last 14 days?"
- `tested_positive`: Percent answering "yes" to item B10a, "Did this test find that you had coronavirus (COVID-19)?"
- `ever_tested_positive`: Percent answering "yes" to either item B10a or item B11, "Have you ever tested positive for coronavirus (COVID-19)?"
- `wanted_test`: Percent answering "yes" to item B12, "Have you wanted to be tested for coronavirus (COVID-19) at any time in the last 14 days?" B12 is only presented if the respondent was not tested in the last 14 days, so this measures people who wanted to be tested but were not.

@RoniRos Feedback on naming (and signal definitions) would be appreciated. For your convenience, here's a table of all the names in Markdown table syntax; just copy and paste it into a GitHub comment, put your suggested name in the second column, and you'll get a nicely formatted table of name suggestions.
| Draft name | Suggested name |
| --- | --- |
| travel_outside_state | PUT NAME HERE |
| avoiding_contact | |
| work_outside_home_5d | |
| work_outside_home_1d | |
| wearing_mask | |
| worried | |
| sought_care | |
| tested | |
| tested_positive | |
| ever_tested_positive | |
| wanted_test | |
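For concreteness, each of these aggregates is essentially a per-geography "percent answering X" computation. Here's a minimal sketch with made-up data and a hypothetical record layout; it ignores the survey weighting and smoothing steps:

```python
from collections import defaultdict

# Made-up (geo, answer) pairs for item C6; purely illustrative.
responses = [
    ("42003", "yes"), ("42003", "no"), ("42003", "yes"),
    ("06037", "no"), ("06037", "no"),
]

def percent_yes(rows):
    """Percent of respondents answering yes, per geography."""
    tallies = defaultdict(lambda: [0, 0])  # geo -> [yes count, total]
    for geo, answer in rows:
        tallies[geo][0] += (answer == "yes")
        tallies[geo][1] += 1
    return {geo: 100.0 * yes / total for geo, (yes, total) in tallies.items()}

print(percent_yes(responses))
```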
Also @krivard there's the question of quality checks for these signals. I can define the signals in the code easily enough, but I'll need help to map, graph, and check them out when they're in `wip_` form. Will there be someone available for this, or can we pull someone from modeling?
Note that not all signals are expected to have good correlations with cases. For example, I have no idea whether (or how) `sought_care` will correlate with cases, but it is of independent epidemiological interest as long as it's not completely constant in space and time. If there are trends, they help us understand something, even if they're not predictive of cases.
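A quick scripted sanity check is enough for this kind of correlation question. A sketch with made-up numbers (the helper function is mine, using only the standard library):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up daily values purely for illustration, not real survey data.
sought_care = [4.1, 4.3, 5.0, 5.8, 6.2]  # percent seeking care
cases = [120, 135, 160, 190, 210]        # reported cases, same days

print(f"Pearson r = {pearson(sought_care, cases):.2f}")
```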
Looks very good! Below are a few small suggested changes:

| Draft name | Suggested name |
| --- | --- |
| travel_outside_state | travel_outside_state_5d |
| avoiding_contact | |
| work_outside_home_5d | |
| work_outside_home_1d | |
| wearing_mask | |
| worried | worried_become_ill |
| sought_care | |
| tested | tested_14d |
| tested_positive | tested_positive_14d |
| ever_tested_positive | tested_positive_ever |
| wanted_test | wanted_test_14d |
A few additional comments:
Lastly, a more open-ended question: What do you think about making all 'obvious' summary statistics (e.g. all marginal counts of all possible answers, possibly with some threshold-based censoring) available via the API? There will be many dozens of them, so we could give them automatically generated signal names. Or, to avoid overloading the DB, you can make them publicly available in daily-updated csv files (or are you doing that already?).
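The threshold-based censoring could be as simple as dropping any marginal count below a minimum sample size. A sketch with made-up counts; the threshold of 100 is an arbitrary placeholder, not a real policy:

```python
from collections import Counter

# Tabulate every answer to one item, then suppress small cells.
MIN_N = 100  # placeholder censoring threshold

answers = ["yes"] * 250 + ["no"] * 430 + ["unsure"] * 40
counts = Counter(answers)
censored = {a: n for a, n in counts.items() if n >= MIN_N}
print(censored)  # {'yes': 250, 'no': 430}; 'unsure' (n=40) is suppressed
```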
@capnrefsmmat The correlations app will work on CSV files, so anything that can be run against cases and deaths shouldn't need any other infrastructural assistance. For statistical checks that don't involve correlating against an existing API signal, we probably should pull in someone from Modeling.
For mapping, if you need someone to post the wip CSVs to the production server, we can definitely help with that. If you don't need them urgently, this could also be a good guinea pig scenario for uploading to the private COVIDcast server and testing out the alternate-endpoint configuration options.
Implementation checklist:
I have draft code for the items checked so far. Once we have the first "real" wave 4 output tomorrow, I suggest I get the raw response data and test the code on it, so that Katie can switch to using code that outputs these indicators in a few days.
One question: for some of these, the weighted indicator name will be pretty goofy, e.g. `smoothed_wwearing_mask`. Should we stick with that convention or change it?
@RoniRos I'd like to add mental health indicators to the list above. Specifically:

- `anxious_5d`: Percent of people who report being nervous/anxious most or all of the time in the past 5 days
- `depressed_5d`: Percent of people who report being depressed most or all of the time in the past 5 days
- `isolated_5d`: Percent of people who report feeling isolated from others most or all of the time in the past 5 days

There are definite signals here that could be of great interest.
As before, feel free to copy/paste for name suggestions:
| Draft name | Suggested name |
| --- | --- |
| anxious_5d | suggestion goes here |
| depressed_5d | |
| isolated_5d | |
@capnrefsmmat I thought I had posted a reply, but I don't see it now. Let me try again. Overall this looks good to me. The only change I suggest is to the last item, because 'isolated' is an objective status, which also has an epidemiological interpretation, whereas the question is about a perception.
| Draft name | Suggested name |
| --- | --- |
| anxious_5d | anxious_5d |
| depressed_5d | depressed_5d |
| isolated_5d | felt_isolated_5d |
Alternatively it could be 'feel_isolated_5d' or 'feeling_isolated_5d'.
I'm going to split this into separate issues for the remaining indicators, and close this issue.
This paper says loss of smell or taste is an excellent predictor; we should make a sensor out of the data we're getting from the Facebook survey on that (`B2 == 13`).