Restructures data and updates pipeline for new format.
Data is now stored in exactly two formats, unsummarized and summarized (see the model files for the schemas). Spot data and predictions are stored in the same row as the summarized data. This simplifies the system overall and leaves less room for error.
New data arrives via a webhook from Survey123 at our /survey123/webhook route. It is cast to the unsummarized schema and stored in the unsummarized collection. The full pipeline runs whenever data is uploaded (via either CSV or the webhook). By default, it operates only on data from 2021 onward and leaves all pre-2021 data frozen as is.
The pipeline takes the unsummarized data and combines trap data to generate a single row at the state/year/county or RD level. It then generates model input variables for cleridst1, spotst1, and spotst2 by making passes through the data for year - 1 and year - 2. Finally, it generates predictions, output, and indicator variables for each row of data.
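As a rough illustration of the summarize and lag steps described above, here is a minimal Python sketch. It is not the code in this PR. The field names clerids and spots, the choice to combine trap rows by summing, the reading of cleridst1, spotst1, and spotst2 as the prior-year clerid count and the spot counts from one and two years back, and the helper names summarize, add_lagged_inputs, and run_pipeline are all assumptions made for the example. Only the county level is shown.

```python
from collections import defaultdict

CUTOFF_YEAR = 2021  # from the description: pre-2021 data stays frozen


def summarize(unsummarized, cutoff_year=CUTOFF_YEAR):
    """Combine trap rows into one row per (state, year, county).

    Assumption: trap counts are combined by summing; the real rule may differ.
    """
    totals = defaultdict(lambda: {"clerids": 0, "spots": None})
    for row in unsummarized:
        if row["year"] < cutoff_year:
            continue  # frozen years are never regenerated
        key = (row["state"], row["year"], row["county"])
        totals[key]["clerids"] += row["clerids"]
    return dict(totals)


def add_lagged_inputs(summarized, cutoff_year=CUTOFF_YEAR):
    """Fill the lagged inputs from the rows for year - 1 and year - 2.

    Assumption: cleridst1 and spotst1 come from year - 1, spotst2 from year - 2.
    Missing prior years yield None.
    """
    for (state, year, county), row in summarized.items():
        if year < cutoff_year:
            continue  # do not touch frozen rows
        prev1 = summarized.get((state, year - 1, county), {})
        prev2 = summarized.get((state, year - 2, county), {})
        row["cleridst1"] = prev1.get("clerids")
        row["spotst1"] = prev1.get("spots")
        row["spotst2"] = prev2.get("spots")
    return summarized


def run_pipeline(frozen_summarized, unsummarized, cutoff_year=CUTOFF_YEAR):
    """Merge frozen rows with freshly summarized rows, then add lagged inputs."""
    merged = {**frozen_summarized, **summarize(unsummarized, cutoff_year)}
    return add_lagged_inputs(merged, cutoff_year)
```

The frozen rows are still read when looking up year - 1 and year - 2, so a 2021 row can draw its lagged inputs from 2020 and 2019 even though those rows are never rewritten.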
To Do
There are some minor discrepancies between our generated data and the data the partners provided. This may be fixed in follow-up PRs.