ScottishCovidResponse / SCRCIssueTracking

Central issue tracking repository for all repos in the consortium

Write scripts to extract data from source (extractors) for upload to pipeline #458

Closed bobturneruk closed 4 years ago

bobturneruk commented 4 years ago

Some of the input data is transformed from its source before being used by simple network sim. These transformations should be recorded somewhere or implemented in our code.

github-actions[bot] commented 4 years ago

Heads up @magicicada @bobturneruk @aflag @WPettersson @alex-konovalov @may1066 @mrow84 - the "Simple Network Sim" label was applied to this issue.

magicicada commented 4 years ago

As an example, I've hunted down and listed the steps taken to generate the more realistic movement multipliers that Fedor and I have been using, as well as listing the steps for what I think would be a slightly improved version.

Current Processing for Mobility Multiplier:

Steps:

1. Fetch https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv
2. Subset country_region_code to GB.
3. Subset sub_region_1 to the following list: Aberdeen City, Aberdeenshire, Angus Council, Argyll and Bute Council, Clackmannanshire, Dumfries and Galloway, Dundee City Council, East Ayrshire Council, East Dunbartonshire Council, East Lothian Council, East Renfrewshire Council, Edinburgh, Falkirk, Fife, Glasgow City, Highland Council, Inverclyde, Midlothian, Moray, Na h-Eileanan an Iar, North Ayrshire Council, North Lanarkshire, Orkney, Perth and Kinross, Renfrewshire, Scottish Borders, Shetland Islands, Stirling, West Dunbartonshire Council, West Lothian.
4. For each date, for each column in [retail_and_recreation_percent_change_from_baseline, transit_stations_percent_change_from_baseline, workplaces_percent_change_from_baseline], take the mean of the values in these columns over all regions. (There should then be one number per date.)
5. For each date, change the number from a percentage change to a movement multiplier: move_mult = 1.0 + percentage_change/100.0
6. Given a particular start date for a simulation, renumber the dates as the number of days after that start date.
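
For discussion, the current steps could be sketched in Python roughly as below. This is only a minimal sketch, not the sed/tr/grep processing actually used: the function names and the tiny sample CSV are made up for illustration, it assumes the three columns are pooled into a single mean per date (one reading of "one number per date"), that blank cells are skipped, and that dates are in YYYY-MM-DD form.

```python
import csv
import io
from collections import defaultdict
from datetime import date

MOBILITY_COLUMNS = [
    "retail_and_recreation_percent_change_from_baseline",
    "transit_stations_percent_change_from_baseline",
    "workplaces_percent_change_from_baseline",
]


def mean_percent_change_by_date(csv_file, regions):
    """Unweighted mean of the mobility columns over the chosen GB regions, per date."""
    values = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if row["country_region_code"] != "GB" or row["sub_region_1"] not in regions:
            continue
        for column in MOBILITY_COLUMNS:
            if row[column] != "":  # assumption: blank cells are simply skipped
                values[row["date"]].append(float(row[column]))
    return {day: sum(v) / len(v) for day, v in values.items()}


def movement_multipliers(percent_by_date, start_date):
    """Key by days after start_date; move_mult = 1.0 + percentage_change/100.0."""
    return {
        (date.fromisoformat(day) - start_date).days: 1.0 + pct / 100.0
        for day, pct in percent_by_date.items()
    }


# Made-up sample rows with only the columns the sketch needs (not real data).
SAMPLE = (
    "country_region_code,sub_region_1,date," + ",".join(MOBILITY_COLUMNS) + "\n"
    "GB,Fife,2020-03-16,-10,-20,-30\n"
    "GB,Moray,2020-03-16,-30,-40,-50\n"
    "GB,Fife,2020-03-17,-40,,-60\n"
    "GB,Greater London,2020-03-16,-90,-90,-90\n"
    "US,Fife,2020-03-16,0,0,0\n"
)

percent = mean_percent_change_by_date(io.StringIO(SAMPLE), {"Fife", "Moray"})
multipliers = movement_multipliers(percent, start_date=date(2020, 3, 16))
print(multipliers)
```

Dates before the chosen start date come out with negative day numbers here; whether to drop or keep them is left open.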

Suggested Future Processing for Mobility Multiplier:

Improvements: take the mean weighted by the population of each region; this also makes it possible to include the whole UK.

Steps:

1. Fetch https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv
2. Subset country_region_code to GB.
   2a. For a Scotland-specific run: subset iso_3166_2_code to the following ISO list (note Na h-Eileanan an Iar is left out, no ISO code): GB-ABE, GB-ABD, GB-ANS, GB-AGB, GB-CLK, GB-DGY, GB-DND, GB-EAY, GB-EDU, GB-ELN, GB-ERW, GB-EDH, GB-FAL, GB-FIF, GB-GLG, GB-HLD, GB-IVC, GB-MLN, GB-MRY, GB-NAY, GB-NLK, GB-ORK, GB-PKN, GB-RFW, GB-SCB, GB-ZET, GB-STG, GB-WDU, GB-WLN.
3. For each ISO-3166-2 code present, fetch the population (I don’t yet have a source for this).
4. For each date, for each column in [retail_and_recreation_percent_change_from_baseline, transit_stations_percent_change_from_baseline, workplaces_percent_change_from_baseline], take the mean of the values in these columns over all regions, weighted by population. (There should then be one number per date.)
5. For each date, change the number from a percentage change to a movement multiplier: move_mult = 1.0 + percentage_change/100.0
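
The population-weighted version could look something like the sketch below. Again this is illustrative only: the function name, the sample rows and the population figures are made up (no population source has been chosen yet), and it assumes each non-blank value is weighted by its region's population, so a region with a blank cell just contributes less weight for that date.

```python
import csv
import io
from collections import defaultdict

MOBILITY_COLUMNS = [
    "retail_and_recreation_percent_change_from_baseline",
    "transit_stations_percent_change_from_baseline",
    "workplaces_percent_change_from_baseline",
]


def weighted_percent_change_by_date(csv_file, population_by_iso):
    """Population-weighted mean of the mobility columns per date.

    Only GB rows whose iso_3166_2_code is a key of population_by_iso are used,
    so passing the Scottish codes gives a Scotland-specific run.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for row in csv.DictReader(csv_file):
        if row["country_region_code"] != "GB":
            continue
        population = population_by_iso.get(row["iso_3166_2_code"])
        if population is None:
            continue  # region not selected, or no population known for it
        for column in MOBILITY_COLUMNS:
            if row[column] == "":
                continue  # assumption: blank cells are skipped
            weighted_sum[row["date"]] += population * float(row[column])
            weight_total[row["date"]] += population
    return {day: weighted_sum[day] / weight_total[day] for day in weighted_sum}


# Made-up sample rows and made-up populations, purely for illustration.
SAMPLE = (
    "country_region_code,iso_3166_2_code,date," + ",".join(MOBILITY_COLUMNS) + "\n"
    "GB,GB-FIF,2020-03-16,-10,-20,-30\n"
    "GB,GB-MRY,2020-03-16,-30,-40,-50\n"
    "GB,,2020-03-16,-90,-90,-90\n"
)
HYPOTHETICAL_POPULATION = {"GB-FIF": 300, "GB-MRY": 100}

weighted_percent = weighted_percent_change_by_date(
    io.StringIO(SAMPLE), HYPOTHETICAL_POPULATION
)
# Same conversion as before: move_mult = 1.0 + percentage_change/100.0
weighted_multipliers = {
    day: 1.0 + pct / 100.0 for day, pct in weighted_percent.items()
}
print(weighted_multipliers)
```

Driving the region selection from the population table means the Scotland-only and all-UK runs differ only in which codes are supplied.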

Previously, I've done this all with a series of sed/tr/grep/multiple files/etc - obviously not ideal and we need to encode this better for data pipeline use. Planning to discuss in meeting on 22nd June.