cyipt / acton

Active Transport Options for New Developments
https://cyipt.github.io/acton/
GNU General Public License v3.0

Uptake model #59

Open Robinlovelace opened 4 years ago

Robinlovelace commented 4 years ago

With something like:

# m = glm(formula = pcycle ~ distance + sqrt(distance) + gradient + circuity_cycling + road_speeds1,
#     family = quasibinomial)
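A minimal sketch of fitting a model of this form on synthetic data (the circuity and road-speed terms are dropped, and all values below are made up for illustration; the real inputs would be route-level census and routing variables):

```r
# Illustrative only: uptake model of the suggested form on synthetic data
set.seed(1)
n <- 200
d <- data.frame(
  distance = runif(n, 0.5, 15),   # route distance in km (made up)
  gradient = runif(n, 0, 0.05)    # average gradient (made up)
)
# Simulate a cycling proportion that declines with distance and gradient
d$pcycle <- plogis(1 - 0.3 * d$distance - 20 * d$gradient + rnorm(n, sd = 0.2))
# quasibinomial accepts proportions directly, without integer counts
m <- glm(pcycle ~ distance + sqrt(distance) + gradient,
         data = d, family = quasibinomial)
summary(m)$coefficients
```
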
joeytalbot commented 4 years ago

For this model, I suggest using the new development OAs we have identified to extract the proportion of people who walk/cycle, but using the LSOAs they sit within for the travel to work routing. Does that sound workable?

Robinlovelace commented 4 years ago

That sounds like a good plan, although there is OA-WPZ data available: https://www.nomisweb.co.uk/census/2011/wf02ew

joeytalbot commented 4 years ago

I think the amount of routing that would require is too much. Is there a simple way of looking up which LSOA each OA is in?

Robinlovelace commented 4 years ago

> I think the amount of routing that would require is too much. Is there a simple way of looking up which LSOA each OA is in?

Good point, and with the LSOA - LSOA level analysis we can use the PCT routes as a quick proxy. I suggest:

sf::st_join(oa, lsoa)

To add LSOA variables to OA data.
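For example, a minimal self-contained sketch with toy geometries (the codes and coordinates below are invented for illustration):

```r
library(sf)
# Two toy "LSOA" polygons
lsoa <- st_sf(
  lsoa_code = c("L1", "L2"),
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
    st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
  )
)
# Three toy "OA" centroids
oa <- st_sf(
  oa_code = c("O1", "O2", "O3"),
  geometry = st_sfc(st_point(c(0.5, 0.5)), st_point(c(1.5, 0.5)),
                    st_point(c(0.2, 0.8)))
)
# Each OA point inherits the attributes of its containing LSOA polygon
oa_joined <- st_join(oa, lsoa)
oa_joined$lsoa_code
```

st_join() uses st_intersects by default, so this is a point-in-polygon attribute join.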

mem48 commented 4 years ago

There are OA to LSOA lookup tables on the Open Geography Portal

Robinlovelace commented 4 years ago

Another dataset we can use: https://data.london.gov.uk/dataset/property-build-period-lsoa

joeytalbot commented 4 years ago

This dataset looks amazing! @Robinlovelace there really are data on age of housing stock for LSOAs across the country, not just in London. For London itself there's also data comparing housing stock age to council tax band, and data on dwelling type and number of bedrooms.

Robinlovelace commented 4 years ago

Great data exploration @joeytalbot. I plan to start a script that will read them in; who knows, it could eventually become an ACTON function!

joeytalbot commented 4 years ago

It certainly seems like it would make an appropriate function!

Robinlovelace commented 4 years ago

Heads-up @joeytalbot I've added a script that gets the data and deals with the pesky commas. Please take a look and work from that. Downing tools on this project for the day :hammer: .

joeytalbot commented 4 years ago

Ok, so with that data there are 84 LSOAs across England and Wales in which >90% of homes were built in the years 2000-09. These could go straight into a model (excluding 20% of them for validation purposes), or I could look at a wider selection of LSOAs (e.g. there are 326 with >60% of homes built 2000-09) and then pick out the OAs within them that relate to the new developments - although if the remaining homes were mostly built during the 1990s this is probably still fine.
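A sketch of that selection plus a 20% validation holdout, using an invented data frame (the column names `lsoa` and `prop_2000s` are placeholders for the real housing-age data):

```r
set.seed(42)
# Toy stand-in for the LSOA housing-age data
housing <- data.frame(
  lsoa = sprintf("E010%05d", 1:500),
  prop_2000s = runif(500)  # proportion of homes built 2000-09 (made up)
)
# Select LSOAs dominated by 2000s housing
new_dev_lsoas <- housing[housing$prop_2000s > 0.9, ]
# Hold back ~20% of the selected LSOAs for validation
n_val <- round(0.2 * nrow(new_dev_lsoas))
val_idx <- sample(nrow(new_dev_lsoas), n_val)
validation <- new_dev_lsoas[val_idx, ]
training <- new_dev_lsoas[-val_idx, ]
c(training = nrow(training), validation = nrow(validation))
```
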

The next step is to get OD data - what's the best source for nationwide OD data, without having to download each region separately?

Robinlovelace commented 4 years ago

MSOA data:

od_data = pct::get_od()
#> No region provided. Returning national OD data.
#> Parsed with column specification:
#> cols(
#>   `Area of residence` = col_character(),
#>   `Area of workplace` = col_character(),
#>   `All categories: Method of travel to work` = col_double(),
#>   `Work mainly at or from home` = col_double(),
#>   `Underground, metro, light rail, tram` = col_double(),
#>   Train = col_double(),
#>   `Bus, minibus or coach` = col_double(),
#>   Taxi = col_double(),
#>   `Motorcycle, scooter or moped` = col_double(),
#>   `Driving a car or van` = col_double(),
#>   `Passenger in a car or van` = col_double(),
#>   Bicycle = col_double(),
#>   `On foot` = col_double(),
#>   `Other method of travel to work` = col_double()
#> )
#> Parsed with column specification:
#> cols(
#>   MSOA11CD = col_character(),
#>   MSOA11NM = col_character(),
#>   BNGEAST = col_double(),
#>   BNGNORTH = col_double(),
#>   LONGITUDE = col_double(),
#>   LATITUDE = col_double()
#> )
od_data
#> # A tibble: 2,402,201 x 18
#>    geo_code1 geo_code2   all from_home light_rail train   bus  taxi motorbike
#>    <chr>     <chr>     <dbl>     <dbl>      <dbl> <dbl> <dbl> <dbl>     <dbl>
#>  1 E02000001 E02000001  1506         0         73    41    32     9         1
#>  2 E02000001 E02000014     2         0          2     0     0     0         0
#>  3 E02000001 E02000016     3         0          1     0     2     0         0
#>  4 E02000001 E02000025     1         0          0     1     0     0         0
#>  5 E02000001 E02000028     1         0          0     0     0     0         0
#>  6 E02000001 E02000051     1         0          1     0     0     0         0
#>  7 E02000001 E02000053     2         0          2     0     0     0         0
#>  8 E02000001 E02000057     1         0          1     0     0     0         0
#>  9 E02000001 E02000058     1         0          0     0     0     0         0
#> 10 E02000001 E02000059     1         0          0     0     0     1         0
#> # … with 2,402,191 more rows, and 9 more variables: car_driver <dbl>,
#> #   car_passenger <dbl>, bicycle <dbl>, foot <dbl>, other <dbl>,
#> #   geo_name1 <chr>, geo_name2 <chr>, la_1 <chr>, la_2 <chr>

Created on 2020-04-03 by the reprex package (v0.3.0)
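With columns named as above, per-OD-pair cycling mode share could then be computed like this (the two rows below are toy stand-ins for `od_data`, with made-up bicycle counts):

```r
# Toy OD rows mimicking the structure of pct::get_od() output
od <- data.frame(
  geo_code1 = c("E02000001", "E02000001"),
  geo_code2 = c("E02000001", "E02000014"),
  all       = c(1506, 2),
  bicycle   = c(33, 0)  # invented counts for illustration
)
# Proportion cycling for each OD pair
od$pcycle <- od$bicycle / od$all
od
```
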

Robinlovelace commented 4 years ago

LSOA data:

library(sf)
u = "https://github.com/npct/pct-outputs-national/raw/master/commute/lsoa/l_all.Rds"
desire_lines_lsoa_national_sp = readRDS(url(u))
l = sf::st_as_sf(desire_lines_lsoa_national_sp)
plot(l[sample(nrow(l), size = 10000), ])

Robinlovelace commented 4 years ago

Output:

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
u = "https://github.com/npct/pct-outputs-national/raw/master/commute/lsoa/l_all.Rds"
desire_lines_lsoa_national_sp = readRDS(url(u))
l = sf::st_as_sf(desire_lines_lsoa_national_sp)
#> Loading required package: sp
#> Warning in CPL_crs_from_input(x): GDAL Message 1: +init=epsg:XXXX syntax is
#> deprecated. It might return a CRS with a non-EPSG compliant axis order.
plot(l[sample(nrow(l), size = 10000), ])
#> Warning: plotting the first 10 out of 141 attributes; use max.plot = 141 to plot
#> all

Created on 2020-04-03 by the reprex package (v0.3.0)

Robinlovelace commented 4 years ago

FYI that process uses around 4GB ram:

[image: screenshot of memory usage]

joeytalbot commented 4 years ago

There seem to be 120 geocode2 values from od_lsoas_short that didn't make it into od_lsoas_short_routes. I'm trying to work out why - these are normal geocodes within the UK. Given that all geocode1 values from od_lsoas_short appear in od_lsoas_short_routes, it doesn't seem like the problem is centroids that aren't on the road network.

> missing = od_lsoas_short[! od_lsoas_short$geocode2 %in% od_lsoas_short_routes$geocode2,]
There were 34 warnings (use warnings() to see them)
> dim(missing) # there are 1066 rows in od_lsoas_short where the geocode2 didn't make it into od_lsoas_short_routes. These are normal geocodes, within the UK
[1] 1066   16
> unique(od_lsoas_short$geocode2[! od_lsoas_short$geocode2 %in% od_lsoas_short_routes$geocode2])
  [1] "E01026997" "E01026998" "E01026999" "E01027000" "E01027002" "E01027003" "E01027004" "E01027005" "E01027006" "E01027007" "E01027009"
 [12] "E01027010" "E01027011" "E01027013" "E01027014" "E01027016" "E01027017" "E01027018" "E01027019" "E01027021" "E01027022" "E01027024"
 [23] "E01027025" "E01027026" "E01027027" "E01027028" "E01027029" "E01027030" "E01027031" "E01027032" "E01027033" "E01027034" "E01027038"
 [34] "E01027039" "E01027041" "E01027043" "E01027044" "E01027046" "E01027048" "E01027049" "E01027050" "E01027051" "E01027052" "E01027053"
 [45] "E01027054" "E01027055" "E01027056" "E01027057" "E01027059" "E01027060" "E01027062" "E01027063" "E01027064" "E01027065" "E01027067"
 [56] "E01027069" "E01027073" "E01027074" "E01027075" "E01027076" "E01027077" "E01027078" "E01027079" "E01027082" "E01027084" "E01027085"
 [67] "E01027087" "E01027089" "E01027090" "E01027091" "E01027092" "E01027094" "E01027097" "E01027098" "E01027099" "E01027100" "E01027101"
 [78] "E01031363" "E01031364" "E01031365" "E01031366" "E01031372" "E01031375" "E01031377" "E01031379" "E01031381" "E01031385" "E01031386"
 [89] "E01031388" "E01031389" "E01031390" "E01031391" "E01031392" "E01031393" "E01031394" "E01031395" "E01031396" "E01031397" "E01031398"
[100] "E01031400" "E01031402" "E01031403" "E01031409" "E01031411" "E01031413" "E01031415" "E01031417" "E01031418" "E01031419" "E01031421"
[111] "E01031424" "E01031425" "E01031426" "E01031427" "E01031429" "E01031430" "E01031431" "E01031432" "E01031434" "E01031435"
> unique(od_lsoas_short$geocode1[! od_lsoas_short$geocode1 %in% od_lsoas_short_routes$geocode1])
character(0)
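The base-R `%in%` check above could also be expressed as a dplyr anti_join, which keeps the rows of the first table that have no match in the second (toy data frames stand in for the real objects here):

```r
library(dplyr)
# Toy stand-ins for od_lsoas_short and od_lsoas_short_routes
od_lsoas_short <- data.frame(geocode1 = c("A", "A", "B"),
                             geocode2 = c("X", "Y", "Z"))
od_lsoas_short_routes <- data.frame(geocode1 = c("A", "B"),
                                    geocode2 = c("X", "Z"))
# Rows of od_lsoas_short whose geocode2 never appears in the routes
missing <- anti_join(od_lsoas_short, od_lsoas_short_routes, by = "geocode2")
missing$geocode2
```
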

Robinlovelace commented 4 years ago

Interesting finding. Have you tried plotting the geocode1 locations associated with the missing codes?

joeytalbot commented 4 years ago

[image: map of the missing routes]

Robinlovelace commented 4 years ago

Definitely not a random distribution!

joeytalbot commented 4 years ago

The missing routes are just in two areas (Northamptonshire and West Sussex). They also seem to have (two sets of) consecutive row numbers.

> summary(missing$rownum) # are they particular rows ie a section of od_lsoas_short?
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 218266  218532  218798  232867  257128  257394 
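One way to confirm the missing rows really form consecutive blocks is to count the breaks in the sorted row numbers; a sketch with invented row numbers standing in for `missing$rownum`:

```r
# Two invented runs of consecutive row numbers
rownum <- c(218266:218300, 257100:257120)
# A "break" is any gap larger than 1 between sorted neighbours
breaks <- which(diff(sort(rownum)) > 1)
length(breaks) + 1  # number of consecutive blocks
```
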

Robinlovelace commented 4 years ago

Maybe worth looking at the associated chunks here: https://github.com/cyipt/acton/releases/download/0.0.1/od_lsoa_routes_20020-04-07-chunks-of-1000-rows.zip

Robinlovelace commented 4 years ago

Heads-up @joeytalbot I've been thinking about gradients and I'm not sure our calculations were correct, so it's worth doing some sense checking on them. To help understand the issue, and to encode the hopefully-correct way of calculating gradients (with different levels of smoothing to account for the impact of anomalous short segments), I've written some functions into stplanr to support this and to ensure reproducibility and ease of use in the future.

Please take a look at these results for a single sample route. It may be worth checking the results of route_rolling_gradient() with different values of n. I suggest 5 is good and will smooth out the vast majority of anomalies, but we could try other values too; the default is 2, meaning it averages each segment with the next one, which doesn't seem to smooth things out much:

remotes::install_github("ropensci/stplanr")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'stplanr' from a github remote, the SHA1 (46e69e22) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(stplanr)
r1 <- od_data_routes[od_data_routes$route_number == 2, ]
y <- r1$elevations
distances <- r1$distances
route_rolling_gradient(y, distances)
#> [1] 0.009112333 0.029220779 0.106132075 0.036435786 0.047012302 0.048840049
#> [7] 0.017539244          NA
route_rolling_gradient(y, distances, abs = FALSE)
#> [1] -0.009112333 -0.029220779 -0.106132075  0.036435786  0.047012302
#> [6] -0.048840049 -0.017539244           NA
route_rolling_gradient(y, distances, n = 3)
#> [1]         NA 0.02191558 0.14966741 0.03888604 0.05635534 0.01285072 0.02380952
#> [8]         NA
route_rolling_gradient(y, distances, n = 4)
#> [1]         NA 0.02371542 0.08991009 0.05085599 0.06604938 0.01523810         NA
#> [8]         NA
r1$elevations_diff_1 <- route_rolling_diff(y, lag = 1)
r1$rolling_gradient <- route_rolling_gradient(y, distances, n = 2)
r1$rolling_gradient3 <- route_rolling_gradient(y, distances, n = 3)
r1$rolling_gradient4 <- route_rolling_gradient(y, distances, n = 4)
d <- cumsum(r1$distances) - r1$distances / 2
diff_above_mean <- r1$elevations_diff_1 + mean(y)
par(mfrow = c(2, 1))
plot(c(0, cumsum(r1$distances)), c(y, y[length(y)]), ylim = c(80, 130))
lines(c(0, cumsum(r1$distances)), c(y, y[length(y)]))
points(d, diff_above_mean )
abline(h = mean(y))
rg = r1$rolling_gradient
rg[is.na(rg)] <- 0
plot(c(0, d), c(0, rg), ylim = c(0, 0.2))
points(c(0, d), c(0, r1$rolling_gradient3), col = "blue")
points(c(0, d), c(0, r1$rolling_gradient4), col = "grey")

par(mfrow = c(1, 1))

y = od_data_routes$elevations[od_data_routes$route_number == 2]
y
#> [1] 112.5263 111.0000 106.5000  84.0000  96.6250 110.0000 107.1429 105.5556
route_rolling_average(y)
#> [1]        NA 110.00877 100.50000  95.70833  96.87500 104.58929 107.56614
#> [8]        NA
route_rolling_average(y, n = 1)
#> [1] 112.5263 111.0000 106.5000  84.0000  96.6250 110.0000 107.1429 105.5556
route_rolling_average(y, n = 2)
#> [1] 111.7632 108.7500  95.2500  90.3125 103.3125 108.5714 106.3492       NA
route_rolling_average(y, n = 3)
#> [1]        NA 110.00877 100.50000  95.70833  96.87500 104.58929 107.56614
#> [8]        NA

r1 <- od_data_routes[od_data_routes$route_number == 2, ]
elevations <- r1$elevations
distances <- r1$distances
route_average_gradient(elevations, distances) # an average of a 4% gradient
#> [1] 0.03907936

Created on 2020-04-16 by the reprex package (v0.3.0)

Heads-up @temospena I think this will be useful for the Lisbon work!
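As a rough mental model (not stplanr's exact implementation), a rolling gradient over a window of n segments divides the summed absolute elevation change by the summed distance; the toy function and numbers below are for illustration only:

```r
# Sketch of a rolling gradient: for each segment, pool it with up to n-1
# following segments, then divide total elevation change by total distance
rolling_gradient <- function(elevations, distances, n = 2) {
  diffs <- abs(diff(elevations))  # per-segment elevation change
  k <- length(diffs)
  sapply(seq_len(k), function(i) {
    j <- i:min(i + n - 1, k)      # window of segment indices
    sum(diffs[j]) / sum(distances[j])
  })
}
# 4 elevation points define 3 segments of 50, 120 and 40 m
res <- rolling_gradient(c(100, 102, 110, 109), c(50, 120, 40), n = 2)
res
```

Larger n pools more segments into each window, so a single anomalous short segment contributes less to any one gradient value.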

Robinlovelace commented 4 years ago

Heads-up @joeytalbot this is the reprex I was trying to do before:

library(stplanr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
l = pct::get_pct_lines("west-yorkshire")

l_top = l %>% 
  top_n(n = 10, bicycle)
r = route(l = l_top, route_fun = cyclestreets::journey)
#> Most common output is sf
#> Loading required namespace: data.table
names(r)
#>   [1] "id"                      "geo_code1"              
#>   [3] "geo_code2"               "geo_name1"              
#>   [5] "geo_name2"               "lad11cd1"               
#>   [7] "lad11cd2"                "lad_name1"              
#>   [9] "lad_name2"               "all"                    
#>  [11] "bicycle"                 "foot"                   
#>  [13] "car_driver"              "car_passenger"          
#>  [15] "motorbike"               "train_tube"             
#>  [17] "bus"                     "taxi_other"             
#>  [19] "govtarget_slc"           "govtarget_sic"          
#>  [21] "govtarget_slw"           "govtarget_siw"          
#>  [23] "govtarget_sld"           "govtarget_sid"          
#>  [25] "govtarget_slp"           "govtarget_sip"          
#>  [27] "govtarget_slm"           "govtarget_sim"          
#>  [29] "govtarget_slpt"          "govtarget_sipt"         
#>  [31] "govnearmkt_slc"          "govnearmkt_sic"         
#>  [33] "govnearmkt_slw"          "govnearmkt_siw"         
#>  [35] "govnearmkt_sld"          "govnearmkt_sid"         
#>  [37] "govnearmkt_slp"          "govnearmkt_sip"         
#>  [39] "govnearmkt_slm"          "govnearmkt_sim"         
#>  [41] "govnearmkt_slpt"         "govnearmkt_sipt"        
#>  [43] "gendereq_slc"            "gendereq_sic"           
#>  [45] "gendereq_slw"            "gendereq_siw"           
#>  [47] "gendereq_sld"            "gendereq_sid"           
#>  [49] "gendereq_slp"            "gendereq_sip"           
#>  [51] "gendereq_slm"            "gendereq_sim"           
#>  [53] "gendereq_slpt"           "gendereq_sipt"          
#>  [55] "dutch_slc"               "dutch_sic"              
#>  [57] "dutch_slw"               "dutch_siw"              
#>  [59] "dutch_sld"               "dutch_sid"              
#>  [61] "dutch_slp"               "dutch_sip"              
#>  [63] "dutch_slm"               "dutch_sim"              
#>  [65] "dutch_slpt"              "dutch_sipt"             
#>  [67] "ebike_slc"               "ebike_sic"              
#>  [69] "ebike_slw"               "ebike_siw"              
#>  [71] "ebike_sld"               "ebike_sid"              
#>  [73] "ebike_slp"               "ebike_sip"              
#>  [75] "ebike_slm"               "ebike_sim"              
#>  [77] "ebike_slpt"              "ebike_sipt"             
#>  [79] "base_slcyclehours"       "govtarget_sicyclehours" 
#>  [81] "govnearmkt_sicyclehours" "gendereq_sicyclehours"  
#>  [83] "dutch_sicyclehours"      "ebike_sicyclehours"     
#>  [85] "base_sldeath"            "base_slyll"             
#>  [87] "base_slvalueyll"         "base_slsickdays"        
#>  [89] "base_slvaluesick"        "base_slvaluecomb"       
#>  [91] "govtarget_sideath"       "govtarget_siyll"        
#>  [93] "govtarget_sivalueyll"    "govtarget_sisickdays"   
#>  [95] "govtarget_sivaluesick"   "govtarget_sivaluecomb"  
#>  [97] "govnearmkt_sideath"      "govnearmkt_siyll"       
#>  [99] "govnearmkt_sivalueyll"   "govnearmkt_sisickdays"  
#> [101] "govnearmkt_sivaluesick"  "govnearmkt_sivaluecomb" 
#> [103] "gendereq_sideath"        "gendereq_siyll"         
#> [105] "gendereq_sivalueyll"     "gendereq_sisickdays"    
#> [107] "gendereq_sivaluesick"    "gendereq_sivaluecomb"   
#> [109] "dutch_sideath"           "dutch_siyll"            
#> [111] "dutch_sivalueyll"        "dutch_sisickdays"       
#> [113] "dutch_sivaluesick"       "dutch_sivaluecomb"      
#> [115] "ebike_sideath"           "ebike_siyll"            
#> [117] "ebike_sivalueyll"        "ebike_sisickdays"       
#> [119] "ebike_sivaluesick"       "ebike_sivaluecomb"      
#> [121] "base_slcarkm"            "base_slco2"             
#> [123] "govtarget_sicarkm"       "govtarget_sico2"        
#> [125] "govnearmkt_sicarkm"      "govnearmkt_sico2"       
#> [127] "gendereq_sicarkm"        "gendereq_sico2"         
#> [129] "dutch_sicarkm"           "dutch_sico2"            
#> [131] "ebike_sicarkm"           "ebike_sico2"            
#> [133] "e_dist_km"               "rf_dist_km"             
#> [135] "rq_dist_km"              "dist_rf_e"              
#> [137] "dist_rq_rf"              "rf_avslope_perc"        
#> [139] "rq_avslope_perc"         "rf_time_min"            
#> [141] "rq_time_min"             "route_number"           
#> [143] "name"                    "distances"              
#> [145] "time"                    "busynance"              
#> [147] "elevations"              "start_longitude"        
#> [149] "start_latitude"          "finish_longitude"       
#> [151] "finish_latitude"         "gradient_segment"       
#> [153] "provisionName"           "quietness"              
#> [155] "geometry"

Created on 2020-04-21 by the reprex package (v0.3.0)