pct_area_desire_lines function in or out?

layik commented 5 years ago

#' Desire lines
#'
#' @export
pct_area_desire_lines = function(area = "sheffield", top_n = 100) {
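  # 'area' has a default, so it always exists here; validate its value instead
  if(length(area) != 1L)
    stop("'area' must be of length 1")
  # check the type first so is.na() and == only ever see a character value
  if(!is.character(area) || is.na(area) || (area == ""))
    stop("invalid area name")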
  census_file = file.path(tempdir(), "wu03ew_v2.csv")
  # file.exists() tests for a file on disk; exists() looks for an R object
  if(!file.exists(census_file)) {
    download.file("https://s3-eu-west-1.amazonaws.com/statistics.digitalresources.jisc.ac.uk/dkan/files/FLOW/wu03ew_v2/wu03ew_v2.zip",
                  file.path(tempdir(), "wu03ew_v2.zip"))
    unzip(file.path(tempdir(), "wu03ew_v2.zip"), exdir = tempdir())
  }
  od_all = readr::read_csv(census_file)
  zones = ukboundaries::msoa2011_vsimple[
    grepl(area, ukboundaries::msoa2011_vsimple$msoa11nm,
          ignore.case = TRUE), ]

  od_area = od_all[od_all$`Area of residence` %in% zones$msoa11cd &
                     od_all$`Area of workplace` %in% zones$msoa11cd, ]
  od_area = od_area[od_area$`Area of residence` !=
                      od_area$`Area of workplace`, ]
  # head() returns at most top_n rows and never pads with NA rows
  od_area = utils::head(
    od_area[order(od_area$`All categories: Method of travel to work`,
                  decreasing = TRUE), ], top_n)
  area_desire_lines = stplanr::od2line(
    flow = od_area[,c(2,1)], zones[,2])

  area_desire_lines
}

Mostly base R, but it requires ukboundaries and stplanr (plus readr for read_csv).
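The filtering steps in the function can be tried without downloading the census file. Below is a minimal sketch on a toy table; the column names (`residence`, `workplace`, `all`), zone codes and counts are all invented for illustration and are not the real census columns.

```r
# Toy origin-destination table; codes and counts are made up
od = data.frame(
  residence = c("A", "A", "B", "B", "C", "C"),
  workplace = c("A", "B", "A", "C", "A", "D"),
  all       = c(50, 30, 20, 40, 10, 99),
  stringsAsFactors = FALSE
)
zone_codes = c("A", "B", "C")  # stand-in for zones$msoa11cd

# 1. keep flows that both start and end inside the area
od_area = od[od$residence %in% zone_codes & od$workplace %in% zone_codes, ]
# 2. drop intra-zonal flows (residence equals workplace)
od_area = od_area[od_area$residence != od_area$workplace, ]
# 3. sort by total flow, largest first, and keep at most top_n rows
top_n = 3
od_sorted = od_area[order(od_area$all, decreasing = TRUE), ]
od_top = head(od_sorted, top_n)
od_top
```

One point the toy data makes visible: `head()` simply returns fewer rows when fewer than `top_n` exist, whereas indexing with `[1:top_n, ]` would pad the result with NA rows.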

Thoughts, @Robinlovelace? I can send a PR too.

Robinlovelace commented 5 years ago

I think we should write a function to 🚿 the names.

E.g.:

names_new[3] = "all"
names(d) = names_new
names(d)
#  [1] "area_of_residence"                 "area_of_workplace"
#  [3] "all"                               "work_mainly_at_or_from_home"
#  [5] "underground_metro_light_rail_tram" "train"
#  [7] "bus_minibus_or_coach"              "taxi"
#  [9] "motorcycle_scooter_or_moped"       "driving_a_car_or_van"
# [11] "passenger_in_a_car_or_van"         "bicycle"
# [13] "on_foot"                           "other_method_of_travel_to_work"
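A name-cleaning function along these lines could look roughly like the sketch below. `clean_names_sketch()` is a hypothetical helper written for illustration, not a function from pct, and the three raw names are examples in the style of the census column headers.

```r
# Hypothetical helper (not part of the pct package): snake_case column names
clean_names_sketch = function(x) {
  x = tolower(x)
  x = gsub("[^a-z0-9]+", "_", x)  # each run of non-alphanumerics becomes one "_"
  gsub("^_+|_+$", "", x)          # trim leading and trailing underscores
}

raw = c("Area of residence",
        "Underground, metro, light rail, tram",
        "Bus, minibus or coach")
cleaned = clean_names_sketch(raw)
cleaned
```

A long header such as "All categories: Method of travel to work" would still come out long under this approach, which is presumably why it is overwritten by hand with "all" above.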

layik commented 5 years ago

I know, I was going... please use Robin's code, it's already there, but... what can I say :)

Robinlovelace commented 5 years ago

I like the code. I just dislike the column names that were provided by DfT. As with stats19, I suggest we impose our own 'good' column names on them at the outset. The earlier we clean the names (e.g. with mode_names_clean()), the better.

Robinlovelace commented 5 years ago

Names in the pct: https://github.com/npct/pct-shiny/blob/master/regions_www/www/static/02_codebooks/commute/od_l_rf_codebook.csv

Robinlovelace commented 5 years ago

We can just use that...

Robinlovelace commented 5 years ago

And, to be fair, just hard-coding them would be fine.
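Hard-coding could be as simple as the sketch below, which reuses the 14 names printed earlier in this thread. The data frame `d` here is an empty stand-in with the right number of columns, not the real census table, and the length check is an added safeguard rather than anything agreed above.

```r
# The 14 cleaned names listed earlier in this thread, hard-coded
mode_names = c(
  "area_of_residence", "area_of_workplace", "all",
  "work_mainly_at_or_from_home", "underground_metro_light_rail_tram",
  "train", "bus_minibus_or_coach", "taxi",
  "motorcycle_scooter_or_moped", "driving_a_car_or_van",
  "passenger_in_a_car_or_van", "bicycle", "on_foot",
  "other_method_of_travel_to_work"
)

# Stand-in for the downloaded table: 2 rows, one column per name
d = as.data.frame(matrix(0, nrow = 2, ncol = length(mode_names)))

# Fail early if the file layout ever changes
stopifnot(ncol(d) == length(mode_names))
names(d) = mode_names
names(d)
```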

layik commented 5 years ago

Right. Done deal, can we do it without tidyverse? Let me send the PR as I am not sure why it cannot find the new awesome get_centroid function.

layik commented 5 years ago

"faithful to the data" :)

layik commented 5 years ago

btw

get_centroids_ew = function() {
  u = "https://opendata.arcgis.com/datasets/b0a6d8a3dc5d4718b3fd62c548d60f81_0.csv"
  pwc = readr::read_csv(u)
  sf::st_as_sf(x = pwc[c("X", "Y", "msoa11cd", "msoa11nm")], coords = c("X", "Y"), crs = 4326)
}
zones_all = get_centroids_ew()
#> Parsed with column specification:
#> cols(
#>   X = col_double(),
#>   Y = col_double(),
#>   objectid = col_double(),
#>   msoa11cd = col_character(),
#>   msoa11nm = col_character()
#> )
pryr::object_size(zones_all)
#> 2 MB

Created on 2019-03-08 by the reprex package (v0.2.1)

Robinlovelace commented 5 years ago

I think we just save a subset for Leeds. Keep the pkg data minimal - hence using 10 not 100 desire lines for Leeds.

Robinlovelace commented 5 years ago

Plus we can always create a supporting pctdata pkg.

Robinlovelace commented 5 years ago

Yes. readr may be the only tidyverse pkg we use.
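If dropping readr were ever wanted, base R's `read.csv()` is one option. The sketch below reads a small inline CSV rather than the real file; the two data rows and their counts are placeholders. The detail worth noting is `check.names = FALSE`, without which base R rewrites headers such as "Area of residence" into syntactic names like `Area.of.residence`.

```r
# Inline stand-in for the census CSV; the rows are placeholders
csv_text = 'Area of residence,Area of workplace,"All categories: Method of travel to work"
E02000001,E02000002,12
E02000002,E02000001,7'

# check.names = FALSE keeps the original headers, spaces included
od = utils::read.csv(text = csv_text, check.names = FALSE,
                     stringsAsFactors = FALSE)
names(od)

# For comparison: the default mangles the headers
od_default = utils::read.csv(text = csv_text, stringsAsFactors = FALSE)
names(od_default)[1]
```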

Robinlovelace commented 5 years ago

Job. Done.

ITSLeeds / pct

pct_area_desire_lines function in or out? #7