ITSLeeds / pct

Get and reproduce data from the Propensity to Cycle Tool (PCT)
https://itsleeds.github.io/pct/

pct_area_desire_lines function in or out? #7

Closed layik closed 5 years ago

layik commented 5 years ago
#' Desire lines
#'
#' @export
pct_area_desire_lines = function(area = "sheffield", top_n = 100) {
  # `area` has a default, so test for NULL rather than exists("area"),
  # which is always TRUE inside the function
  if(is.null(area))
    stop("area is required.")
  if(length(area) != 1L)
    stop("'area' must be of length 1")
  # check the type first so the string comparisons are safe
  if(!is.character(area) || is.na(area) || (area == ""))
    stop("invalid area name")
  census_file = file.path(tempdir(), "wu03ew_v2.csv")
  # file.exists() tests for a file on disk; exists() only looks for an R object
  if(!file.exists(census_file)) {
    download.file("https://s3-eu-west-1.amazonaws.com/statistics.digitalresources.jisc.ac.uk/dkan/files/FLOW/wu03ew_v2/wu03ew_v2.zip",
                  file.path(tempdir(), "wu03ew_v2.zip"))
    unzip(file.path(tempdir(), "wu03ew_v2.zip"), exdir = tempdir())
  }
  od_all = readr::read_csv(census_file)
  # MSOA zones whose name matches `area`, ignoring case
  zones = ukboundaries::msoa2011_vsimple[
    grepl(area, ukboundaries::msoa2011_vsimple$msoa11nm,
          ignore.case = TRUE), ]

  od_area = od_all[od_all$`Area of residence` %in% zones$msoa11cd &
                     od_all$`Area of workplace` %in% zones$msoa11cd, ]
  od_area = od_area[od_area$`Area of residence` !=
                      od_area$`Area of workplace`, ]
  od_area = od_area[order(od_area$`All categories: Method of travel to work`,
                decreasing = TRUE),][1:top_n,]
  area_desire_lines = stplanr::od2line(
    flow = od_area[,c(2,1)], zones[,2])

  area_desire_lines
}

Mostly base R, but it requires ukboundaries and stplanr (plus readr for read_csv).

Thoughts, @Robinlovelace? I can send a PR too.
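
A note on the download cache check: in base R, exists() looks up an R object by name, whereas file.exists() tests for a file on disk. A minimal sketch of the difference, using a hypothetical file name in tempdir() that is not part of the package:

```r
# Hypothetical cache file, used only for this illustration
cache_file <- file.path(tempdir(), "cache_demo.csv")
writeLines("a,b\n1,2", cache_file)

# exists() searches for an R object whose name is the path string
object_found <- exists(cache_file)

# file.exists() checks the file system
file_found <- file.exists(cache_file)

print(c(object_found = object_found, file_found = file_found))

# Clean up the demo file
unlink(cache_file)
```

With exists(), the condition never sees the cached file, so the download would be repeated on every call.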

Robinlovelace commented 5 years ago

In for sure, this is fantastic work @layik and great work on identifying that endpoint!

layik commented 5 years ago

So just to be clear, this is like:

  1. Where do you want to look? > "Sheffield"
  2. Get the top x most-travelled OD pairs
  3. Generate desire lines for (2)

Now go away and do your uptake for these?
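
The three steps can be sketched in base R on a toy table. The zone codes, names, column names and counts below are all made up for illustration; the real function works on the census OD file and MSOA zones.

```r
# Toy zone lookup and OD table (made-up codes, names and counts)
zones <- data.frame(
  code = c("Z1", "Z2", "Z3", "Z4"),
  name = c("Sheffield 001", "Sheffield 002", "Sheffield 003", "Leeds 001"),
  stringsAsFactors = FALSE
)
od <- data.frame(
  origin = c("Z1", "Z1", "Z2", "Z3", "Z4", "Z2"),
  destination = c("Z2", "Z3", "Z1", "Z3", "Z1", "Z3"),
  all = c(50, 10, 30, 99, 70, 20),
  stringsAsFactors = FALSE
)

# 1. Where do you want to look? Match zone names, ignoring case
area_zones <- zones[grepl("sheffield", zones$name, ignore.case = TRUE), ]

# 2. Keep flows within the area, drop within-zone flows, take the top n
in_area <- od$origin %in% area_zones$code & od$destination %in% area_zones$code
od_area <- od[in_area & od$origin != od$destination, ]
od_area <- od_area[order(od_area$all, decreasing = TRUE), ]
top_od <- head(od_area, 2)

# 3. top_od holds the pairs that would be converted to desire lines
print(top_od)
```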

layik commented 5 years ago

Would ukboundaries cause an issue? Can we get it into pct??

Robinlovelace commented 5 years ago

BTW the lines that this function pulls in are more detailed than the lines in the PCT, because the PCT simplifies things by using this function to convert the 2 lines between the same pair of zones into a single line:

https://www.rdocumentation.org/packages/stplanr/versions/0.0.2/topics/onewayid
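
To illustrate the idea only (this is a base R sketch on made-up data, not stplanr's implementation), flows in both directions between the same two zones can be collapsed by building a direction-independent key and summing:

```r
# Toy OD table with flows in both directions between Z1 and Z2 (made-up values)
od <- data.frame(
  origin = c("Z1", "Z2", "Z1"),
  destination = c("Z2", "Z1", "Z3"),
  all = c(50, 30, 10),
  stringsAsFactors = FALSE
)

# Direction-independent key: the two zone codes in alphabetical order
od$zone_a <- pmin(od$origin, od$destination)
od$zone_b <- pmax(od$origin, od$destination)

# Sum the flows that share the same unordered pair of zones
oneway <- aggregate(all ~ zone_a + zone_b, data = od, FUN = sum)
print(oneway)
```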

layik commented 5 years ago
> object.size(ukboundaries::cas2003_vsimple)
9857296 bytes
> 9857296/1024/1024
[1] 9.40065
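
Base R can also do the unit conversion itself. A small sketch on a toy vector that stands in for the boundary data:

```r
# Toy object standing in for the boundary data
x <- numeric(1e6)

size_bytes <- object.size(x)

# format() on an object_size value converts units; "MB" here means 1024^2 bytes
size_mb <- format(size_bytes, units = "MB")
print(size_mb)

# The same figure by hand, dividing by 1024 twice as in the console output above
manual_mb <- as.numeric(size_bytes) / 1024 / 1024
print(manual_mb)
```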
Robinlovelace commented 5 years ago

Yes just take the vital and small data from ukboundaries and put it in there is my thinking...

layik commented 5 years ago

Right, will do!!

Robinlovelace commented 5 years ago

But wait...

layik commented 5 years ago

Waiting... can be done tomorrow!

Robinlovelace commented 5 years ago

ukboundaries::msoa2011_vsimple that's the one we want, right?

Robinlovelace commented 5 years ago

That will be useful for sure.

layik commented 5 years ago
zones = ukboundaries::msoa2011_vsimple[
    grepl(area, ukboundaries::msoa2011_vsimple$msoa11nm,
          ignore.case = T), ]
layik commented 5 years ago

OK, it would be renamed pct_area_desire_lines_msoa or similar.

Robinlovelace commented 5 years ago

But you said:

> object.size(ukboundaries::cas2003_vsimple)
layik commented 5 years ago

oh, just my bad :) try again

> object.size(ukboundaries::msoa2011_vsimple)
10616416 bytes
> 10616416/1024/1024
[1] 10.1246
Robinlovelace commented 5 years ago

In other words, it's too big:

pryr::object_size(ukboundaries::msoa2011_vsimple)
#> Using default data cache directory ~/.ukboundaries/cache 
#> Use cache_dir() to change it.
#> 8.04 MB

Created on 2019-03-07 by the reprex package (v0.2.1)

layik commented 5 years ago

or whichever does this job, you know these objects much much better than me.

layik commented 5 years ago

yep! too big.

Robinlovelace commented 5 years ago

So why not just take the dataframe and remove the geometry...

Robinlovelace commented 5 years ago
pryr::object_size(sf::st_drop_geometry(ukboundaries::msoa2011_vsimple))
#> Using default data cache directory ~/.ukboundaries/cache 
#> Use cache_dir() to change it.
#> 1.41 MB

Created on 2019-03-07 by the reprex package (v0.2.1)

Robinlovelace commented 5 years ago

And you could just use...

Robinlovelace commented 5 years ago
pryr::object_size(ukboundaries::msoa2011_vsimple$msoa11cd)
#> Using default data cache directory ~/.ukboundaries/cache 
#> Use cache_dir() to change it.
#> 548 kB

Created on 2019-03-07 by the reprex package (v0.2.1)

Robinlovelace commented 5 years ago

and the name of course...

layik commented 5 years ago

We are working on the exact lines :)

Robinlovelace commented 5 years ago

Although pls convert it with as.character() first : )

layik commented 5 years ago
> pryr::object_size(ukboundaries::msoa2011_vsimple[,c("msoa11cd", "msoa11nm")])
7.76 MB

:( much bigger with string names

Robinlovelace commented 5 years ago

No, that includes the geometry. sf geometry columns are sticky.
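
A small sketch of what "sticky" means, assuming the sf package is installed. The zone codes and names are made up:

```r
library(sf)

# Toy sf object with two attribute columns and point geometry (made-up values)
zones <- st_sf(
  msoa11cd = c("E02000001", "E02000002"),
  msoa11nm = c("Area 001", "Area 002"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)

# Selecting columns with `[` keeps the geometry column even though it was not requested
subset_sf <- zones[, c("msoa11cd", "msoa11nm")]
print(names(subset_sf))

# st_drop_geometry() returns a plain data frame without the geometry
subset_df <- st_drop_geometry(subset_sf)
print(names(subset_df))
```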

layik commented 5 years ago

oh!

Robinlovelace commented 5 years ago

Try:

reprex::reprex(
  pryr::object_size(sf::st_drop_geometry(ukboundaries::msoa2011_vsimple[,c("msoa11cd", "msoa11nm")]))
)
layik commented 5 years ago

Right, let me put that in and rewrite the above! About 1 MB.

layik commented 5 years ago

> pryr::object_size(sf::st_drop_geometry(ukboundaries::msoa2011_vsimple[,c("msoa11cd", "msoa11nm")]))
1.13 MB
Robinlovelace commented 5 years ago

Will be useful for the advanced pct courses. You up for helping on those?

Robinlovelace commented 5 years ago

Also try running the actual command:

reprex::reprex(
  pryr::object_size(sf::st_drop_geometry(ukboundaries::msoa2011_vsimple[,c("msoa11cd", "msoa11nm")]))
)

it should copy the result onto your clipboard.

layik commented 5 years ago

Sure, looks like I am learning this thing :)

layik commented 5 years ago

I know, don't worry, I had run that line as you pasted it :)

Robinlovelace commented 5 years ago

Let's get the pkg ready to open to the world first...

layik commented 5 years ago

Here :)

pryr::object_size(sf::st_drop_geometry(ukboundaries::msoa2011_vsimple[, 
    c("msoa11cd", "msoa11nm")]))
#> Using default data cache directory ~/.ukboundaries/cache 
#> Use cache_dir() to change it.
#> 1.13 MB

Created on 2019-03-07 by the reprex package (v0.2.1)

Robinlovelace commented 5 years ago

Yay!

layik commented 5 years ago

Let me send this PR, need to add/doc data first.

Robinlovelace commented 5 years ago

Hypothesis: that will become your default way of sharing reproducible examples in R. R for Reproducibility!

Robinlovelace commented 5 years ago

OK. Over and out for now. Amazing work brother. I'm off to 💤 land.

layik commented 5 years ago

I think stplanr::od2line needs the geometry to get the zone centroids. Discuss tomorrow or later.

Robinlovelace commented 5 years ago

Latest thinking on this: create a function, get_centroids_ew(), to fetch the centroids. PR to follow.

Robinlovelace commented 5 years ago

I think PR https://github.com/ITSLeeds/pct/pull/8 solves 3 issues:

  1. object size by downloading it
  2. adds coordinates
  3. provides accuracy: these are population-weighted, not area-weighted, centroids

The new function can build on this I think.
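
To show why coordinates are enough for desire lines, here is a base R sketch that looks up start and end points for each OD pair in a centroid table. The column names X, Y and msoa11cd follow the centroid file; the codes, positions and the origin/destination column names are made up, and this is not how stplanr::od2line is implemented.

```r
# Toy centroid table with coordinates (made-up codes and positions)
centroids <- data.frame(
  msoa11cd = c("Z1", "Z2", "Z3"),
  X = c(-1.47, -1.50, -1.45),
  Y = c(53.38, 53.40, 53.36),
  stringsAsFactors = FALSE
)
od <- data.frame(
  origin = c("Z1", "Z2"),
  destination = c("Z2", "Z3"),
  stringsAsFactors = FALSE
)

# Look up the centroid row for each end of every OD pair
o <- match(od$origin, centroids$msoa11cd)
d <- match(od$destination, centroids$msoa11cd)

# One row per desire line: start and end coordinates
lines_xy <- data.frame(
  od,
  x_from = centroids$X[o], y_from = centroids$Y[o],
  x_to = centroids$X[d], y_to = centroids$Y[d]
)
print(lines_xy)
```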

Robinlovelace commented 5 years ago

Data source: ESRI http://geoportal.statistics.gov.uk/datasets/b0a6d8a3dc5d4718b3fd62c548d60f81_0

layik commented 5 years ago
pct_area_desire_lines = function(area = "sheffield", n = 100) {
  # `area` has a default, so test for NULL rather than exists("area"),
  # which is always TRUE inside the function
  if(is.null(area))
    stop("area is required.")
  if(length(area) != 1L)
    stop("'area' must be of length 1")
  # check the type first so the string comparisons are safe
  if(!is.character(area) || is.na(area) || (area == ""))
    stop("invalid area name")
  census_file = file.path(tempdir(), "wu03ew_v2.csv")
  # file.exists() tests for a file on disk; exists() only looks for an R object
  if(!file.exists(census_file)) {
    download.file("https://s3-eu-west-1.amazonaws.com/statistics.digitalresources.jisc.ac.uk/dkan/files/FLOW/wu03ew_v2/wu03ew_v2.zip",
                  file.path(tempdir(), "wu03ew_v2.zip"))
    unzip(file.path(tempdir(), "wu03ew_v2.zip"), exdir = tempdir())
  }
  od_all = readr::read_csv(census_file)
  zones_all = pct::get_centroids_ew() # TODO: some warning?
  # centroids whose MSOA name matches `area`, ignoring case
  zones = zones_all[
    grepl(area, zones_all$msoa11nm,
          ignore.case = TRUE), ]

  od_area = od_all[od_all$`Area of residence` %in% zones$msoa11cd &
                     od_all$`Area of workplace` %in% zones$msoa11cd, ]
  od_area = od_area[od_area$`Area of residence` !=
                      od_area$`Area of workplace`, ]
  od_area = od_area[order(od_area$`All categories: Method of travel to work`,
                          decreasing = TRUE),]
  od_area = od_area[1:n,]
  area_desire_lines = stplanr::od2line(
    flow = od_area[,c(2,1)], zones)

  area_desire_lines
}

d = pct_area_desire_lines(area = "wakefield")
#> Parsed with column specification:
#> cols(
#>   `Area of residence` = col_character(),
#>   `Area of workplace` = col_character(),
#>   `All categories: Method of travel to work` = col_double(),
#>   `Work mainly at or from home` = col_double(),
#>   `Underground, metro, light rail, tram` = col_double(),
#>   Train = col_double(),
#>   `Bus, minibus or coach` = col_double(),
#>   Taxi = col_double(),
#>   `Motorcycle, scooter or moped` = col_double(),
#>   `Driving a car or van` = col_double(),
#>   `Passenger in a car or van` = col_double(),
#>   Bicycle = col_double(),
#>   `On foot` = col_double(),
#>   `Other method of travel to work` = col_double()
#> )
#> Parsed with column specification:
#> cols(
#>   X = col_double(),
#>   Y = col_double(),
#>   objectid = col_double(),
#>   msoa11cd = col_character(),
#>   msoa11nm = col_character()
#> )

Created on 2019-03-08 by the reprex package (v0.2.1)

And mapview will show you this! [image]

Robinlovelace commented 5 years ago

Great work sir!

Robinlovelace commented 5 years ago
  od_area = od_area[order(od_area$`All categories: Method of travel to work`,
                          decreasing = TRUE),]
  od_area = od_area[1:n,]

That is clever. I'm sooooo glad you're not wedded to the tidyverse. Compsci thinking not narrow thinking.
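
One thing to watch with the 1:n pattern, sketched on made-up data: if fewer than n rows remain after filtering, indexing past the end pads the result with NA rows, while head() stops at the last available row.

```r
# Toy table with only three rows (made-up values)
od <- data.frame(pair = c("a", "b", "c"), all = c(10, 30, 20),
                 stringsAsFactors = FALSE)
sorted <- od[order(od$all, decreasing = TRUE), ]

# Asking for 5 rows by index pads the result with NA rows
padded <- sorted[1:5, ]

# head() returns at most the rows that exist
trimmed <- head(sorted, 5)

print(nrow(padded))
print(nrow(trimmed))
```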

Robinlovelace commented 5 years ago

I dislike this though:

$`All categories: Method of travel to work`
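
One possible way around the long header, sketched on a toy table: rename the columns once after reading the file. The short names below are hypothetical choices for illustration, not names taken from the package.

```r
# Toy table using the census-style headers (made-up values)
od <- data.frame(
  `Area of residence` = c("Z1", "Z2"),
  `Area of workplace` = c("Z2", "Z1"),
  `All categories: Method of travel to work` = c(30, 50),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Short, syntactic names (hypothetical) avoid the need for backticks
names(od) <- c("geo_code1", "geo_code2", "all")

# The ordering step then reads more cleanly
od <- od[order(od$all, decreasing = TRUE), ]
print(od)
```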
layik commented 5 years ago

Please override wherever you dislike! All I am doing is trying to use code that is R Foundation and not "tidyverse" foundation.