cyipt / actdev

ActDev - Active travel provision and potential in planned and proposed development sites
https://actdev.cyipt.bike
7 stars 3 forks source link

Write and demonstrate code to disagregate desire lines #24

Closed Robinlovelace closed 3 years ago

Robinlovelace commented 3 years ago

So that instead of all going to one centroid they go to a range of places within a zone of interest (e.g. to building polygons from OSM).

dabreegster commented 3 years ago

Possible reference code for deciding which buildings in OSM represent residences: https://github.com/dabreegster/abstreet/blob/9f72de94346077ca4a50ac7d10474ccfd50f28e8/map_model/src/make/buildings.rs#L154, by @matkoniecz We should definitely check these rules against the areas of study; some tags are regional.

matkoniecz commented 3 years ago

https://wiki.openstreetmap.org/wiki/Key:building:use may be also worth considering (if used in given area - building represents not current use but architecture, see https://wiki.openstreetmap.org/wiki/Church for some extreme examples - but that rarely really matters)

List of values for building tag in OSM: https://taginfo.openstreetmap.org/keys/building#values - note that it is paginated (many are building=yes If you want to make you local area better and you can use Android I recommend StreetComplete that will ask about missing info, including building types and lane count)

https://wiki.openstreetmap.org/wiki/Key%3Abuilding has list of documented values (some tags with pages may be missed)

If some building values or wiki pages are unclear let me know and there is a decent chance that I can improve it.

mem48 commented 3 years ago

You might also look at workplace zones. In areas with lots of non residential land use wpz are much smaller than lsoas and they are classified by the industries they employ. See https://maps.cdrc.ac.uk/#/metrics/industrywp/default/BTTTFFT/10/-0.1582/51.4721/

matkoniecz commented 3 years ago

That reminds me about

https://wiki.openstreetmap.org/wiki/Tag:landuse%3Dresidential https://wiki.openstreetmap.org/wiki/Tag:landuse%3Dretail https://wiki.openstreetmap.org/wiki/Tag:landuse%3Dindustrial https://wiki.openstreetmap.org/wiki/Tag:landuse%3Dcommercial

Robinlovelace commented 3 years ago

Good news, I've got basic functionality working to do this now:

remotes::install_github("itsleeds/od")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'od' from a github remote, the SHA1 (27aab841) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(od)
od = od::od_data_df
zones = od::od_data_zones_min
subzones = od_data_zones_small
od_disag = od_disaggregate(od, zones, subzones)
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
ncol(od_disag) -1 == ncol(od) # same number of columns (except disag data gained geometry)
#> [1] TRUE
sum(od_disag[[3]]) == sum(od[[3]])
#> [1] TRUE
sum(od_disag[[4]]) == sum(od[[4]])
#> [1] TRUE
od_sf = od_to_sf(od, zones)
#> 0 origins with no match in zone ids
#> 0 destinations with no match in zone ids
#>  points not in od data removed.
plot(od_data_zones_small$geometry)
plot(od_data_zones_min$geometry, lwd = 3, col = NULL, add = TRUE)
#> Warning in rep(col, length.out = length(x)): 'x' is NULL so the result will be
#> NULL
plot(od_sf["all"], add = TRUE)
plot(od_disag["all"], add = TRUE)

Created on 2021-01-13 by the reprex package (v0.3.0)

Robinlovelace commented 3 years ago

Something I'm still a bit confused about is why the resulting 'oneway' versions of the disaggregated lines is different: I thought a give OD pair to be disaggregated can only go from a subzone in the origin zone to a subzone in the destination. This isn't a blocker but still trying to understand the process and how best to implement it (note the approach outlined above also would not work where there are many, many subzones, as is the case when using OSM polygons. Next step: check the results of the latter part of this reproducible example on a per OD pair basis:

remotes::install_github("itsleeds/od")
library(od)
od = od::od_data_df
zones = od::od_data_zones_min
subzones = od_data_zones_small
od_disag = od_disaggregate(od, zones, subzones)
ncol(od_disag) -1 == ncol(od) # same number of columns (except disag data gained geometry)
sum(od_disag[[3]]) == sum(od[[3]])
sum(od_disag[[4]]) == sum(od[[4]])
od_sf = od_to_sf(od, zones)
plot(od_data_zones_small$geometry)
plot(od_data_zones_min$geometry, lwd = 3, col = NULL, add = TRUE)
plot(od_sf["all"], add = TRUE)
plot(od_disag["all"], add = TRUE)
od_disag_oneway = od_oneway(od_disag)
od_sf_oneway = od_oneway(od_sf)
nrow(od_disag) / nrow(od_sf)
nrow(od_disag_oneway) / nrow(od_sf_oneway)
od_disag2 = od_disaggregate(od_sf_oneway %>% sf::st_drop_geometry(), zones, subzones)
od_disag2_oneway = od_oneway(od_disag2)
nrow(od_disag2)
nrow(od_disag2_oneway)
Robinlovelace commented 3 years ago

Update on this: it was a bug in the od package which is now fixed. The result of the test code above now returns the result that was expected:

``` r remotes::install_github("itsleeds/od") #> Using github PAT from envvar GITHUB_PAT #> Downloading GitHub repo itsleeds/od@HEAD #> #> checking for file ‘/tmp/RtmpD635TG/remotes227a5f1d5617/ITSLeeds-od-52f2c75/DESCRIPTION’ ... ✔ checking for file ‘/tmp/RtmpD635TG/remotes227a5f1d5617/ITSLeeds-od-52f2c75/DESCRIPTION’ #> ─ preparing ‘od’: #> checking DESCRIPTION meta-information ... ✔ checking DESCRIPTION meta-information #> ─ checking for LF line-endings in source and make files and shell scripts #> ─ checking for empty or unneeded directories #> ─ building ‘od_0.0.1.9000.tar.gz’ #> #> #> Installing package into '/home/robin/R/x86_64-pc-linux-gnu-library/4.0' #> (as 'lib' is unspecified) library(od) library(dplyr) #> #> Attaching package: 'dplyr' #> The following objects are masked from 'package:stats': #> #> filter, lag #> The following objects are masked from 'package:base': #> #> intersect, setdiff, setequal, union od = od::od_data_df zones = od::od_data_zones_min subzones = od_data_zones_small od_disag = od_disaggregate(od, zones, subzones) #> although coordinates are longitude/latitude, st_intersects assumes that they are planar #> although coordinates are longitude/latitude, st_intersects assumes that they are planar #> Warning in subpoints_joined$geo_code_ag == od$geo_code1: longer object length is #> not a multiple of shorter object length #> Warning in plot.sf(subpoints_joined[subpoints_joined$geo_code_ag == #> od$geo_code1, : ignoring all but the first attribute #> Warning in subpoints_joined$geo_code_ag == od$geo_code2: longer object length is #> not a multiple of shorter object length #> Warning in plot.sf(subpoints_joined[subpoints_joined$geo_code_ag == #> od$geo_code2, : ignoring all but the first attribute ``` ![](https://i.imgur.com/bpNTUSA.png) ``` r ncol(od_disag) -1 == ncol(od) # same number of columns (except disag data gained geometry) #> [1] TRUE sum(od_disag[[3]]) == sum(od[[3]]) #> [1] TRUE sum(od_disag[[4]]) == sum(od[[4]]) #> [1] TRUE od_sf = od_to_sf(od, zones) #> 0 origins with no match in zone ids #> 0 destinations with no match in zone ids #> points not in od data removed. plot(od_data_zones_small$geometry) plot(od_data_zones_min$geometry, lwd = 3, col = NULL, add = TRUE) #> Warning in rep(col, length.out = length(x)): 'x' is NULL so the result will be #> NULL plot(od_sf["all"], add = TRUE) plot(od_disag["all"], add = TRUE) ``` ![](https://i.imgur.com/3RYUbSo.png) ``` r od_disag_oneway = od_oneway(od_disag) od_sf_oneway = od_oneway(od_sf) nrow(od_disag) / nrow(od_sf) #> [1] 26 nrow(od_disag_oneway) / nrow(od_sf_oneway) #> [1] 26 ``` Created on 2021-01-14 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)
Robinlovelace commented 3 years ago

Heads-up @matkoniecz based partly on your suggestions, but mostly the building types inferred from the link shared by @dabreegster, I have generated a test dataset of buildings in 3 zones in Leeds. I'm putting this tiny example dataset into the 'od' R package for testing as a solid basis for function development that should eventually lead this issue to be fixed. Thanks for the input so far #workinprogress!

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(od)
# od_data_buildings
building_types = c(
  "office",
  "industrial",
  "commercial",
  "retail",
  "warehouse",
  "civic",
  "public"
)
leeds_osm = osmextract::oe_get(place = "leeds", layer = "multipolygons")
#> No exact match found for place = leeds and provider = geofabrik. Best match is Laos. 
#> Checking the other providers.
#> An exact string match was found using provider = bbbike.
#> The chosen file was already detected in the download directory. Skip downloading.
#> The corresponding gpkg file was already detected. Skip vectortranslate operations.
#> Reading layer `multipolygons' from data source `/mnt/57982e2a-2874-4246-a6fe-115c199bc6bd/data/osm/geofabrik_Leeds.gpkg' using driver `GPKG'
#> Simple feature collection with 391926 features and 25 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -1.89 ymin: 53.65 xmax: -1.280002 ymax: 53.88
#> geographic CRS: WGS 84
leeds_osm_buildigs = leeds_osm %>% 
  filter(building %in% building_types)

zones_of_interest = od_data_zones_min[od_data_zones_min$geo_code %in% c(od_data_df$geo_code1[1:2], od_data_df$geo_code2[1:2]), ]
mapview::mapview(zones_of_interest)

buildings_in_zones = leeds_osm_buildigs[zones_of_interest, , op = sf::st_within]
#> although coordinates are longitude/latitude, st_within assumes that they are planar
mapview::mapview(buildings_in_zones)

Created on 2021-01-14 by the reprex package (v0.3.0)

Robinlovelace commented 3 years ago

Demonstration of this in reproducible code below.

``` r # Aim: demonstrate disaggregating polygons for #24 # remotes::install_github("itsleeds/od", "bdb44597f0b701e683e5208b837c9a91c7036838") remotes::install_github("itsleeds/od") #> Using github PAT from envvar GITHUB_PAT #> Skipping install of 'od' from a github remote, the SHA1 (bdb44597) has not changed since last install. #> Use `force = TRUE` to force installation library(dplyr) #> #> Attaching package: 'dplyr' #> The following objects are masked from 'package:stats': #> #> filter, lag #> The following objects are masked from 'package:base': #> #> intersect, setdiff, setequal, union setwd("~/cyipt/actdev/") # input data: we should probably have naming conventions for these sites = sf::read_sf("all-sites.geojson") site_area = sf::read_sf("geojsons/great-kneighton.geojson") desire_lines = sf::read_sf("data-small/great-kneighton/great-kneighton-desire-lines.geojson") study_area = sf::read_sf("data-small/great-kneighton/great-kneighton-study-area.geojson") # buildings = osmextract::oe_get(study_area, layer = "multipolygons") osm_polygons = osmextract::oe_get("cambridgeshire", layer = "multipolygons") #> The input place was matched with: Cambridgeshire #> The chosen file was already detected in the download directory. Skip downloading. #> The corresponding gpkg file was already detected. Skip vectortranslate operations. #> Reading layer `multipolygons' from data source `/mnt/57982e2a-2874-4246-a6fe-115c199bc6bd/data/osm/geofabrik_cambridgeshire-latest.gpkg' using driver `GPKG' #> Simple feature collection with 239843 features and 25 fields #> geometry type: MULTIPOLYGON #> dimension: XY #> bbox: xmin: -0.5251041 ymin: 51.96659 xmax: 0.5198601 ymax: 52.77267 #> geographic CRS: WGS 84 # from od/data-raw folder building_types = c( "office", "industrial", "commercial", "retail", "warehouse", "civic", "public" ) osm_buildigs = osm_polygons %>% filter(building %in% building_types) zones = pct::get_pct_zones("cambridgeshire") zones = pct::get_pct_zones("cambridgeshire", geography = "msoa") zones_of_interest = zones[zones$geo_code %in% c(desire_lines$geo_code1, desire_lines$geo_code2), ] mapview::mapview(zones_of_interest) ``` ![](https://i.imgur.com/GO9VJZZ.png) ``` r buildings_in_zones = osm_buildigs[zones_of_interest, , op = sf::st_within] #> although coordinates are longitude/latitude, st_within assumes that they are planar mapview::mapview(buildings_in_zones) ``` ![](https://i.imgur.com/Bhgog2V.png) ``` r buildings_in_zones = buildings_in_zones %>% filter(!is.na(osm_way_id)) %>% select(osm_way_id, building) osm_polygons_in_site = osm_polygons[site_area, , op = sf::st_within] #> although coordinates are longitude/latitude, st_within assumes that they are planar osm_polygons_resi_site = osm_polygons_in_site %>% filter(building == "residential") %>% select(osm_way_id, building) buildings_od = rbind(buildings_in_zones, osm_polygons_resi_site) mapview::mapview(buildings_od) + mapview::mapview(site_area) + mapview::mapview(desire_lines) + mapview::mapview(zones_of_interest) ``` ![](https://i.imgur.com/FXkkdNV.png) ``` r od_df = desire_lines %>% sf::st_drop_geometry() od_df #> # A tibble: 10 x 25 #> geo_code1 geo_code2 all from_home light_rail train bus taxi motorbike #> * #> 1 E02003730 E02003720 9.29 0 0 0 0 0 0 #> 2 E02003730 E02003721 47.1 0 0 0.663 0.663 0 0.663 #> 3 E02003730 E02003722 25.2 0 0 0 1.33 0 0 #> 4 E02003730 E02003723 69.6 0 0 1.99 7.96 0.663 0.663 #> 5 E02003730 E02003725 443. 0 0.663 1.99 31.2 0.663 1.99 #> 6 E02003730 E02003726 80.9 0 0 0 3.98 0 0 #> 7 E02003730 E02003727 21.9 0 0 0 0 0 0.663 #> 8 E02003730 E02003728 35.8 0 0 0 2.65 0 0 #> 9 E02003730 E02003729 28.5 0 0 0 1.99 0 0 #> 10 E02003730 E02003731 316. 0 0 1.99 35.2 0 3.32 #> # … with 16 more variables: car_driver , car_passenger , #> # bicycle , foot , other , length , gradient , #> # pwalk_commute_base , pcycle_commute_base , #> # pdrive_commute_base , walk_commute_godutch , #> # bicycle_commute_godutch , car_commute_godutch , #> # pwalk_commute_godutch , pcycle_commute_godutch , #> # pdrive_commute_godutch zones_of_interest #> Simple feature collection with 11 features and 129 fields #> geometry type: POLYGON #> dimension: XY #> bbox: xmin: 0.0698009 ymin: 52.15794 xmax: 0.1832138 ymax: 52.23717 #> geographic CRS: WGS 84 #> # A tibble: 11 x 130 #> geo_code geo_name lad11cd lad_name all bicycle foot car_driver #> #> 1 E020037… Cambrid… E07000… Cambrid… 4097 1475 554 1444 #> 2 E020037… Cambrid… E07000… Cambrid… 4537 1600 532 1747 #> 3 E020037… Cambrid… E07000… Cambrid… 4233 1635 686 1359 #> 4 E020037… Cambrid… E07000… Cambrid… 2326 897 408 682 #> 5 E020037… Cambrid… E07000… Cambrid… 3480 1271 880 769 #> 6 E020037… Cambrid… E07000… Cambrid… 3973 1331 1117 789 #> 7 E020037… Cambrid… E07000… Cambrid… 5156 1799 1032 1442 #> 8 E020037… Cambrid… E07000… Cambrid… 4493 1408 709 1529 #> 9 E020037… Cambrid… E07000… Cambrid… 4210 997 330 1901 #> 10 E020037… Cambrid… E07000… Cambrid… 3484 1050 530 1240 #> 11 E020037… Cambrid… E07000… Cambrid… 3894 1024 888 1284 #> # … with 122 more variables: car_passenger , motorbike , #> # train_tube , bus , taxi_other , govtarget_slc , #> # govtarget_sic , govtarget_slw , govtarget_siw , #> # govtarget_sld , govtarget_sid , govtarget_slp , #> # govtarget_sip , govtarget_slm , govtarget_sim , #> # govtarget_slpt , govtarget_sipt , govnearmkt_slc , #> # govnearmkt_sic , govnearmkt_slw , govnearmkt_siw , #> # govnearmkt_sld , govnearmkt_sid , govnearmkt_slp , #> # govnearmkt_sip , govnearmkt_slm , govnearmkt_sim , #> # govnearmkt_slpt , govnearmkt_sipt , gendereq_slc , #> # gendereq_sic , gendereq_slw , gendereq_siw , #> # gendereq_sld , gendereq_sid , gendereq_slp , #> # gendereq_sip , gendereq_slm , gendereq_sim , #> # gendereq_slpt , gendereq_sipt , dutch_slc , dutch_sic , #> # dutch_slw , dutch_siw , dutch_sld , dutch_sid , #> # dutch_slp , dutch_sip , dutch_slm , dutch_sim , #> # dutch_slpt , dutch_sipt , ebike_slc , ebike_sic , #> # ebike_slw , ebike_siw , ebike_sld , ebike_sid , #> # ebike_slp , ebike_sip , ebike_slm , ebike_sim , #> # ebike_slpt , ebike_sipt , base_slcyclehours , #> # govtarget_sicyclehours , govnearmkt_sicyclehours , #> # gendereq_sicyclehours , dutch_sicyclehours , #> # ebike_sicyclehours , base_sldeath , base_slyll , #> # base_slvalueyll , base_slsickdays , base_slvaluesick , #> # base_slvaluecomb , govtarget_sideath , govtarget_siyll , #> # govtarget_sivalueyll , govtarget_sisickdays , #> # govtarget_sivaluesick , govtarget_sivaluecomb , #> # govnearmkt_sideath , govnearmkt_siyll , #> # govnearmkt_sivalueyll , govnearmkt_sisickdays , #> # govnearmkt_sivaluesick , govnearmkt_sivaluecomb , #> # gendereq_sideath , gendereq_siyll , gendereq_sivalueyll , #> # gendereq_sisickdays , gendereq_sivaluesick , #> # gendereq_sivaluecomb , dutch_sideath , dutch_siyll , #> # dutch_sivalueyll , dutch_sisickdays , dutch_sivaluesick , … zones_of_interest_min = zones_of_interest %>% select(geo_code) site_area #> Simple feature collection with 1 feature and 0 fields #> geometry type: POLYGON #> dimension: XY #> bbox: xmin: 0.1076317 ymin: 52.16472 xmax: 0.1270294 ymax: 52.18035 #> geographic CRS: WGS 84 #> # A tibble: 1 x 1 #> geometry #> #> 1 ((0.1084471 52.1664, 0.1076317 52.16488, 0.1109791 52.16472, 0.1177168 52.167… site_area_id = sf::st_sf(data.frame(geo_code = "s1"), geometry = site_area$geometry) zones_all = rbind(zones_of_interest_min, site_area_id) od_df$geo_code1 = "s1" desire_lines_disag = od::od_disaggregate(od = od_df, z = zones_all, subzones = buildings_od) #> although coordinates are longitude/latitude, st_intersects assumes that they are planar #> although coordinates are longitude/latitude, st_intersects assumes that they are planar mapview::mapview(desire_lines_disag) + mapview::mapview(buildings_od) ``` ![](https://i.imgur.com/DHvF9ws.png) ``` r sf::write_sf(desire_lines_disag, "data-small/great-kneighton/desire_lines_disag.geojson") #> Warning in CPL_write_ogr(obj, dsn, layer, driver, #> as.character(dataset_options), : GDAL Error 6: DeleteLayer() not supported by #> this dataset. ``` Created on 2021-01-14 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)
Robinlovelace commented 3 years ago

Interested to hear feedback on this, especially from @joeytalbot and @dabreegster.

See the resulting geojsons (buildings and disaggregated desire lines) here: https://github.com/cyipt/actdev/tree/main/data-small/great-kneighton

See (and try to reproduce if you get a chance) the code that generated that here: https://github.com/cyipt/actdev/blob/main/code/tests/disaggregate.R

Robinlovelace commented 3 years ago

Any comments on the actual function and the documentation very welcome also: https://itsleeds.github.io/od/reference/od_disaggregate.html