cyclestreets / cyclestreets-r

An R interface to cyclestreets.net APIs
https://rpackage.cyclestreets.net/
GNU General Public License v3.0
27 stars 7 forks source link

Hilliness #14

Closed Robinlovelace closed 4 years ago

Robinlovelace commented 4 years ago

Opening this issue as a place to properly document efforts to generate accurate gradient estimates from cyclestreets elvation data.

Robinlovelace commented 4 years ago

The new gradient_segment column is in there now:

# Aim: test gradient calculations in CycleStreets

library(cyclestreets)
from = tmaptools::geocode_OSM("potternewton park")
to = tmaptools::geocode_OSM("university of leeds")
r = journey(from$coords, to$coords, cols = NULL, cols_extra = NULL)
mapview::mapview(r["gradient_segment"])

Created on 2020-04-17 by the reprex package (v0.3.0)

Robinlovelace commented 4 years ago

image

Robinlovelace commented 4 years ago

Problem: short segments still have an unwanted impact on the results, meaning they need to be smoothed to not distort the results. That can be a post-processing stage.

Robinlovelace commented 4 years ago

Heads-up @joeytalbot here's the result with the smoothing method in there:

# Aim: test gradient calculations in CycleStreets

library(cyclestreets)
from = tmaptools::geocode_OSM("potternewton park")
to = tmaptools::geocode_OSM("university of leeds")
r = journey(from$coords, to$coords, cols = NULL, cols_extra = NULL)
mapview::mapview(r["gradient_segment"])


# smooth unwanted high gradients
summary(r$distances)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>     9.0    48.5   122.0   197.1   186.2   771.0
summary(r$gradient_segment)
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#> 0.000000 0.007194 0.019018 0.026387 0.037812 0.111111
plot(r$distances, r$gradient_segment)


distance_cutoff = 20
gradient_cutoff = 0.1
sel = r$gradient_segment > 0.1 &
  r$distances <= distance_cutoff
summary(sel)
#>    Mode   FALSE    TRUE 
#> logical      17       1
r$gradient_segment_smooth = stplanr::route_rolling_average(r$gradient_segment)
r$gradient_segment[sel]
#> [1] 0.1111111
r$gradient_segment_smooth[sel]
#> [1] 0.05731033

r$gradient_segment[sel] = r$gradient_segment_smooth[sel]

plot(r$distances, r$gradient_segment)

mapview::mapview(r["gradient_segment"])

Created on 2020-04-17 by the reprex package (v0.3.0)

mvl22 commented 4 years ago

There should already be smoothing at the API data result end. If you have specific cases where things seem odd, @si-the-pie would like to know.

Robinlovelace commented 4 years ago

Thanks @mvl22, it seems the vertical resolution of your data is 1m. If you have segments of 10m or less, obviously this will lead to some segments with high gradients due to rounding up or down. See what I mean? The smoothed result below matches with local knowledge, it's my cycle to work!

Robinlovelace commented 4 years ago

BTW it's only smoothing 1 anomalous section that has a distance of 9m. So it's not really smoothing, it's identifying and fixing anomalies.

Robinlovelace commented 4 years ago

Heads-up @mvl22 and @joeytalbot here is the resulting simple function in action. I plan to add this as an optional argument smooth_gradient into the journey() function and return a new column to avoid data loss.

# Aim: test gradient calculations in CycleStreets

library(cyclestreets)
from = tmaptools::geocode_OSM("potternewton park")
to = tmaptools::geocode_OSM("university of leeds")
r = journey(from$coords, to$coords, cols = NULL, cols_extra = NULL)
mapview::mapview(r["gradient_segment"])


# smooth unwanted high gradients
summary(r$distances)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>     9.0    48.5   122.0   197.1   186.2   771.0
summary(r$gradient_segment)
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#> 0.000000 0.007194 0.019018 0.026387 0.037812 0.111111
plot(r$distances, r$gradient_segment)

distance_cutoff = 20
gradient_cutoff = 0.1
sel = r$gradient_segment > 0.1 &
  r$distances <= distance_cutoff
summary(sel)
#>    Mode   FALSE    TRUE 
#> logical      17       1
r$gradient_segment_smooth = stplanr::route_rolling_average(r$gradient_segment)
r$gradient_segment[sel]
#> [1] 0.1111111
r$gradient_segment_smooth[sel]
#> [1] 0.05731033

# r$gradient_segment[sel] = r$gradient_segment_smooth[sel]

plot(r$distances, r$gradient_segment)

mapview::mapview(r["gradient_segment"])


smooth_with_cutoffs = function(
  gradient_segment,
  distances,
  distance_cutoff = 20,
  gradient_cutoff = 0.1
  ) {
  sel = gradient_segment > 0.1 &
    distances <= distance_cutoff
  summary(sel)
  gradient_segment_smooth = stplanr::route_rolling_average(gradient_segment)
  gradient_segment[sel]
  gradient_segment_smooth[sel]

  gradient_segment[sel] = gradient_segment_smooth[sel]
  gradient_segment

}

r$gradient_segment
#>  [1] 0.051282051 0.015564202 0.005128205 0.007797271 0.006993007 0.057142857
#>  [7] 0.029569892 0.111111111 0.031250000 0.030769231 0.022471910 0.000000000
#> [13] 0.014388489 0.040000000 0.040000000 0.011494253 0.000000000 0.000000000
smooth_with_cutoffs(gradient_segment = r$gradient_segment, distances = r$distances)
#>  [1] 0.051282051 0.015564202 0.005128205 0.007797271 0.006993007 0.057142857
#>  [7] 0.029569892 0.057310335 0.031250000 0.030769231 0.022471910 0.000000000
#> [13] 0.014388489 0.040000000 0.040000000 0.011494253 0.000000000 0.000000000

Created on 2020-04-17 by the reprex package (v0.3.0)

Robinlovelace commented 4 years ago

This is the line to watch

#>  [7] 0.029569892 0.111111111 0.031250000 0.030769231 0.022471910 0.000000000
Robinlovelace commented 4 years ago

Heads-up @joeytalbot and @mvl22, latest version has the following example:

library(cyclestreets)
r7 = journey(c(-1.524, 53.819), c(-1.556, 53.806), smooth_gradient = TRUE)
plot(r7["gradient_segment"])

plot(r7["gradient_smooth"])

Created on 2020-04-17 by the reprex package (v0.3.0)

joeytalbot commented 4 years ago

Yes the smoothing makes a definite improvement @Robinlovelace

I'd suggest renaming or adding comments explaining n and lag to make it clear that in these functions n refers to distances and lag refers to elevations.

With the existing approach, as I understand it, change in elevation across a segment is calculated as the difference between the mean elevation of that segment and the mean elevation of the following segment. This means the natural option to choose for every single segment (not just the ones with outlying gradients) is lag=1 and n=2, since lag = 1 already calculates change in mean elevation between that segment and the following one, so it is only right for the distance to be similarly calculated as the mean of the lengths of that segment and the following one.

n=2 and lag=1 are essentially both referring to the same thing (for distances and elevations respectively).

Of course if we adapt the way elevation is calculated, so the functions use 'difference between elevation at start of segment and elevation at end of segment', the meaning of lag will change again.

Robinlovelace commented 4 years ago

I'd suggest renaming or adding comments explaining n and lag to make it clear that in these functions n refers to distances and lag refers to elevations.

The latest approach does not use n and lag. route_rolling_average uses a different function that has only one parameter: https://docs.ropensci.org/stplanr/reference/index.html#section-work-with-and-analyse-routes

Robinlovelace commented 4 years ago

I agree in general with the suggestion to change the parameter names. Pull requests to stplanr welcome, but don't think this is a priority.

si-the-pie commented 4 years ago

HI, I'm chipping in the following comment, hope its useful, Simon

Elevation data in CycleStreets comes from a variety of surveys, but in GB from Ordnance Survey Terrain 50, where the resolution of the pixels is about 36 metres by 61 metres in Leeds. The elevation of the pixel covering the bit of Woodhouse Lane between Cookridge Street and Clay Pit lane is 57.9 metres:

gdallocationinfo -valonly osTerrain50_wgs84.tiff 14087 12905
# Output
57.9000015258789

Within the CycleStreets routing algorithm bilinear interpolation of elevations from the neighbouring four pixels is used, and there we go to millimetre level of accuracy, and this helps to smooth over the pixellation.

But elevation values in the output routes are only given rounded to the nearest metre. There are no plans to change this as adding that extra detail is not normally needed.

Robinlovelace commented 4 years ago

Thanks for the explanation. Sounds like you calculate gradients internally. Any plans to release a 'gradient' output? For now the estimates from the 1m outputs are fine for our needs.

si-the-pie commented 4 years ago

No, there are no plans on gradient, but maybe we should because it is used to create a map tile layer in CycleStreets like the following.

leeds_gradient

The idea of this rendering is that flat streets are almost invisible. A gray scale is used to indicate increasing gradient. Flat areas are almost invisible, the bluey-gray becomes darker as the streets become steeper up to 5%. From 5% lines are shown as blue, from 10% as amber, and above 20% as red. Note that anything above 5% is quite steep for many cyclists.

mvl22 commented 4 years ago

Article that may be of general interest I came across.

CTC CycleDigest #58, 2009.

C10A0EAC-7377-4363-91B0-3A9ED9BC4EDE

joeytalbot commented 4 years ago

If you're happy to reconsider this, I think it could be better to smooth the gradients differently.

Given that the segments we're smoothing are <= 20m in length, a 1 or 2m change in elevation across a segment can produce an anomalously high gradient. The latest method takes the mean of the gradient within the segment itself and the gradient of the two adjoining segments. Therefore this can still be heavily influenced by the high calculated gradient of a very short segment.

Instead we could take the mean distance and mean elevation change across the three nearest segments, and use these to calculate the gradient. See below for how it would change the Leeds example.

library(cyclestreets)
from = tmaptools::geocode_OSM("potternewton park")
to = tmaptools::geocode_OSM("university of leeds")
r = journey(from$coords, to$coords, smooth_gradient = TRUE)

r = r %>% 
  mutate(elev_change = gradient_segment*distances,
         m_dist = route_rolling_average(r$distances),
         m_elev_c = route_rolling_average(r$elev_change)
  )

r$elev_change
r$m_dist
r$m_elev_c

r = r %>% 
  mutate(gradient_smooth_method2 = ifelse(distances <= 20 & gradient_segment > 0.1, m_elev_c/m_dist, gradient_segment))

r$gradient_segment
r$gradient_smooth
r$smooth_gradient_method2

mapview::mapview(r["gradient_smooth"])
mapview::mapview(r["gradient_smooth_method2"])
joeytalbot commented 4 years ago

gradient_smooth_method2

joeytalbot commented 4 years ago

And the existing method for comparison: gradient_smooth_method1

Robinlovelace commented 4 years ago

Agree, that looks better.

Robinlovelace commented 4 years ago

Re-opening this based on discussion here: https://github.com/Robinlovelace/cyclestreets/pull/17

Robinlovelace commented 4 years ago

I suggest we need an evidence-based assessment of gradient smoothing. We could add a function to stplanr that calculates gradients per segment based on the linestring and a raster dataset.

Demo of how this works written by @Nowosad : https://geocompr.robinlovelace.net/geometric-operations.html#raster-extraction

image

An example of where something in our book that I'd never tested may be useful for our own research!

Robinlovelace commented 4 years ago

2m res data associated with study area:

https://environment.data.gov.uk/UserDownloads/interactive/2133d5bf5cd343daaf703ab8fb3b55d98628/LIDARCOMP/LIDAR-DTM-2M-SE23ne.zip

https://environment.data.gov.uk/UserDownloads/interactive/2133d5bf5cd343daaf703ab8fb3b55d98628/LIDARCOMP/LIDAR-DTM-2M-SE23se.zip

https://environment.data.gov.uk/UserDownloads/interactive/2133d5bf5cd343daaf703ab8fb3b55d98628/LIDARCOMP/LIDAR-DTM-2M-SE33nw.zip

https://environment.data.gov.uk/UserDownloads/interactive/2133d5bf5cd343daaf703ab8fb3b55d98628/LIDARCOMP/LIDAR-DTM-2M-SE33sw.zip

Robinlovelace commented 4 years ago

https://environment.data.gov.uk/DefraDataDownload/?Mode=survey

Robinlovelace commented 4 years ago

An example where the routing goes over a big hill (I think) https://www.cyclestreets.net/journey/68884578/

Robinlovelace commented 4 years ago

In terms of getting all CS values in there, it's worth allowing the user to store the raw elevations I think, which could be extracted as follows:

library(tidyverse)
library(dplyr)
df <- tibble(
  x = 1:3,
  y = c("a", "d,e,f", "g,h")
)
df %>%
  transform(y = strsplit(y, ",")) %>%
  unnest(y)
#> # A tibble: 6 x 2
#>       x y    
#>   <int> <chr>
#> 1     1 a    
#> 2     2 d    
#> 3     2 e    
#> 4     2 f    
#> 5     3 g    
#> 6     3 h

Created on 2020-04-24 by the reprex package (v0.3.0)

Robinlovelace commented 4 years ago

Heads-up @joeytalbot and @mvl22 I've gone back to the drawing board with this and have written a function that can calculate the gradients directly from the point data. Because the elevations are only provided to the nearest metre we will still have to think about smoothing these results. I guess a weighted mean is the way to go on this, but comparisons with 'ground truth' data are definitely still needed. As it happens, the new function will also help generate ground truth data.

New function: https://docs.ropensci.org/stplanr/reference/route_slope_matrix.html

Another question, mainly for @joeytalbot and @mem48, is it worth splitting these new slope functions out into a new package, e.g. called slopes?

mvl22 commented 4 years ago

Cc @si-the-pie. Robin it’s best to cc him as well as me.

mvl22 commented 4 years ago

Sounds like you want an elevations=floats option.

This smoothing stuff should really be upstream in the API generally.

Robinlovelace commented 4 years ago

Sounds like you want an elevations=floats option.

True that, I think it will improve the utility of CS results.

Robinlovelace commented 4 years ago

Thanks for the response in the PR @Nowosad, realise that bilinear interpolation is needed. I think GDAL can extract raster values from XY values, right? Similar question: https://gis.stackexchange.com/questions/269603/extract-raster-values-from-point-using-gdal

mvl22 commented 4 years ago

That's what we're doing in the source processing that leads to the API output. You'll be doing this twice over.

Robinlovelace commented 4 years ago

Yes but it provides accurate gradient estimates that we need for estimating cycling potential. I've just cracked it and can now provide accurate gradients for any number of linestrings based on bilinear interpolation of a raster DEM. Not yet optimised but the code, which I plan to make available via stplanr, looks a bit like this:

rg3d = function(x, elevation = NULL) {
  m = st_coordinates(x)
  x_sfc = st_sfc(x)
  if(!is.null(elevation)) {
    e = raster::extract(elevation, sf::st_sf(st_cast(x_sfc, "POINT")),
                        method = "bilinear")
  } else {
    e = x[, 3]
  }
  g1 = stplanr::route_slope_matrix(m, e = e, lonlat = FALSE)
  d = stplanr::route_sequential_dist(m = m, lonlat = FALSE)
  weighted.mean(abs(g1), d, na.rm = TRUE)
}

route_gradient_3dline = function(r, e = NULL) {
  res_list = 
    if (requireNamespace("pbapply", quietly = TRUE)) {
      pbapply::pblapply(sf::st_geometry(r), rg3d, e = e)
    } else {
      lapply(sf::st_geometry(r), rg3d, e = e)
    }
  unlist(res_list)
}

Have you explored shipping the data as 3d polylines? That is the way things are moving I think and makes extracting the gradient trivial.

Robinlovelace commented 4 years ago

I'm tracking progress on the general function here: https://github.com/ropensci/stplanr/issues/392

mem48 commented 4 years ago

A package would be a good idea, I did a bunch of work on route profilling for optitruck working out slopes and radius of curvature that might be useful too.

mem48 commented 4 years ago

https://github.com/mem48/OptiTruck/blob/master/R/OTP/line_curvature2.R

Robinlovelace commented 4 years ago

Looks good @mem48 here is the function in stplanr that could also go in a new R package: https://docs.ropensci.org/stplanr/reference/route_slope_matrix.html

Robinlovelace commented 4 years ago

New package: https://itsleeds.github.io/slopes/

Robinlovelace commented 4 years ago

Update on this: This is now in there on the latest version (not yet on CRAN):

library(cyclestreets)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stplanr)
from = tmaptools::geocode_OSM("potternewton park")
to = tmaptools::geocode_OSM("university of leeds")
r = journey(from$coords, to$coords, smooth_gradient = TRUE)
names(r)
#>  [1] "name"              "distances"         "time"             
#>  [4] "busynance"         "elevations"        "start_longitude"  
#>  [7] "start_latitude"    "finish_longitude"  "finish_latitude"  
#> [10] "gradient_segment"  "elevation_change"  "provisionName"    
#> [13] "quietness"         "quietness_segment" "geometry"         
#> [16] "gradient_smooth"

r = r %>% 
  mutate(elev_change = gradient_segment*distances,
         m_dist = route_rolling_average(distances),
         m_elev_c = route_rolling_average(elevation_change)
  )

r$elev_change
#>  [1]  8  0 12  1  4  1  6 22  1  5  2  6  0  2  1  3  1  0  0
r$m_dist
#>  [1]        NA 310.00000 324.00000 493.33333 283.66667 253.33333 330.33333
#>  [8] 286.00000 304.00000  77.66667 163.66667 122.66667 147.33333  66.66667
#> [15]  79.66667  62.33333  59.00000  48.33333        NA
r$m_elev_c
#>  [1]        NA 6.6666667 4.3333333 5.6666667 2.0000000 3.6666667 9.6666667
#>  [8] 9.6666667 9.3333333 2.6666667 4.3333333 2.6666667 2.6666667 1.0000000
#> [15] 2.0000000 1.6666667 1.3333333 0.3333333        NA

r = r %>% 
  mutate(gradient_smooth_method2 = ifelse(distances <= 20 & gradient_segment > 0.1, m_elev_c/m_dist, gradient_segment))

plot(r$gradient_segment, r$gradient_smooth)

r$gradient_smooth_method2
#>  [1] 0.051948052 0.000000000 0.015564202 0.005102041 0.007797271 0.007042254
#>  [7] 0.057142857 0.029569892 0.030701754 0.031446541 0.030769231 0.022471910
#> [13] 0.000000000 0.014388489 0.040000000 0.040000000 0.011494253 0.000000000
#> [19] 0.000000000
plot(r$gradient_smooth, r$gradient_smooth_method2)

Created on 2020-09-25 by the reprex package (v0.3.0)

Robinlovelace commented 4 years ago

Closing because smoothed gradients are returned by default now, as shown above. Not perfect but good enough for now.