JosiahParry / rsgeo

R bindings to the geo Rust crate
https://rsgeo.josiahparry.com/
Other
47 stars 5 forks source link

`line_segmentize()` doesn't always return `n` number of geometries #34

Closed Robinlovelace closed 1 year ago

Robinlovelace commented 1 year ago

Reproducible example:

l <- sf::read_sf("https://github.com/ropensci/stplanr/releases/download/v1.1.2/test_line.geojson")
n_segments <- jsonlite::read_json("https://github.com/ropensci/stplanr/releases/download/v1.1.2/n_segments.json", simplifyVector = TRUE)

# extract geometry and convert to rsgeo
geo <- rsgeo::as_rsgeo(sf::st_geometry(l))

# segmentize the line strings
res_rsgeo <- rsgeo::line_segmentize(geo, n_segments)

# make them into sfc_LINESTRING
res <- sf::st_cast(sf::st_as_sfc(res_rsgeo), "LINESTRING")
length(res)
sum(n_segments)

Context (see you've already responded there, sorry for duplicating content but already writing here @JosiahParry ): https://github.com/ropensci/stplanr/issues/552

Robinlovelace commented 1 year ago

Reprex just updated so it actually works.

Robinlovelace commented 1 year ago

Awesome. Great work! Regarding passing unprojected data to line_segmentize(), other than slight inaccuracies due to lon and lat not being in units of distances, are there any other/strong reasons not to?

JosiahParry commented 1 year ago

At larger distance the distance calculation will be inaccurate. We may create segments of the wrong size. The euclidean length that is calculated ignores the curvature of the earth. So if we're calculating distance between locations far enough apart that the curvature starts to matter, we're going to be drawing line through the earths crust

Robinlovelace commented 1 year ago

Good explanation and matches my understanding. For short distances far from the poles it will work, but lengths will be a bit out is my take.

JosiahParry commented 1 year ago

@Robinlovelace I'm working on a PR to geo to create a geodesic segmentize function which would be better suited for this situation. Hopefully I can accomplish that In the coming weeks which should be able to be added to rsgeo with little effort and a subsequent cran release. keep me honest though 🙃

JosiahParry commented 1 year ago

https://github.com/georust/geo/pull/1107

JosiahParry commented 1 year ago

Very modest speed hit for haversine segmentize. Will eventually work towards geodesic calculations which are more accurate at longer distances than haversine. Geodesic is a "much" slower calculation, though. For linesegments at an urban scale, though, I don't think it matters :)

# segmentize the line strings
bench::mark(
  euclidean = rsgeo::line_segmentize(geo, n_segments),
  haversine = rsgeo::line_segmentize_haversine(geo, n_segments),
  check = FALSE,
  iterations = 5000
)

#> # A tibble: 2 × 13
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>
#> 1 euclidean    2.24ms   3.07ms      230.    8.98KB    0.461  4990    10      21.7s <NULL>
#> 2 haversine    2.31ms   3.19ms      224.    8.98KB    0.404  4991     9      22.3s <NULL>
#> # ℹ 3 more variables: memory <list>, time <list>, gc <list>
Robinlovelace commented 1 year ago

Impressive, keep up the great work!

Robinlovelace commented 9 months ago

@JosiahParry Just to flag: @wangzhao0217 and I have both hit what looks like this issue again from rnet_merge().

No smoking gun but maybe worth re-running some of the tests at some point and re-opening if the issue persists. It will be a non issue for us after we switch to {rnetmatch} but could affect others.

JosiahParry commented 9 months ago

Do you have a repro?

wangzhao0217 commented 9 months ago

Hi Josiah, I have written some code for you to test. it could return the error:

rnet_merged_all = stplanr::rnet_merge(rnet_xp, rnet_yp, dist = dist, segment_length = 10, funs = funs, max_angle_diff = angle) Error in [[<-.data.frame(*tmp*, attr(l, "sf_column"), value = list( : replacement has 632324 rows, data has 634573

  url_rnet_x = "https://github.com/nptscot/networkmerge/releases/download/v0.1/OS_Scotland_Network.geojson"
  rnet_x = geojsonsf::geojson_sf(url_rnet_x)

  url_rnet_y = "https://github.com/nptscot/npt/releases/download/v2024-01-17-23-50/combined_network_tile.geojson"
  rnet_y = geojsonsf::geojson_sf(url_rnet_y)

  rnet_xp = sf::st_transform(rnet_x, "EPSG:27700")
  rnet_yp = sf::st_transform(rnet_y, "EPSG:27700")

  name_list = names(rnet_yp)

  # Initialize an empty list
  funs = list()

  # Loop through each name and assign it a function based on specific conditions
  for (name in name_list) {
    if (name == "geometry") {
      next  # Skip the current iteration
    } else if (name %in% c("gradient", "quietness")) {
      funs[[name]] = mean
    } else {
      funs[[name]] = sum
    }
  }

  # Merge the spatial objects rnet_xp and rnet_yp based on specified parameters
  dist = 20
  angle = 15
  rnet_merged_all = stplanr::rnet_merge(rnet_xp, rnet_yp, dist = dist, segment_length = 10, funs = funs, max_angle_diff = angle) 
JosiahParry commented 9 months ago

That's really odd. I don't know why these off by one errors are occurring. But there's a fairly easy fix on your side you can do.

https://github.com/ropensci/stplanr/blob/d593b9f8edba20c5083a2b8a1319bd99929b7d11/R/linefuns.R#L359-L361

  n_lines <- length(geo)
  # create index ids to grab rows from
  ids <- rep.int(seq_len(n_lines), n_segments)

Instead of doing a fixed number which is where the issue is coming from. You can count the number of LineStrings in the multilinestring and use that to build the index.

res_rsgeo <- rsgeo::line_segmentize(geo, n_segments)
n_index <- lengths(expand_geoms(res_rsgeo))
ids <- rep.int(seq_along(xsamp), n_index)