Closed Robinlovelace closed 6 years ago
Hi @Robinlovelace - thanks for the report, and the reprex. I have definitely run into issues with rmapshaper being unable to handle really large spatial objects, but in this case I had no trouble (See below). Can you output your sessionInfo()
?
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
library(rmapshaper)
u = "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_ua_caswa_2001_clipped.zip"
download.file(u, destfile = "zipped_shapefile.zip")
unzip("zipped_shapefile.zip")
f = list.files(pattern = ".shp")
res = sf::st_read(f)
#> Reading layer `england_ua_caswa_2001_clipped' from data source `/private/var/folders/2w/x5wq73f93yzgm7hjr_b_54q00000gp/T/RtmpC0DMCy/england_ua_caswa_2001_clipped.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 1061 features and 5 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: 243367 ymin: 50322 xmax: 595739.3 ymax: 537152
#> epsg (SRID): NA
#> proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
system.time(cas2003_simple <- rmapshaper::ms_simplify(input = res, keep = 0.05))
#> user system elapsed
#> 12.505 1.409 14.181
object.size(res)
#> 27084984 bytes
object.size(cas2003_simple)
#> 2205128 bytes
plot(cas2003_simple[, "name"])
Also related to #59
Apologies, I provided the smaller of the 2 .shp files (not tested). Please try on this bigger one:
u = "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_caswa_2001_clipped.zip"
Ah, that is a different beast. It did complete for me, but it was definitely slow:
system.time(cas2003_simple <- rmapshaper::ms_simplify(input = res, keep = 0.05))
#> user system elapsed
#> 1740.608 64.060 1802.608
I'll see if I can find anything that can do better, but the bottleneck is converting to geojson before sending it into the V8 context to be processed by the mapshaper javascript library.
Could there not be an option to do it via read/write and system()
?
I recall doing a bodge involving that (I think you helped me with it!) and it was pretty fast.
There could for sure, I’ve definitely thought about it. Would require some thought about how to make sure the system mapshaper
API is consistent with the commands generated in rmapshaper
.
I'm working on optionally using the system mapshaper
in the mapshaper_v0.4.x
branch, along with upgrading to the latest version of mapshaper
.
Great news - let me know when it's ready to test and I can do some benchmarks. For 9 in 10 use cases it's unlikely to make any difference so 100% behind it being an option for people handling chunky files. Thanks loads!
Hi @Robinlovelace - I've finally got around to implementing this. If you are willing to give it a bit of a test drive, that would be great! You can install_github("ateucher/rmapshaper", ref = "mapshaper_v0.4.x")
. The argument is sys
and is available in all functions. Default is FALSE
, so if you set it to TRUE
it should work!
Fantastic work @ateucher this is 4 times faster on the smaller of the examples - will make life easier for ppl wanting to simplify giant vector datasets. Can test on larger dataset at some point but from sketchy wifi connection on laptop, it expectations! Reprex showing the speed-up:
devtools::install_github("ateucher/rmapshaper", ref = "mapshaper_v0.4.x")
#> Skipping install of 'rmapshaper' from a github remote, the SHA1 (d197a3b7) has not changed since last install.
#> Use `force = TRUE` to force installation
library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.2.2, proj.4 4.9.2
library(rmapshaper)
u = "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_ua_caswa_2001_clipped.zip"
download.file(u, destfile = "zipped_shapefile.zip")
unzip("zipped_shapefile.zip")
f = list.files(pattern = ".shp")
res = sf::st_read(f)
#> Reading layer `england_ua_caswa_2001_clipped' from data source `/tmp/RtmpiO2NE3/england_ua_caswa_2001_clipped.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 1061 features and 5 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: 243367 ymin: 50322 xmax: 595739.3 ymax: 537152
#> epsg (SRID): NA
#> proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
system.time(cas2003_simple <- rmapshaper::ms_simplify(input = res, keep = 0.05))
#> user system elapsed
#> 34.572 2.196 36.980
system.time(cas2003_simple_sys <- rmapshaper::ms_simplify(input = res, keep = 0.05,
sys = T))
#> user system elapsed
#> 8.996 0.208 9.154
object.size(res)
#> 27084984 bytes
object.size(cas2003_simple)
#> 2205128 bytes
identical(cas2003_simple, cas2003_simple_sys)
#> [1] TRUE
plot(cas2003_simple[, "name"])
Excellent! I ran it with the big version and this is what I get:
library(sf)
#> Linking to GEOS 3.6.2, GDAL 2.2.3, proj.4 4.9.3
library(rmapshaper)
u = "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_caswa_2001_clipped.zip"
download.file(u, destfile = "zipped_shapefile.zip")
unzip("zipped_shapefile.zip")
f = list.files(pattern = ".shp")
res = sf::st_read(f)
#> Reading layer `england_caswa_2001_clipped' from data source `/private/var/folders/2w/x5wq73f93yzgm7hjr_b_54q00000gp/T/RtmpZLkA8z/england_caswa_2001_clipped.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 6930 features and 5 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: 85665 ymin: 7054 xmax: 655604 ymax: 657534.1
#> epsg (SRID): NA
#> proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
system.time(cas2003_simple_sys <- ms_simplify(input = res, keep = 0.05, sys = TRUE))
#> user system elapsed
#> 57.382 2.670 59.937
system.time(cas2003_simple_internal <- ms_simplify(input = res, keep = 0.05, sys = FALSE))
#> user system elapsed
#> 1276.978 60.552 1514.478
object.size(cas2003_simple_sys)
#> 16266760 bytes
object.size(cas2003_simple_internal)
#> 16266760 bytes
Still not super fast but much better; the bottleneck is writing to geojson with sf::st_write
...
Great - exceeds expectations! One idea you may have already tried is writing to different formats. I think this is a great solution in any case, many thanks for the fix.
@Robinlovelace rmapshaper 0.4 is now on CRAN with the sys
argument available in all functions.
Fantastic, many thanks! Heads-up @Nowosad we should mention this in the simplification section of the book.
I've just been using this on some moderately large shapfiles.
Here's some reproducible code to demo the issue:
Context: I'm frustrated at the poor provision of UK administrative borders in suitable file formats or levels of simplification and have started a package to deal with it:
https://github.com/Robinlovelace/ukborders
Thanks loads for rmapshaper in any case!