Closed by dimfalk 1 year ago
OSM_PLZ.shp: 55.2 MB (on disk)
plz <- sf::read_sf("inst/exdata/PLZ/OSM_PLZ.shp"): 60.8 MB (in memory)
save(plz, file="plz.RData"): 36.5 MB (on disk)
saveRDS(plz, file="plz.rds"): 36.5 MB (on disk)
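The measurements above can be reproduced with base R alone. The following is a minimal sketch on a synthetic data frame (the names `dat`, `rdata_file` and `rds_file` are made up for illustration; this is not the real OSM_PLZ data). Both `save()` and `saveRDS()` apply gzip compression by default, which would be consistent with the two files ending up at nearly the same size.

```r
# Synthetic stand-in for the attribute table; NOT the real OSM_PLZ data.
set.seed(1)
dat <- data.frame(
  plz  = sprintf("%05d", sample(1000:99999, 5000)),
  area = runif(5000)
)

# Size of the object in memory, in bytes.
mem_bytes <- as.numeric(object.size(dat))

# Size on disk for both serialization formats.
rdata_file <- tempfile(fileext = ".RData")
rds_file   <- tempfile(fileext = ".rds")
save(dat, file = rdata_file)
saveRDS(dat, file = rds_file)

sizes <- c(
  memory = mem_bytes,
  RData  = file.size(rdata_file),
  rds    = file.size(rds_file)
)

# Report all three sizes in KB.
print(round(sizes / 1024, 1))
```

`saveRDS()` stores a single unnamed object, so `readRDS(rds_file)` can be assigned to any name, whereas `load(rdata_file)` restores the object under its original name.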
The size overhead seems too large and the actual benefit marginal, but the insights can be used for #26.
However, since only centroid coordinates are relevant for point extraction, the actual geometries can be dropped.
plz_centroids <- sf::st_centroid(plz): 6.0 MB (in memory) | 510 KB (on disk)
Moreover, irrelevant columns can be removed from the attribute table.
plz_minimal <- plz_centroids["plz"]: 4.1 MB (in memory) | 161 KB (on disk) ✔️
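The two reduction steps (polygons to centroids, then dropping all attribute columns except the postal code) can be tried out on a toy object. This is only a sketch under assumptions: the helper `sq()`, the object `areas` and the column `note` are hypothetical, and the squares merely stand in for real postal code areas.

```r
library(sf)

# Hypothetical helper: a unit square with its lower-left corner at (x0, y0).
sq <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Two made-up "postal code areas" with one extra attribute column.
# agr = "constant" declares the attributes as constant over each geometry.
areas <- st_sf(
  plz      = c("00001", "00002"),
  note     = c("a", "b"),
  geometry = st_sfc(sq(0, 0), sq(2, 0)),
  agr      = "constant"
)

# Step 1: replace each polygon with its centroid.
centroids <- st_centroid(areas)

# Step 2: keep only the postal code column; the geometry column is sticky.
minimal <- centroids["plz"]

print(minimal)
```

Selecting with `["plz"]` keeps the geometry column automatically, so `minimal` is still an sf object and can be used for point extraction.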
The sf object consists of 8,725 observations, but there are only 8,169 unique entries in the dataset. Caution: there are supposed to be 8,181 unique entries, so 12 are missing.
Overlap between postal code areas and municipalities? It looks more like a multi-polygon approach (for whatever reason), because the attributes of twin objects seem to be identical except for OBJECTID, Shape_Length, Shape_Area and geometry.
This would require some cleaning beforehand (dplyr::group_by()?).
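One possible cleaning step, assuming the twin rows really are separate pieces of the same postal code area, is to group by postal code and merge the pieces. The sketch below uses made-up data (`parts`, `sq()` and the codes are hypothetical). It relies on the sf method for `dplyr::summarise()`, which unions the geometries within each group by default.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a unit square with its lower-left corner at (x0, y0).
sq <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Made-up data: postal code "00001" is split into two rows (twin objects).
parts <- st_sf(
  OBJECTID = 1:3,
  plz      = c("00001", "00001", "00002"),
  geometry = st_sfc(sq(0, 0), sq(1, 0), sq(5, 5))
)

# Compare the row count with the number of unique codes to spot twins.
n_rows   <- nrow(parts)
n_unique <- length(unique(parts$plz))

# Merge all rows that share a postal code into one feature each.
merged <- parts |>
  group_by(plz) |>
  summarise()

print(c(rows = n_rows, unique = n_unique, merged = nrow(merged)))
```

Merging before computing centroids yields one point per postal code. Taking the centroid of each piece separately would instead return several points for the same code.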
In addition, it would make sense to use the primary source from OSM.
get_centroid("33699") |> get_idx()
#' "42024"
The OSM-based dataset is 55.2 MB in size and therefore cannot be embedded in a package:
https://opendata-esri-de.opendata.arcgis.com/datasets/esri-de-content::postleitzahlengebiete-osm/about
https://www.suche-postleitzahl.org/downloads
Save as RData or dismiss the idea?