brownag / gpkg

Utilities for the Open Geospatial Consortium (OGC) 'GeoPackage' Format in R
http://humus.rocks/gpkg/
Creative Commons Zero v1.0 Universal

Implement GDAL driver detection via {vapour} #16

Closed brownag closed 6 months ago

brownag commented 6 months ago

This PR improves the handling of file paths so that the necessary GDAL driver can be detected. In most cases it can distinguish between raster and vector sources.

To be consistent with prior behavior, for now, the GDAL CSV driver is ignored as a possible vector source (used for non-spatial attributes only) and the GPKG driver is only used as a vector source.

Several other drivers can serve as both vector and raster sources. Previously these did not work because their extensions were not in the hard-coded list.

drv <- vapour::vapour_all_drivers()
drv$driver[drv$vector & drv$raster]
#> [1] "FITS"        "PCIDSK"      "netCDF"      "PDS4"        "VICAR"       "JP2OpenJPEG"
#> [7] "PDF"         "MBTiles"     "BAG"         "OGCAPI"      "GPKG"        "OpenFileGDB"
#> [13] "CAD"         "PLSCENES"    "NGW"         "HTTP"       

Some of the drivers above may need specific handling, and the resulting set of decisions should then be documented in the Details section of the documentation. It is always possible to read the source in the correct format before passing it to gpkg_write(); this change only affects driver detection for file path sources.
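The decision rules described above could be sketched roughly as follows. This is a minimal illustration in base R, not the code merged in this PR: `classify_driver()` is a hypothetical helper, and the small `drv` table is a hand-written stand-in that only mimics the `driver`, `vector`, and `raster` columns used above from `vapour::vapour_all_drivers()`.

```r
# Hand-written stand-in for vapour::vapour_all_drivers(); only the columns
# used above (driver, vector, raster) are reproduced, for a handful of drivers.
drv <- data.frame(
  driver = c("GTiff", "ESRI Shapefile", "CSV", "GPKG", "netCDF"),
  vector = c(FALSE, TRUE, TRUE, TRUE, TRUE),
  raster = c(TRUE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map a detected driver name to the kind of source
classify_driver <- function(driver, drivers = drv) {
  i <- match(driver, drivers$driver)
  if (is.na(i)) return(NA_character_)          # driver not in the table
  if (driver == "CSV") return("attributes")    # CSV: non-spatial attributes only
  if (driver == "GPKG") return("vector")       # GPKG: vector source only, for now
  if (drivers$vector[i] && drivers$raster[i]) return("ambiguous")
  if (drivers$vector[i]) return("vector")
  if (drivers$raster[i]) return("raster")
  NA_character_
}

# Classify every driver in the stand-in table
vapply(drv$driver, classify_driver, character(1), USE.NAMES = FALSE)
```

Drivers that come back as "ambiguous" in this sketch are the ones that would need specific handling, or that the caller can read with the appropriate package before passing the object to gpkg_write().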

brownag commented 6 months ago

Example of {terra} behavior when reading and writing CSV vector sources. This could potentially be "fixed" in {terra}, but CSV is not a good storage medium for vector data, so I do not think I should suggest it. It is probably best to treat CSV files as "attributes only" for the purposes of gpkg_write().

library(terra)
#> terra 1.7.73

x <- vect(system.file("ex", "lux.shp", package="terra"))

write.csv(as.data.frame(x), "test.csv", row.names = FALSE)
a <- vect("test.csv")
a
#>  class       : SpatVector 
#>  geometry    : none 
#>  dimensions  : 0, 6  (geometries, attributes)
#>  extent      : 0, 0, 0, 0  (xmin, xmax, ymin, ymax)
#>  source      : test.csv
#>  coord. ref. :  
#>  names       :  ID_1 NAME_1  ID_2 NAME_2  AREA   POP
#>  type        : <chr>  <chr> <chr>  <chr> <chr> <chr>

write.csv(as.data.frame(x, geom = "WKT"), "test.csv", row.names = FALSE)
b <- vect("test.csv")
b
#>  class       : SpatVector 
#>  geometry    : none 
#>  dimensions  : 0, 7  (geometries, attributes)
#>  extent      : 0, 0, 0, 0  (xmin, xmax, ymin, ymax)
#>  source      : test.csv
#>  coord. ref. :  
#>  names       :  ID_1 NAME_1  ID_2 NAME_2  AREA   POP geometry
#>  type        : <chr>  <chr> <chr>  <chr> <chr> <chr>    <chr>

d <- vect(read.csv("test.csv"), geom = "geometry")
d
#>  class       : SpatVector 
#>  geometry    : polygons 
#>  dimensions  : 12, 6  (geometries, attributes)
#>  extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
#>  coord. ref. :  
#>  names       :  ID_1   NAME_1  ID_2   NAME_2  AREA   POP
#>  type        : <int>    <chr> <int>    <chr> <int> <int>
#>  values      :     1 Diekirch     1 Clervaux   312 18081
#>                    1 Diekirch     2 Diekirch   218 32543
#>                    1 Diekirch     3  Redange   259 18664

writeVector(x, "test2.csv")
#> Error: [writeVector] cannot guess filetype from filename

brownag commented 6 months ago

I added documentation about CSV and GPKG sources, and some general guidance on how to handle multilayer sources (i.e. read the specific layers you want into R before attempting to write them to a new GeoPackage). I am not inclined to add any additional handling for sources other than these.

In the future I may decide to either disallow GPKG in input file path sources, or provide a custom method for GPKG only that allows transfer of multiple tables using the list item name, or file basename, as a prefix for the new table name(s), but I will do that in a separate PR.
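The prefixing idea mentioned above could look something like the following base R sketch. It is only an assumption about how such a method might name tables: `prefix_table_names()` is a hypothetical helper that does not exist in {gpkg}, and the file paths and table names are made up for illustration.

```r
# Hypothetical helper: build new table names by prefixing each source table
# with the list item name, falling back to the file basename (without
# extension) when the item is unnamed.
prefix_table_names <- function(paths, tables) {
  nm <- names(paths)
  if (is.null(nm)) nm <- rep("", length(paths))
  base <- tools::file_path_sans_ext(basename(paths))
  prefix <- unname(ifelse(nzchar(nm), nm, base))
  out <- lapply(seq_along(paths), function(i) {
    paste(prefix[i], tables[[i]], sep = "_")
  })
  names(out) <- prefix
  out
}

# Made-up inputs: one named and one unnamed GeoPackage path,
# with the tables to transfer from each
src <- c(soils = "data/a.gpkg", "data/b.gpkg")
tbl <- list(c("mapunit", "component"), "boundary")
prefix_table_names(src, tbl)
```

Using a prefix per source keeps table names from different input GeoPackages from colliding in the destination file, which is the main reason a plain copy of table names would not be enough.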