R-ArcGIS / r-bridge

Bridge library to connect ArcGIS and R, including arcgisbinding R library.
Apache License 2.0

Add option to arc.write to specify the OBJECTID/unique identifier field. #58

Open jacpete opened 3 years ago

jacpete commented 3 years ago

Description

This is a feature request to allow the OBJECTID field to be specified when using arc.write(). Currently, a new field called OBJECTID is added every time you save a new feature. If the dataset already has an OBJECTID field, that field is renamed to OBJECTID_1 and a new OBJECTID field is still created.

I would like an additional parameter in arc.write(), named something like object_id, that specifies a field in your dataset to be used as the unique row identifier. The argument would take a field name as a string, and arc.write() would then check that the specified name exists in data, that the data type of the column is integer or can be coerced to integer (if it is numeric), and that each row has a unique value. If any of these checks fail, it could give a warning and fall back to creating the OBJECTID field (as it does currently), or alternatively error out with a message about which check failed. The default for the new parameter could be NULL, which would mimic the current behavior for backwards compatibility. This would let you specify an existing field such as fid or OBJECTID as the unique identifier without forcing the creation of a new column in the dataset.
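The three checks described above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: check_object_id() is a hypothetical helper, not an arcgisbinding function, and it takes the stricter of the two options mentioned (erroring out rather than warning and falling back).

```r
# Hypothetical helper, not part of arcgisbinding: checks whether `field`
# could serve as the unique row identifier for `data`.
check_object_id <- function(data, field = NULL) {
  # NULL mimics the current behaviour, so there is nothing to validate
  if (is.null(field)) {
    return(TRUE)
  }
  if (!is.character(field) || length(field) != 1L) {
    stop("`field` must be a single field name given as a string.")
  }
  # Check 1: the field exists in the data
  if (!field %in% names(data)) {
    stop("Field '", field, "' does not exist in `data`.")
  }
  values <- data[[field]]
  # Check 2: integer, or numeric holding only whole numbers
  if (!is.numeric(values)) {
    stop("Field '", field, "' is neither integer nor numeric.")
  }
  if (!is.integer(values) && any(values != trunc(values), na.rm = TRUE)) {
    stop("Field '", field, "' holds non-whole numbers.")
  }
  # Check 3: every row has a unique value
  if (anyDuplicated(values) > 0L) {
    stop("Field '", field, "' contains duplicated values.")
  }
  TRUE
}

# Usage on a small made-up table
example <- data.frame(fid = c(3L, 1L, 2L), score = c(0.5, 1.5, 2.5))
check_object_id(example, "fid")
```

Replacing each stop() with warning() followed by a fallback to the generated OBJECTID would give the more lenient variant.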

Example of Current Action

# install.packages('sf')
# install.packages('spData')
# install.packages('dplyr')
# remotes::install_github("R-ArcGIS/r-bridge@*release")

library(sf)
##Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(spData)
library(dplyr)
library(arcgisbinding)

arcgisbinding::arc.check_product()
## product: ArcGIS Pro (12.8.0.29751)
## license: Advanced
## version: 1.0.1.243

#Create Example data
world <- spData::world

#Create unique ID that we want with the data.
output <- sf::st_sf(cbind(data.frame('OBJECTID' = as.integer(1:nrow(world))), world))
output <- dplyr::arrange(output, iso_a2)
head(output); tail(output)
## Simple feature collection with 6 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 42.68825
## Geodetic CRS:  WGS 84
##   OBJECTID iso_a2            name_long  continent  region_un       subregion              type    area_km2      pop lifeExp gdpPercap                           geom
## 1       85     AE United Arab Emirates       Asia       Asia    Western Asia Sovereign country    79880.74  9070867  76.948 63943.186 MULTIPOLYGON (((51.57952 24...
## 2      104     AF          Afghanistan       Asia       Asia   Southern Asia Sovereign country   652270.07 32758020  62.895  1838.960 MULTIPOLYGON (((66.51861 37...
## 3      126     AL              Albania     Europe     Europe Southern Europe Sovereign country    29694.80  2889104  77.963 10701.121 MULTIPOLYGON (((21.02004 40...
## 4      110     AM              Armenia       Asia       Asia    Western Asia Sovereign country    28656.60  2906220  74.255  7971.118 MULTIPOLYGON (((46.50572 38...
## 5       75     AO               Angola     Africa     Africa   Middle Africa Sovereign country  1245463.75 26920466  60.858  6257.153 MULTIPOLYGON (((12.32243 -6...
## 6      160     AQ           Antarctica Antarctica Antarctica      Antarctica     Indeterminate 12335956.08       NA      NA        NA MULTIPOLYGON (((-180 -89.9,...
## Simple feature collection with 6 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 16.34498 ymin: -34.81917 xmax: 53.10857 ymax: 35.6716
## Geodetic CRS:  WGS 84
##     OBJECTID iso_a2       name_long continent region_un       subregion              type    area_km2      pop lifeExp gdpPercap                           geom
## 172      158     YE           Yemen      Asia      Asia    Western Asia Sovereign country  455915.007 26246327  64.523  3766.805 MULTIPOLYGON (((52.00001 19...
## 173       26     ZA    South Africa    Africa    Africa Southern Africa Sovereign country 1216400.831 54539571  60.993 12389.715 MULTIPOLYGON (((16.34498 -2...
## 174       71     ZM          Zambia    Africa    Africa  Eastern Africa Sovereign country  751921.215 15620974  60.775  3632.504 MULTIPOLYGON (((30.74001 -8...
## 175       49     ZW        Zimbabwe    Africa    Africa  Eastern Africa Sovereign country  376328.489 15411675  59.360  1925.139 MULTIPOLYGON (((31.19141 -2...
## 176      161   <NA> Northern Cyprus      Asia      Asia    Western Asia Sovereign country    3786.365       NA      NA        NA MULTIPOLYGON (((32.73178 35...
## 177      168   <NA>      Somaliland    Africa    Africa  Eastern Africa     Indeterminate  167349.613       NA      NA        NA MULTIPOLYGON (((48.9482 11....

tempPath <- tempdir()
tempFile <- file.path(tempPath, "example.gdb", "world")
arcgisbinding::arc.write(tempFile, data = output, validate = TRUE, overwrite = TRUE)

world_read <- arc.data2sf(arc.select(arc.open(tempFile)))
head(world_read); tail(world_read)
## Simple feature collection with 6 features and 12 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 42.68825
## CRS:           +proj=longlat +datum=WGS84 +no_defs
##   OBJECTID OBJECTID_1 iso_a2            name_long  continent  region_un       subregion              type    area_km2      pop lifeExp gdpPercap                           geom
## 1        1         85     AE United Arab Emirates       Asia       Asia    Western Asia Sovereign country    79880.74  9070867  76.948 63943.186 POLYGON ((55.98121 24.13054...
## 2        2        104     AF          Afghanistan       Asia       Asia   Southern Asia Sovereign country   652270.07 32758020  62.895  1838.960 POLYGON ((71.54192 37.90577...
## 3        3        126     AL              Albania     Europe     Europe Southern Europe Sovereign country    29694.80  2889104  77.963 10701.121 POLYGON ((19.73805 42.68825...
## 4        4        110     AM              Armenia       Asia       Asia    Western Asia Sovereign country    28656.60  2906220  74.255  7971.118 POLYGON ((45.61001 39.89999...
## 5        5         75     AO               Angola     Africa     Africa   Middle Africa Sovereign country  1245463.75 26920466  60.858  6257.153 MULTIPOLYGON (((16.32653 -5...
## 6        6        160     AQ           Antarctica Antarctica Antarctica      Antarctica     Indeterminate 12335956.08       NA      NA        NA MULTIPOLYGON (((-59.57209 -...
## Simple feature collection with 6 features and 12 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 16.34498 ymin: -34.81917 xmax: 53.10857 ymax: 35.6716
## CRS:           +proj=longlat +datum=WGS84 +no_defs
##     OBJECTID OBJECTID_1 iso_a2       name_long continent region_un       subregion              type    area_km2      pop lifeExp gdpPercap                           geom
## 172      172        158     YE           Yemen      Asia      Asia    Western Asia Sovereign country  455915.007 26246327  64.523  3766.805 POLYGON ((52.78218 17.34974...
## 173      173         26     ZA    South Africa    Africa    Africa Southern Africa Sovereign country 1216400.831 54539571  60.993 12389.715 MULTIPOLYGON (((29.83904 -2...
## 174      174         71     ZM          Zambia    Africa    Africa  Eastern Africa Sovereign country  751921.215 15620974  60.775  3632.504 POLYGON ((33.11429 -11.6072...
## 175      175         49     ZW        Zimbabwe    Africa    Africa  Eastern Africa Sovereign country  376328.489 15411675  59.360  1925.139 POLYGON ((30.27426 -15.5077...
## 176      176        161   <NA> Northern Cyprus      Asia      Asia    Western Asia Sovereign country    3786.365       NA      NA        NA POLYGON ((34.57647 35.6716,...
## 177      177        168   <NA>      Somaliland    Africa    Africa  Eastern Africa     Indeterminate  167349.613       NA      NA        NA POLYGON ((44.1178 10.44554,...

Example of how new parameter would function

This is a theoretical example and will not run.

# install.packages('sf')
# install.packages('spData')
# install.packages('dplyr')
# remotes::install_github("R-ArcGIS/r-bridge@*release")

library(sf)
##Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(spData)
library(dplyr)
library(arcgisbinding)

arcgisbinding::arc.check_product()
## product: ArcGIS Pro (12.8.0.29751)
## license: Advanced
## version: 1.0.1.243

#Create Example data
world <- spData::world

#Create unique ID that we want with the data.
output <- sf::st_sf(cbind(data.frame('OBJECTID' = as.integer(1:nrow(world))), world))
output <- dplyr::arrange(output, iso_a2)
head(output); tail(output)
## Simple feature collection with 6 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 42.68825
## Geodetic CRS:  WGS 84
##   OBJECTID iso_a2            name_long  continent  region_un       subregion              type    area_km2      pop lifeExp gdpPercap                           geom
## 1       85     AE United Arab Emirates       Asia       Asia    Western Asia Sovereign country    79880.74  9070867  76.948 63943.186 MULTIPOLYGON (((51.57952 24...
## 2      104     AF          Afghanistan       Asia       Asia   Southern Asia Sovereign country   652270.07 32758020  62.895  1838.960 MULTIPOLYGON (((66.51861 37...
## 3      126     AL              Albania     Europe     Europe Southern Europe Sovereign country    29694.80  2889104  77.963 10701.121 MULTIPOLYGON (((21.02004 40...
## 4      110     AM              Armenia       Asia       Asia    Western Asia Sovereign country    28656.60  2906220  74.255  7971.118 MULTIPOLYGON (((46.50572 38...
## 5       75     AO               Angola     Africa     Africa   Middle Africa Sovereign country  1245463.75 26920466  60.858  6257.153 MULTIPOLYGON (((12.32243 -6...
## 6      160     AQ           Antarctica Antarctica Antarctica      Antarctica     Indeterminate 12335956.08       NA      NA        NA MULTIPOLYGON (((-180 -89.9,...
## Simple feature collection with 6 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 16.34498 ymin: -34.81917 xmax: 53.10857 ymax: 35.6716
## Geodetic CRS:  WGS 84
##     OBJECTID iso_a2       name_long continent region_un       subregion              type    area_km2      pop lifeExp gdpPercap                           geom
## 172      158     YE           Yemen      Asia      Asia    Western Asia Sovereign country  455915.007 26246327  64.523  3766.805 MULTIPOLYGON (((52.00001 19...
## 173       26     ZA    South Africa    Africa    Africa Southern Africa Sovereign country 1216400.831 54539571  60.993 12389.715 MULTIPOLYGON (((16.34498 -2...
## 174       71     ZM          Zambia    Africa    Africa  Eastern Africa Sovereign country  751921.215 15620974  60.775  3632.504 MULTIPOLYGON (((30.74001 -8...
## 175       49     ZW        Zimbabwe    Africa    Africa  Eastern Africa Sovereign country  376328.489 15411675  59.360  1925.139 MULTIPOLYGON (((31.19141 -2...
## 176      161   <NA> Northern Cyprus      Asia      Asia    Western Asia Sovereign country    3786.365       NA      NA        NA MULTIPOLYGON (((32.73178 35...
## 177      168   <NA>      Somaliland    Africa    Africa  Eastern Africa     Indeterminate  167349.613       NA      NA        NA MULTIPOLYGON (((48.9482 11....

tempPath <- tempdir()
tempFile <- file.path(tempPath, "example.gdb", "world")
#arc.write with the optional object_id argument pointing to "OBJECTID"
arcgisbinding::arc.write(tempFile, data = output, validate = TRUE, overwrite = TRUE, object_id = "OBJECTID")
world_read <- arc.data2sf(arc.select(arc.open(tempFile)))
head(world_read); tail(world_read)
## Simple feature collection with 6 features and 11 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 42.68825
## Geodetic CRS:  WGS 84
##   OBJECTID iso_a2            name_long  continent  region_un       subregion              type    area_km2      pop lifeExp gdpPercap                           geom
## 1       85     AE United Arab Emirates       Asia       Asia    Western Asia Sovereign country    79880.74  9070867  76.948 63943.186 MULTIPOLYGON (((51.57952 24...
## 2      104     AF          Afghanistan       Asia       Asia   Southern Asia Sovereign country   652270.07 32758020  62.895  1838.960 MULTIPOLYGON (((66.51861 37...
## 3      126     AL              Albania     Europe     Europe Southern Europe Sovereign country    29694.80  2889104  77.963 10701.121 MULTIPOLYGON (((21.02004 40...
## 4      110     AM              Armenia       Asia       Asia    Western Asia Sovereign country    28656.60  2906220  74.255  7971.118 MULTIPOLYGON (((46.50572 38...
## 5       75     AO               Angola     Africa     Africa   Middle Africa Sovereign country  1245463.75 26920466  60.858  6257.153 MULTIPOLYGON (((12.32243 -6...
## 6      160     AQ           Antarctica Antarctica Antarctica      Antarctica     Indeterminate 12335956.08       NA      NA        NA MULTIPOLYGON (((-180 -89.9,...
## Simple feature collection with 6 features and 11 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 16.34498 ymin: -34.81917 xmax: 53.10857 ymax: 35.6716
## Geodetic CRS:  WGS 84
##     OBJECTID iso_a2       name_long continent region_un       subregion              type    area_km2      pop lifeExp gdpPercap                           geom
## 172      158     YE           Yemen      Asia      Asia    Western Asia Sovereign country  455915.007 26246327  64.523  3766.805 MULTIPOLYGON (((52.00001 19...
## 173       26     ZA    South Africa    Africa    Africa Southern Africa Sovereign country 1216400.831 54539571  60.993 12389.715 MULTIPOLYGON (((16.34498 -2...
## 174       71     ZM          Zambia    Africa    Africa  Eastern Africa Sovereign country  751921.215 15620974  60.775  3632.504 MULTIPOLYGON (((30.74001 -8...
## 175       49     ZW        Zimbabwe    Africa    Africa  Eastern Africa Sovereign country  376328.489 15411675  59.360  1925.139 MULTIPOLYGON (((31.19141 -2...
## 176      161   <NA> Northern Cyprus      Asia      Asia    Western Asia Sovereign country    3786.365       NA      NA        NA MULTIPOLYGON (((32.73178 35...
## 177      168   <NA>      Somaliland    Africa    Africa  Eastern Africa     Indeterminate  167349.613       NA      NA        NA MULTIPOLYGON (((48.9482 11....

orhuna commented 3 years ago

@jacpete thank you for the detailed feature request.

Regarding the need for this enhancement: are there any workflows that require you to have specific IDs in the OBJECTID field?

Our current mode of operation is creating these from scratch to make sure that display, navigation, and Geoprocessing functionalities work on the data seamlessly. Allowing a hard-coded Object ID exposes the output feature class to corruption and may make the ArcGIS Pro analysis on this output feature class impossible unless you recreate Object IDs. Here are our specs for ObjectIDs and their functions.

jacpete commented 3 years ago

I read through the linked documentation and I think the functionality I would like to implement is the highlighted bullet in the screenshot below:

[Image: screenshot of the ObjectID documentation with one bullet highlighted]

To do this, we would add a fourth check, which I forgot in my original post, to ensure that the field has no NULL/NA values. R's native integer type is already 32-bit, so coercion to the integer class would handle that requirement.
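A minimal sketch of that fourth check together with the coercion step, again in base R. The helper name as_object_id() is made up for illustration and is not part of arcgisbinding.

```r
# Hypothetical helper, not part of arcgisbinding: rejects NA values and
# coerces whole-number values to R's 32-bit integer type.
as_object_id <- function(values) {
  # Check 4: no NULL/NA values
  if (anyNA(values)) {
    stop("Object ID values must not contain NA.")
  }
  if (is.integer(values)) {
    return(values)
  }
  # as.integer() yields NA (with a warning) outside the 32-bit range,
  # so the range is tested before coercing
  if (!is.numeric(values) ||
      any(abs(values) > .Machine$integer.max) ||
      any(values != trunc(values))) {
    stop("Object ID values must be whole numbers within the 32-bit integer range.")
  }
  as.integer(values)
}

# Usage: numeric (double) IDs become integer IDs
ids <- as_object_id(c(85, 104, 126))
is.integer(ids)
```

Testing the range before calling as.integer() avoids silently turning out-of-range values into NA.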

My main issue is that arc.write() creates a new field when it's not needed and renames the original unique identifier field. If a user like me loaded the data back in and wanted to do a join on OBJECTID, they would get incorrect values, because they should be joining on OBJECTID_1.
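Until such an option exists, one possible workaround on the R side is to undo the renaming after reading the data back, so that joins on OBJECTID use the original values. The sketch below uses a hypothetical helper, restore_object_id(), and a made-up plain data frame shaped like the read-back output shown earlier. It has not been tried on sf objects, and it only changes the table in R, not the feature class in the geodatabase.

```r
# Hypothetical helper, not part of arcgisbinding: drops the generated
# OBJECTID column and gives the renamed original its name back.
restore_object_id <- function(data, generated = "OBJECTID",
                              renamed = "OBJECTID_1") {
  # Leave the table untouched unless both columns are present
  if (!all(c(generated, renamed) %in% names(data))) {
    return(data)
  }
  data <- data[, names(data) != generated, drop = FALSE]
  names(data)[names(data) == renamed] <- generated
  data
}

# Made-up table shaped like the read-back output shown earlier
read_back <- data.frame(
  OBJECTID = 1:3,
  OBJECTID_1 = c(85L, 104L, 126L),
  iso_a2 = c("AE", "AF", "AL")
)
restored <- restore_object_id(read_back)
names(restored)
```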

I know the arcpy/arcgis packages in Python handle this better and won't create a new OBJECTID field in the same circumstance (example Python code below). The Python code handles the designated OID field correctly the entire time. In the example below, I pull and save data from a REST service (https://services.arcgis.com/V6ZHFr6zdgNZuVG0/arcgis/rest/services/Landscape_Trees/FeatureServer/0) using both the arcpy/arcgis packages in Python and the arcgisbinding package in R. The OID field is named FID.

Example

I am including a reproducible example using both Python and R. Running the code saves the data to a new geodatabase called testTrees.gdb at the root of your C drive. Feel free to change the path if needed.

Python

import os
import re

import arcpy
import arcpy.management
import arcgis.features

def getFeatureSet(url, where = '1=1', fields = "*", objectIDs = None):
    fl = arcgis.features.FeatureLayer(url)
    if isinstance(objectIDs, list):
        objectIDs = ','.join([str(ID) for ID in objectIDs])
    fs = fl.query(where=where, out_fields=fields, return_geometry=True, object_ids= objectIDs)
    return fs

def checkGdbExists(output):
    #Create the file geodatabase named in the output path if it does not exist yet
    output = os.path.normpath(output)
    if re.search(r'\.gdb', output) is not None:
        pathParts = output.split(os.sep)
        #Index of the first path component that contains ".gdb"
        gdbID = list(map(lambda x: re.search(r'\.gdb', x) is not None, pathParts)).index(True)
        gdbPath = os.sep.join(pathParts[:gdbID+1])
        if not os.path.exists(gdbPath):
            arcpy.management.CreateFileGDB(out_folder_path=os.path.split(gdbPath)[0], out_name=os.path.split(gdbPath)[1])
    return output

def saveFeatureSet(fs, output):
    #Make sure output .gdb exists if output path has .gdb
    output = checkGdbExists(output)

    #Save feature set
    fs.save(save_location=os.path.split(output)[0], out_name=os.path.split(output)[1])

def scrapeESRIServiceLayer(url, output, where = "1=1", fields = "*", objectIDs = None):

    #Retrieve feature set
    fs = getFeatureSet(url = url, where = where, fields = fields, objectIDs=objectIDs)

    #Save feature set
    saveFeatureSet(fs, output)

scrapeESRIServiceLayer(
    url = "https://services.arcgis.com/V6ZHFr6zdgNZuVG0/arcgis/rest/services/Landscape_Trees/FeatureServer/0", 
    output = "C:\\testTrees.gdb\\trees_arcpy"
)

R

Then run the R code, which also shows the difference between the two outputs.

library(dplyr)
library(arcgisbinding)

arcgisbinding::arc.check_product()
## product: ArcGIS Pro (12.8.0.29751)
## license: Advanced
## version: 1.0.1.243

trees <- arcgisbinding::arc.select(arcgisbinding::arc.open("https://services.arcgis.com/V6ZHFr6zdgNZuVG0/arcgis/rest/services/Landscape_Trees/FeatureServer/0"))
arcgisbinding::arc.write("C:/testTrees.gdb/trees_arcgisbindings", data = trees, validate = TRUE, overwrite = TRUE)

#Load data in to look at the difference
trees_arcgisbindings <- arc.data2sf(arc.select(arc.open("C:/testTrees.gdb/trees_arcgisbindings")))
trees_arcpy <- arc.data2sf(arc.select(arc.open("C:/testTrees.gdb/trees_arcpy")))

#What columns are different
names(trees_arcgisbindings)[!names(trees_arcgisbindings) %in% names(trees_arcpy)]
## [1] "OBJECTID"
names(trees_arcpy)[!names(trees_arcpy) %in% names(trees_arcgisbindings)]
## character(0)

#Print the first few columns
dplyr::select(trees_arcgisbindings, 1:5)
## Simple feature collection with 1148 features and 5 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -9177809 ymin: 4247005 xmax: -9176814 ymax: 4247759
## Projected CRS: WGS 84 / Pseudo-Mercator
## First 10 features:
##     OBJECTID FID Tree_ID           Collected                    Crew                     geom
## 1         1   1     102 2012-10-04 19:00:00 Linden+ Forrest+ Johnny POINT (-9177312 4247151)
## 2         2   2     103 2012-10-04 19:00:00 Linden+ Forrest+ Johnny POINT (-9177303 4247155)
## 3         3   3     104 2012-10-04 19:00:00 Linden+ Forrest+ Johnny POINT (-9177382 4247204)
## 4         4   4     105 2012-10-04 19:00:00 Linden+ Forrest+ Johnny POINT (-9177390 4247219)
## 5         5   5     107 2012-10-07 19:00:00       Linden+ Adele+ Ed POINT (-9177392 4247235)
## 6         6   6     108 2012-10-07 19:00:00       Linden+ Adele+ Ed POINT (-9177406 4247253)
## 7         7   7     109 2012-10-07 19:00:00       Linden+ Adele+ Ed POINT (-9177411 4247257)
## 8         8   8     110 2012-10-07 19:00:00       Linden+ Adele+ Ed POINT (-9177415 4247255)
## 9         9   9     111 2012-10-07 19:00:00       Linden+ Adele+ Ed POINT (-9177401 4247271)
## 10       10  10     112 2012-10-09 19:00:00      Linden+ Joe+ Casey POINT (-9177416 4247277)
dplyr::select(trees_arcpy, 1:5)
## Simple feature collection with 1148 features and 5 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -9177809 ymin: 4247005 xmax: -9176814 ymax: 4247759
## Projected CRS: WGS 84 / Pseudo-Mercator
## First 10 features:
##     FID Tree_ID           Collected                    Crew Status                     geom
## 1    1     102 2012-10-04 19:00:00 Linden+ Forrest+ Johnny      P POINT (-9177312 4247151)
## 2    2     103 2012-10-04 19:00:00 Linden+ Forrest+ Johnny      P POINT (-9177303 4247155)
## 3    3     104 2012-10-04 19:00:00 Linden+ Forrest+ Johnny      P POINT (-9177382 4247204)
## 4    4     105 2012-10-04 19:00:00 Linden+ Forrest+ Johnny      P POINT (-9177390 4247219)
## 5    5     107 2012-10-07 19:00:00       Linden+ Adele+ Ed      P POINT (-9177392 4247235)
## 6    6     108 2012-10-07 19:00:00       Linden+ Adele+ Ed      U POINT (-9177406 4247253)
## 7    7     109 2012-10-07 19:00:00       Linden+ Adele+ Ed      U POINT (-9177411 4247257)
## 8    8     110 2012-10-07 19:00:00       Linden+ Adele+ Ed      P POINT (-9177415 4247255)
## 9    9     111 2012-10-07 19:00:00       Linden+ Adele+ Ed      I POINT (-9177401 4247271)
## 10  10     112 2012-10-09 19:00:00      Linden+ Joe+ Casey      P POINT (-9177416 4247277)

The Python version correctly identifies the FID column as the ObjectID field and does not create a new field called OBJECTID, while the R version does. While my suggestion wouldn't automatically prevent the creation of the new OBJECTID field, it would allow a user to explicitly request that FID become the ObjectID field in the geodatabase with a command like:

trees <- arcgisbinding::arc.select(arcgisbinding::arc.open("https://services.arcgis.com/V6ZHFr6zdgNZuVG0/arcgis/rest/services/Landscape_Trees/FeatureServer/0"))
arcgisbinding::arc.write("C:/testTrees.gdb/trees_arcgisbindings", data = trees, validate = TRUE, overwrite = TRUE, object_id = "FID")

JWilliamsonArch commented 1 month ago

Has there been any progress on this? Or has it already been fixed?

I recently had an issue with OBJECTIDs and a legacy database, and if this had been an option, I would have been able to complete the workflow in R with the arcgisbinding package.