R-ArcGIS / r-bridge-install

Install the R ArcGIS Tools

arc.write error #108

Open · Saadi4469 opened this issue 2 years ago

Saadi4469 commented 2 years ago

I am getting the following error while writing an sf data frame to a GDB. The data frame was read in from a feature class in a GDB via the sf package; after some data manipulation I am trying to write it back into the same GDB as a new feature class/layer.

I read in another post that sf does not support writing a feature class to a GDB, so I had to use arcgisbinding. However, when I use sf::st_write to write the df as a shapefile, it works just fine.

arc.write works on the sample nc dataset that ships with the sf package but does not work on my dataset. I don't know if it matters, but my sf object has more than 700,000 rows and 82 columns.

How can I fix this?

Error

Error in .call_proxy("arc_write", path, pairlist(data = data, coords = coords,  : 
  insert row failed

Code

library(sf)
library(arcgisbinding)

arc.check_product()
#> product: ArcGIS Pro (12.8.0.29751)
#> license: Advanced
#> version: 1.0.1.244

arc.write("path/GDB.gdb/Feature_Class_Name", data = df, overwrite = TRUE)

Saadi4469 commented 2 years ago

Also, it would be really helpful to add an arcgisbinding tag on Stack Overflow.

orhuna commented 2 years ago

@Saadi4469 there can be numerous causes for this issue. Here are some thoughts:

  1. Missing values containing special characters
  2. Reserved characters in row entries
  3. A missing CRS (what do you get when you run st_crs on your df?)

If possible, sharing a few rows of your data will allow me to provide more direct help.
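
For reference, a quick way to run these checks yourself (a sketch, assuming df is your sf data frame; the character class in the pattern is only illustrative):

library(sf)

# Check the coordinate reference system
st_crs(df)

# Scan character columns for non-printable or reserved characters
attrs <- st_drop_geometry(df)
char_cols <- names(attrs)[sapply(attrs, is.character)]
for (col in char_cols) {
  bad <- grepl("[^[:print:]]|[\"'\\\\]", attrs[[col]])
  n_bad <- sum(bad, na.rm = TRUE)
  if (n_bad > 0) cat(col, ":", n_bad, "suspicious values\n")
}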

Saadi4469 commented 2 years ago

@orhuna how can I attach a zipped shapefile?

orhuna commented 2 years ago

@Saadi4469 the easiest way would be emailing it to oaydin@esri.com. If the file is too big, I can set up a OneDrive folder.

scdub commented 2 years ago

@Saadi4469 If the file is smaller, you can also make a .zip archive and attach it as a comment here

Saadi4469 commented 2 years ago

@orhuna I have sent you an email with the shapefile, thank you.

Saadi4469 commented 2 years ago

@orhuna please note that the columns showing NAs in the sample_df are not completely empty in the original dataset.

orhuna commented 2 years ago

@Saadi4469 I tried writing the dataset to all file geodatabase versions we support, using R 4.1.3 with arcgisbinding v244. arc.write writes the data frame you shared into all of them without any issues. Below is the code:

library(arcgisbinding)
library(sf)

arc.check_product()
root.dir <- "<my_path_to_data>"
dir <- file.path(root.dir, "sample_df.shp")

#### Reading Your DF into a SF Data Frame ####
sf_data <- st_read(dir)

arc.write("<gdb_loc>", sf_data, overwrite = TRUE)

Saadi4469 commented 2 years ago

@orhuna thank you. Could the size of the dataset be the issue then?

Saadi4469 commented 2 years ago

Also, is arcgisbinding v244 the same as 1.0.1.244?

Saadi4469 commented 2 years ago

@orhuna I am beginning to think that the dataset is so big that it might be causing the issue.

orhuna commented 2 years ago

@Saadi4469 the data volume is not the issue; the cursor that inserts data from your R data frame into the feature class in the GDB throws an error and exits.

I think we can debug this further and get you going with your workflow. Below are some checks on the R side that I recommend you perform on your R data frame:

Steps to Take on the R/RStudio side

Checking for Different Types of missing values

I recommend applying the checks below separately on the dataset to investigate whether there is more than one type of missing data. While we can handle NAs when writing, NaN and infinite values need to be represented with a placeholder such as -999.

Would you please check for the different types of missing/out-of-bound values with is.na, is.infinite, and is.nan?

Note that if you want to assign a value to any NA, NaN, or infinite entries, you can use the pattern x[is.na(x)] <- value.
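
For example (a sketch, assuming df is your data frame and using -999 purely as an illustrative placeholder):

attrs <- sf::st_drop_geometry(df)
num_cols <- names(attrs)[sapply(attrs, is.numeric)]

# Count each kind of missing/out-of-bound value per numeric column
# (note: is.na() also flags NaN, so look at all three counts together)
sapply(attrs[num_cols], function(x)
  c(na = sum(is.na(x)), nan = sum(is.nan(x)), inf = sum(is.infinite(x))))

# Replace NaN and infinite values with a placeholder before writing
for (col in num_cols) {
  x <- df[[col]]
  x[is.nan(x) | is.infinite(x)] <- -999
  df[[col]] <- x
}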

Trying to find the problematic row

I suspect that you have one row that causes a problem in writing. After trying the subset you gave me, I was able to write it out successfully. Here is what I recommend:

Create a for loop that writes out the first two rows, then one more row each iteration, until the write fails.

library(arcgisbinding)
library(sf)

arc.check_product()
root.dir <- "<your data loc>"

write.dir <- file.path(root.dir, "gdb_current.gdb")
dir <- file.path(root.dir, "sample_df.shp")

#### Reading Your DF into a SF Data Frame ####
sf_data <- st_read(dir)
n <- nrow(sf_data)
for (row in seq(2, n)) {
  data_write <- sf_data[1:row, ]
  # write to a feature class inside the gdb ("test_fc" is a placeholder name)
  arc.write(file.path(write.dir, "test_fc"), data_write, overwrite = TRUE)
}

Note that this took about a minute with just the 10 rows you gave me, and searching every row one by one may take a while for 800,000 rows; you may want to search with a jump of 100 first, then a jump of 10, and so on.
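
Along the same lines, a binary search would find the first failing row in roughly log2(n) write attempts (a sketch, reusing sf_data, write.dir, and the placeholder feature class name from above, and assuming a single problematic row):

# Invariant: writing rows 1:lo succeeds, writing rows 1:hi fails
lo <- 1
hi <- nrow(sf_data)
while (hi - lo > 1) {
  mid <- floor((lo + hi) / 2)
  ok <- tryCatch({
    arc.write(file.path(write.dir, "test_fc"), sf_data[1:mid, ], overwrite = TRUE)
    TRUE
  }, error = function(e) FALSE)
  if (ok) lo <- mid else hi <- mid
}
cat("First problematic row appears to be row", hi, "\n")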

Steps to Take on the ArcGIS Pro/Desktop side

Creating a New GDB

We see write errors like the one you encountered on corrupt GDBs. One step I recommend is running the Create File Geodatabase geoprocessing tool to create a new GDB to write into. There have been times when this solved pretty complex problems.

Visualizing the feature class that resulted in the error

This step can replace the for loop I recommended above. Simply run your code that results in the error. Then, in Pro, go to the Catalog pane, right-click the GDB you wrote to, and click Refresh. If you can see the feature class there, add it to your map and open its attribute table. If the attribute table is populated, the bottom-most row will be the row right before the problematic row, which you can then remove or fix in R.

I hope this helps and please reach out if it does not.

Saadi4469 commented 2 years ago

@orhuna thank you for the detailed guidance.

orhuna commented 2 years ago

@Saadi4469 no problem. The last issue that I can think of is one of types. Would you share the output of this function on your data frame, df?

sapply(df, class)
Saadi4469 commented 2 years ago

@orhuna on a side note, how come st_write is able to save the same df as a shapefile without any errors?

Saadi4469 commented 2 years ago

> @Saadi4469 no problem. The last issue that I can think of is one of types. Would you share the output of this function on your data frame, df?
>
> sapply(df, class)

Here you go.


$FOLIO [1] "character"

$USE_CODE [1] "character"

$JUST_LAND_VALUE [1] "integer"

$JUST_BUILDING_VALUE [1] "integer"

$LY_JUSTVAL [1] "integer"

$fsid [1] "integer"

$long [1] "numeric"

$lat [1] "numeric"

$low_depth_002_year00 [1] "character"

$mid_depth_002_year00 [1] "character"

$high_depth_002_year00 [1] "character"

$low_depth_005_year00 [1] "character"

$mid_depth_005_year00 [1] "integer"

$high_depth_005_year00 [1] "integer"

$low_depth_020_year00 [1] "integer"

$mid_depth_020_year00 [1] "integer"

$high_depth_020_year00 [1] "integer"

$low_depth_100_year00 [1] "integer"

$mid_depth_100_year00 [1] "integer"

$high_depth_100_year00 [1] "integer"

$low_depth_500_year00 [1] "integer"

$mid_depth_500_year00 [1] "integer"

$high_depth_500_year00 [1] "integer"

$low_depth_002_year30 [1] "character"

$mid_depth_002_year30 [1] "character"

$high_depth_002_year30 [1] "character"

$low_depth_005_year30 [1] "character"

$mid_depth_005_year30 [1] "integer"

$high_depth_005_year30 [1] "integer"

$low_depth_020_year30 [1] "integer"

$mid_depth_020_year30 [1] "integer"

$high_depth_020_year30 [1] "integer"

$low_depth_100_year30 [1] "integer"

$mid_depth_100_year30 [1] "integer"

$high_depth_100_year30 [1] "integer"

$low_depth_500_year30 [1] "integer"

$mid_depth_500_year30 [1] "integer"

$high_depth_500_year30 [1] "integer"

$low_chance_00_year00 [1] "character"

$mid_chance_00_year00 [1] "character"

$high_chance_00_year00 [1] "character"

$low_chance_15_year00 [1] "numeric"

$mid_chance_15_year00 [1] "numeric"

$high_chance_15_year00 [1] "numeric"

$low_chance_30_year00 [1] "numeric"

$mid_chance_30_year00 [1] "numeric"

$high_chance_30_year00 [1] "numeric"

$low_chance_00_year30 [1] "character"

$mid_chance_00_year30 [1] "character"

$high_chance_00_year30 [1] "character"

$low_chance_15_year30 [1] "numeric"

$mid_chance_15_year30 [1] "numeric"

$high_chance_15_year30 [1] "numeric"

$low_chance_30_year30 [1] "numeric"

$mid_chance_30_year30 [1] "numeric"

$high_chance_30_year30 [1] "numeric"

$aal_year00_low [1] "numeric"

$aal_year00_mid [1] "numeric"

$aal_year00_high [1] "numeric"

$aal_year30_low [1] "numeric"

$aal_year30_mid [1] "numeric"

$aal_year30_high [1] "numeric"

$adapt_id [1] "integer"

$adapt_name [1] "character"

$adapt_rp [1] "integer"

$adapt_type [1] "character"

$ZIPCODE [1] "integer"

$CITYNAME [1] "character"

$DISTRICTID [1] "integer"

$TRACTCE20 [1] "character"

$BLKGRPCE20 [1] "character"

$GEOID20 [1] "character"

$MEDIANHOUSEHOLDINCOMEESTIMATE [1] "numeric"

$FEMA2020 [1] "character"

$FEMA2014 [1] "character"

$ZONING [1] "character"

$Shape_Length [1] "numeric"

$Shape_Area [1] "numeric"

$Shape [1] "sfc_MULTIPOLYGON" "sfc"

$Variable [1] "character"

$Value [1] "integer"

orhuna commented 2 years ago

@Saadi4469 thank you. As for your question about why the shapefile works: shapefiles are not subject to the stringent value and geometry checks that feature classes are. This means it is easy to export to a .shp, but you may end up with geometries that are problematic in terms of their description (vertices) and/or relationships (such as coincidence, holes, etc.), so please double check that the geometries and the values make sense. Also, when writing a shapefile, a column that cannot be output is silently dropped, whereas arc.write throws the error that you saw.

Thank you for the column types. I do see some issues there. I do not know the details of your dataset, but some columns that should presumably share a type do not; for instance, high_chance_00_year00 is character whereas low_chance_15_year00 is numeric. Would you spot check some of these inconsistent columns? If R converted a whole column to character, that may mean there is an inconsistent value somewhere in that column, such as a , instead of a . or some other stray character.
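
For instance, one way to spot check (a sketch, using high_chance_00_year00 from your output as the example column):

# Values that fail to parse as numbers are the likely culprits
v <- df$high_chance_00_year00
bad <- !is.na(v) & is.na(suppressWarnings(as.numeric(v)))
unique(v[bad])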

An easy solution here would be casting: set the types of the columns explicitly before writing the data out.
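
For example (a sketch; swap in whichever of your columns should actually be numeric):

# Cast character columns that should be numeric back to numeric
to_numeric <- c("low_depth_002_year00", "mid_depth_002_year00",
                "high_depth_002_year00", "high_chance_00_year00")
for (col in to_numeric) {
  df[[col]] <- as.numeric(df[[col]])  # unparseable entries become NA
}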

orhuna commented 2 years ago

Another solution we can try: since you have the shapefile, you could do the following:

data.obj <- arc.open("full_path_to_shapefile.shp")
data.arc <- arc.select(data.obj)
arc.write("write_path", data.arc)

The above should work, with the caveat that your data will be at the mercy of what we could write to the shapefile and whatever checks (or lack thereof) went into creating that shapefile in the first place.

orhuna commented 2 years ago

@Saadi4469 Does the issue still persist?

Saadi4469 commented 2 years ago

@orhuna thank you for the follow up, I am waiting for an updated dataset that will also address the column datatype issues. I will get back to you once I have it. Cheers!