Open Saadi4469 opened 2 years ago
Also, it would be really helpful to add an arcgisbinding tag on Stack Overflow.
@Saadi4469 there can be numerous causes to this issue. Here are some thoughts: is a coordinate reference system defined for your data (what does st_crs return on your df)? If possible, sharing a few rows of your data will allow me to provide more direct help.
@orhuna how can I attach a zipped shapefile?
@Saadi4469 the easiest way would be emailing it to oaydin@esri.com. If the file is too big, I can set up a OneDrive folder.
@Saadi4469 If the file is smaller, you can also make a .zip archive and attach it as a comment here.
@orhuna I have sent you an email with the shapefile, thank you.
@orhuna please note that the columns showing NAs in the sample_df are not completely empty in the original dataset.
@Saadi4469 I tried writing the dataset to all fgdbs we support using R v4.1.3 with arcgisbinding v244: arc.write writes the data frame you shared into these without any issues. Below is the code:
library(arcgisbinding)
library(sf)
arc.check_product()
root.dir <- "<my_path_to_data>"
dir <- file.path(root.dir,"sample_df.shp")
#### Reading Your DF into a SF Data Frame ####
sf_data <- st_read(dir)
arc.write(<gdb_loc>, sf_data, overwrite=T)
@orhuna thank you, could the big dataset be an issue then? Also, is arcgisbinding v244 the same as 1.0.1.244?
@orhuna I am beginning to think that the dataset is so big that it might be causing the issue.
@Saadi4469 the data volume is not the issue as the cursor that inserts data from your R dataframe to the feature class in the GDB throws an error and exits.
I think we can debug this further and get you going with your workflow. Below are some checks that I recommend you perform on your R data frame:
I recommend applying the checks below separately to the dataset to investigate whether there is more than one type of missing data. While we can handle NAs in writing, NaN and infinite values need to be represented with a placeholder such as -999.
Would you please check for different types of missing/out-of-bounds values with is.na, is.infinite, and is.nan?
Note that if you want to assign a value for any NA, NaN, or infinite entry, you can use the pattern x[is.na(x)] <- value.
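As a minimal base-R sketch of those checks (the data frame and its values here are made up for illustration), counting each kind of problem value and then substituting a -999 placeholder:

```r
# Toy data frame standing in for the real one; values are hypothetical.
df <- data.frame(depth = c(1.2, NA, NaN, Inf, 3.4))

# Count each kind of problematic value per column.
sapply(df, function(x) sum(is.na(x)))        # NA and NaN both count as NA
sapply(df, function(x) sum(is.nan(x)))
sapply(df, function(x) sum(is.infinite(x)))

# Replace all non-finite values (NA, NaN, Inf) with a -999 placeholder.
df$depth[!is.finite(df$depth)] <- -999
```

is.finite covers all three cases at once, which keeps the replacement to one line.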
I suspect that you have one row that causes a problem in writing. After trying the subset you gave me, I was able to write it out successfully. Here is what I recommend:
Create a for loop that writes increasingly large subsets, one row at a time, until the write fails.
library(arcgisbinding)
library(sf)
arc.check_product()
root.dir <- <your data loc>
write.dir <- file.path(root.dir, 'gdb_current.gdb')
dir <- file.path(root.dir,"sample_df.shp")
#### Reading Your DF into a SF Data Frame ####
sf_data <- st_read(dir)
n <- nrow(sf_data)
for (row in seq(2, n)) {
  message("writing rows 1:", row)  # the last row printed before the error is the suspect
  data_write <- sf_data[1:row, ]
  arc.write(write.dir, data_write, overwrite = TRUE)
}
Note that this takes about a minute even with the 10 rows you gave me, so you may want to search for the problematic row with a jump of 100 first, then a jump of 10, and so on. Searching every row one by one would take a while for 800,000 rows.
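The coarse-to-fine jump search can be generalized to a binary search over the row count, which needs only about log2(800,000) ≈ 20 write attempts. A sketch, assuming failures are monotone (any subset containing the bad row fails) and a hypothetical helper writes_ok(k) that returns TRUE when rows 1:k write without error, e.g. by wrapping arc.write in tryCatch:

```r
# Hypothetical predicate, shown for context only:
#   writes_ok <- function(k) !inherits(
#     tryCatch(arc.write(write.dir, sf_data[1:k, ], overwrite = TRUE),
#              error = function(e) e), "error")

# Find the smallest row count k for which writing rows 1:k fails.
find_first_bad_row <- function(n, writes_ok) {
  if (writes_ok(n)) return(NA)   # everything writes fine
  if (!writes_ok(1)) return(1)   # the very first row already fails
  lo <- 1                        # rows 1:lo are known to write OK
  hi <- n                        # rows 1:hi are known to fail
  while (lo + 1 < hi) {
    mid <- (lo + hi) %/% 2
    if (writes_ok(mid)) lo <- mid else hi <- mid
  }
  hi                             # row hi is the suspect
}
```

Each probe still pays the cost of one arc.write, so this is mainly worthwhile when single writes are fast relative to the number of rows.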
We see write errors like the one you encountered on corrupt gdbs. One step I recommend is running the Create File Geodatabase tool to create a new gdb to write into. There have been times where this solved pretty complex problems.
This step might be a replacement for the for loop I recommended above. Here, simply run your code that results in the error. Then, in Pro, go to the Catalog pane, right-click the gdb you wrote to, and click Refresh. If you can see the feature class there, add it to your map and look at the Attribute Table. If the Attribute Table is populated, the bottom-most row will be the row right before the problematic row, which you can either remove or fix in R.
I hope this helps and please reach out if it does not.
@orhuna thank you for the detailed guidance.
@Saadi4469 no problem. The last issue that I can think of is that of types. Would you share the output of this function on your data frame, df?
sapply(df, class)
@orhuna on a side note, how come st_write is able to save the same df as a shapefile without any errors?
Here you go.
$FOLIO [1] "character"
$USE_CODE [1] "character"
$JUST_LAND_VALUE [1] "integer"
$JUST_BUILDING_VALUE [1] "integer"
$LY_JUSTVAL [1] "integer"
$fsid [1] "integer"
$long [1] "numeric"
$lat [1] "numeric"
$low_depth_002_year00 [1] "character"
$mid_depth_002_year00 [1] "character"
$high_depth_002_year00 [1] "character"
$low_depth_005_year00 [1] "character"
$mid_depth_005_year00 [1] "integer"
$high_depth_005_year00 [1] "integer"
$low_depth_020_year00 [1] "integer"
$mid_depth_020_year00 [1] "integer"
$high_depth_020_year00 [1] "integer"
$low_depth_100_year00 [1] "integer"
$mid_depth_100_year00 [1] "integer"
$high_depth_100_year00 [1] "integer"
$low_depth_500_year00 [1] "integer"
$mid_depth_500_year00 [1] "integer"
$high_depth_500_year00 [1] "integer"
$low_depth_002_year30 [1] "character"
$mid_depth_002_year30 [1] "character"
$high_depth_002_year30 [1] "character"
$low_depth_005_year30 [1] "character"
$mid_depth_005_year30 [1] "integer"
$high_depth_005_year30 [1] "integer"
$low_depth_020_year30 [1] "integer"
$mid_depth_020_year30 [1] "integer"
$high_depth_020_year30 [1] "integer"
$low_depth_100_year30 [1] "integer"
$mid_depth_100_year30 [1] "integer"
$high_depth_100_year30 [1] "integer"
$low_depth_500_year30 [1] "integer"
$mid_depth_500_year30 [1] "integer"
$high_depth_500_year30 [1] "integer"
$low_chance_00_year00 [1] "character"
$mid_chance_00_year00 [1] "character"
$high_chance_00_year00 [1] "character"
$low_chance_15_year00 [1] "numeric"
$mid_chance_15_year00 [1] "numeric"
$high_chance_15_year00 [1] "numeric"
$low_chance_30_year00 [1] "numeric"
$mid_chance_30_year00 [1] "numeric"
$high_chance_30_year00 [1] "numeric"
$low_chance_00_year30 [1] "character"
$mid_chance_00_year30 [1] "character"
$high_chance_00_year30 [1] "character"
$low_chance_15_year30 [1] "numeric"
$mid_chance_15_year30 [1] "numeric"
$high_chance_15_year30 [1] "numeric"
$low_chance_30_year30 [1] "numeric"
$mid_chance_30_year30 [1] "numeric"
$high_chance_30_year30 [1] "numeric"
$aal_year00_low [1] "numeric"
$aal_year00_mid [1] "numeric"
$aal_year00_high [1] "numeric"
$aal_year30_low [1] "numeric"
$aal_year30_mid [1] "numeric"
$aal_year30_high [1] "numeric"
$adapt_id [1] "integer"
$adapt_name [1] "character"
$adapt_rp [1] "integer"
$adapt_type [1] "character"
$ZIPCODE [1] "integer"
$CITYNAME [1] "character"
$DISTRICTID [1] "integer"
$TRACTCE20 [1] "character"
$BLKGRPCE20 [1] "character"
$GEOID20 [1] "character"
$MEDIANHOUSEHOLDINCOMEESTIMATE [1] "numeric"
$FEMA2020 [1] "character"
$FEMA2014 [1] "character"
$ZONING [1] "character"
$Shape_Length [1] "numeric"
$Shape_Area [1] "numeric"
$Shape [1] "sfc_MULTIPOLYGON" "sfc"
$Variable [1] "character"
$Value [1] "integer"
@Saadi4469 thank you. As for your question on why the shapefile works: shapefiles are not subject to the stringent value and geometry checks that feature classes are subject to. This means it is easy to export to a .shp, but you may end up with geometries that are problematic in terms of their description (vertices) and/or relationships (such as coincidence, holes, etc.). Please also double-check that the geometries and the values make sense. If a shapefile export cannot output a column, it silently drops it, whereas the arc object throws the error that you saw.
Thank you for the column types. I do see some issues there. I do not know the details of your dataset, but some columns that should presumably be the same type are different: for instance, high_chance_00_year00 is a character type whereas low_chance_15_year00 is numeric. Would you spot-check some of these inconsistent columns? If R converts a whole column to character, that might mean that somewhere in that column there is an inconsistent value, such as a , instead of a . or some other stray character.
An easy solution here would be casting. You can set the types of the columns explicitly before writing it out.
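A sketch of that casting, using column names taken from the sapply output above (treating them as numeric is an assumption about your data), with a gsub to handle a stray decimal comma before converting:

```r
# Columns that came in as character but should presumably be numeric.
to_numeric <- c("low_depth_002_year00", "mid_depth_002_year00",
                "high_depth_002_year00")

# Normalize decimal commas, then cast; unparseable entries become NA.
fix_decimal <- function(x) as.numeric(gsub(",", ".", x, fixed = TRUE))

# Toy example: one decimal comma forced the whole column to character.
df <- data.frame(low_depth_002_year00 = c("1.5", "2,7", "3.1"),
                 stringsAsFactors = FALSE)
df$low_depth_002_year00 <- fix_decimal(df$low_depth_002_year00)
```

Applied to the real data frame, the same helper can be looped over all columns in to_numeric with lapply before calling arc.write.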
Another solution we can try, if you have a shapefile, is the following:
data.obj <- arc.open("full_path_to_shapefile.shp")
data.arc <- arc.select(data.obj)
arc.write("write_path", data.arc)
The above should work with the caveat that your data will be at the mercy of what we could write to a shapefile and the checks that went into creating that shapefile to start with, or lack thereof.
@Saadi4469 Does the issue still persist?
@orhuna thank you for the follow-up. I am waiting for an updated dataset that will also address the column datatype issues. I will get back to you once I have it. Cheers!
I am getting the following error while writing an sf object dataframe to a GDB. The dataframe imported is a feature class from a GDB brought in via the sf package. After some data manipulation I am trying to write the dataframe back into the same GDB as a new feature class/layer. I read in another post that sf does not support writing a feature class to a gdb, so I had to use arcgisbinding. However, when I use sf::st_write to write the df as a shapefile, it works just fine. The arc.write works on the sample nc dataset in the sf package but does not work on my dataset. I don't know if it matters, but my sf object has more than 700,000 rows and 82 columns. How can I fix this?
Error
Code