Closed matthewrfuller closed 11 months ago
@matthewrfuller Thanks for finding and reporting this issue so quickly. It looks like st_write() is abbreviating the column names because the "netgeometry" column name has more than 10 characters, which is the maximum allowed in a dbf file. Interesting that it also truncates column names that are within the character limit... I've updated ssn_write(), ssn_subset() and ssn_split_predpts() to remove netgeometry before writing to shapefile, which addresses the issue within SSN2. @michaeldumelle - we need to decide whether this fix is sufficient or whether we should shorten the netgeometry column name to conform with those standards: netgeom? n_geometry?
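The truncation of names that are already within the limit looks consistent with base R's abbreviate() being applied to every field name once any single name exceeds 10 characters. The sketch below illustrates that idea only. The function name sketch_shapefile_names is hypothetical, and the minlength = 7 rule is an assumption about how the ESRI Shapefile path in st_write() behaves, not something confirmed in this thread.

```r
# Hypothetical helper: approximates the suspected renaming rule.
# Assumption: if ANY name is longer than 10 characters, ALL names are
# passed through abbreviate(minlength = 7), not only the long one.
sketch_shapefile_names <- function(nms) {
  if (any(nchar(nms) > 10)) {
    # abbreviate() returns a named vector, so drop the names
    unname(abbreviate(nms, minlength = 7))
  } else {
    nms
  }
}

cols <- c("CDRAINAG", "netgeometry")
data.frame(
  original    = cols,
  abbreviated = sketch_shapefile_names(cols),
  stringsAsFactors = FALSE
)
```

Under this assumption, an 8-character name such as CDRAINAG is shortened only because netgeometry is present in the same table. Names with 7 or fewer characters pass through unchanged.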
Thanks @matthewrfuller. We changed the name of netgeometry to netgeom to avoid exceeding the 10-character limit for column/field names while writing to shapefiles. This fix is available in the development version now (remotes::install_github("USEPA/SSN2", ref = "develop")) and will be included in the next CRAN release.
Excellent! Thanks for implementing this change in SSN2 so quickly!
I'm wondering if a note in the ssn_write() function documentation should warn/advise users to maintain column/field names at 10 characters or less in both observation and prediction data frames if they'd like to maintain field/column names through the entire write/import process. Otherwise, if just one field exceeds the 10-character limit when writing the SSN object, the ESRI Shapefile driver used by st_write() will abbreviate all column/field names with 8 or more characters. Here's a reprex that demonstrates this behavior after adding a column with an 11-character name (DRAINAGEKM2) to the Introduction vignette's mf04p ssn object observations.
```r
remotes::install_github("USEPA/SSN2", ref = "develop")
library(SSN2)

copy_lsn_to_temp()
temp_path <- paste0(tempdir(), '/MiddleFork04.ssn')
mf04p <- ssn_import(
  path = temp_path,
  predpts = c("pred1km", "CapeHorn", "Knapp"),
  overwrite = TRUE
)

obs_df <- ssn_get_data(mf04p, "obs") |>
  dplyr::mutate(DRAINAGEKM2 = CDRAINAG) |> # adding 11-character field/column
  dplyr::select(everything(), netgeom, DRAINAGEKM2, geometry) # organize for later comparison with written/imported ssn object

mf04p_mod <- ssn_put_data(obs_df, mf04p, "obs")

ssn_write(mf04p_mod, path = paste0(getwd(), "/mf04p_out.ssn"), overwrite = TRUE)

mf04p_in <- ssn_import(
  path = paste0(getwd(), "/mf04p_out.ssn"),
  predpts = c("pred1km", "CapeHorn", "Knapp")
)

data.frame(mf04p_mod = names(mf04p_mod$obs), mf04p_in = names(mf04p_in$obs)) |>
  dplyr::mutate(nchar_mod = nchar(mf04p_mod), nchar_in = nchar(mf04p_in))
```
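Until the documentation says something about this, a quick pre-write check on the user side might help. This is a minimal sketch in base R. The helper name find_long_names is hypothetical (it is not part of SSN2 or sf), and the 10-character default comes from the dbf limit discussed above.

```r
# Hypothetical helper: list column names that exceed the dbf field-name limit.
find_long_names <- function(df, max_chars = 10L) {
  nms <- names(df)
  nms[nchar(nms) > max_chars]
}

# Toy stand-in for an observation table (made-up values, for illustration only)
obs_like <- data.frame(
  CDRAINAG    = c(10.5, 20.1),
  netgeom     = c("a", "b"),
  DRAINAGEKM2 = c(10.5, 20.1)
)

find_long_names(obs_like)
# Returns "DRAINAGEKM2", the only name longer than 10 characters
```

The same check could presumably be run on ssn_get_data(mf04p_mod, "obs") and on each prediction data set before calling ssn_write(), renaming anything it reports.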
Hello! I've noticed that writing an ssn object using ssn_write() and then importing it using ssn_import() results in modified variable names. Below is a minimal reproducible example using the SSN2 package's Middle Fork ssn object.

```r
library(SSN2)

copy_lsn_to_temp()
temp_path <- paste0(tempdir(), '/MiddleFork04.ssn')
mf04p <- ssn_import(
  path = temp_path,
  predpts = c("pred1km", "CapeHorn", "Knapp"),
  overwrite = TRUE
)

ssn_write(mf04p, path = paste0(getwd(), "/mf04p_out.ssn"), overwrite = TRUE)

mf04p_in <- ssn_import(
  path = paste0(getwd(), "/mf04p_out.ssn"),
  predpts = c("pred1km", "CapeHorn", "Knapp")
)

summary(mf04p)    # see variable names from original ssn object
summary(mf04p_in) # see modified variable names in written/imported ssn object
```
The summaries of each ssn object show how the variable names have changed between the original mf04p ssn object and the written/imported mf04p_in ssn object. It doesn't appear to be just a simple character length issue for shapefile .dbf fields when writing to a new .ssn. Additionally, writing/importing also appears to add a new 'ntgmtry' field to each obs/preds entry in the ssn object that is a duplicate of the 'netgeometry' field.