USEPA / SSN2

SSN2: Spatial Modeling on Stream Networks in R
https://usepa.github.io/SSN2/
GNU General Public License v3.0

Writing and importing an ssn object alters variable names #2

Closed: matthewrfuller closed this issue 11 months ago

matthewrfuller commented 11 months ago

Hello! I've noticed that writing an ssn object using ssn_write() and then importing it using ssn_import() results in modified variable names. Below is a minimal reproducible example using the SSN2 package's Middle Fork ssn object.

```r
library(SSN2)

# Copy the Middle Fork example .ssn folder to tempdir()
copy_lsn_to_temp()

temp_path <- paste0(tempdir(), "/MiddleFork04.ssn")
mf04p <- ssn_import(path = temp_path, predpts = c("pred1km", "CapeHorn", "Knapp"), overwrite = TRUE)

# Write the ssn object to a new .ssn folder, then import it again
ssn_write(mf04p, path = paste0(getwd(), "/mf04p_out.ssn"), overwrite = TRUE)
mf04p_in <- ssn_import(path = paste0(getwd(), "/mf04p_out.ssn"), predpts = c("pred1km", "CapeHorn", "Knapp"))

summary(mf04p)    # variable names in the original ssn object
summary(mf04p_in) # modified variable names in the written/imported ssn object
```

The summaries of the two ssn objects show how the variable names have changed from the original mf04p ssn object to the written/imported mf04p_in ssn object. It doesn't appear to be simply a character-length issue for shapefile .dbf fields when writing to a new .ssn. Writing/importing also appears to add a new 'ntgmtry' field to each obs/preds entry in the ssn object that duplicates the 'netgeometry' field.

pet221 commented 11 months ago

@matthewrfuller Thanks for finding and reporting this issue so quickly. It looks like st_write() is abbreviating the column names because the "netgeometry" column name has more than 10 characters, which is the maximum allowed in a dbf file. Interesting that it also truncates column names that are within the character limit... I've updated ssn_write(), ssn_subset() and ssn_split_predpts() to remove netgeometry before writing to shapefile, which addresses the issue within SSN2. @michaeldumelle - we need to decide whether this fix is sufficient or whether we should shorten the netgeometry column name to conform with those standards: netgeom? n_geometry?
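As a quick illustration of the limit described above, the base R snippet below checks the candidate column names from this comment against the 10-character dBASE (.dbf) field-name limit. It is only a sketch; `dbf_max`, `candidates`, and `fits` are illustrative names, not part of SSN2.

```r
# dBASE (.dbf) field names, as used by shapefiles, are limited to 10 characters
dbf_max <- 10
candidates <- c("netgeometry", "netgeom", "n_geometry")

# TRUE if the name fits within the limit
fits <- nchar(candidates) <= dbf_max
names(fits) <- candidates
fits # netgeometry FALSE, netgeom TRUE, n_geometry TRUE
```

"netgeometry" has 11 characters, so it is the only one of the three that would need to be abbreviated.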

michaeldumelle commented 11 months ago

Thanks @matthewrfuller. We changed the name of netgeometry to netgeom to avoid exceeding the 10-character limit for column/field names when writing to shapefiles. This fix is available in the development version now (remotes::install_github("USEPA/SSN2", ref = "develop")) and will be included in the next CRAN release.

matthewrfuller commented 11 months ago

Excellent! Thanks for implementing this change in SSN2 so quickly!

I'm wondering whether the ssn_write() documentation should include a note advising users to keep column/field names to 10 characters or fewer in both the observation and prediction data frames if they want those names preserved through the entire write/import process. Otherwise, if even one field exceeds the 10-character limit when the SSN object is written, the ESRI Shapefile driver used by st_write() will abbreviate all column/field names with 8 or more characters. Here's a reprex that demonstrates this behavior after adding a column with an 11-character name (DRAINAGEKM2) to the observations in the Introduction vignette's mf04p ssn object.
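Along the lines of that suggestion, a user could check field names before calling ssn_write(). The helper below is a hypothetical sketch (`find_long_names()` is not an SSN2 function): given a named list of data frames, it returns, for each data frame that has any, the column names longer than 10 characters.

```r
# Hypothetical helper (not part of SSN2): list column names that exceed the
# 10-character dBASE field-name limit in each data frame of a named list.
find_long_names <- function(dfs, max_chars = 10) {
  out <- lapply(dfs, function(df) {
    nms <- names(df)
    nms[nchar(nms) > max_chars]
  })
  # Keep only the data frames that actually have offending names
  out[lengths(out) > 0]
}

# Small stand-ins for observation and prediction data frames
obs <- data.frame(CDRAINAG = 1, DRAINAGEKM2 = 2)
preds <- data.frame(CDRAINAG = 1)

res <- find_long_names(list(obs = obs, pred1km = preds))
res # list(obs = "DRAINAGEKM2")
```

With a real ssn object, something like `find_long_names(c(list(obs = mf04p_mod$obs), mf04p_mod$preds))` should work, assuming the prediction sets are stored as a named list in `$preds` alongside `$obs`. The sf "geometry" column has only 8 characters, so it is never flagged.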

```r
remotes::install_github("USEPA/SSN2", ref = "develop")
library(SSN2)

copy_lsn_to_temp()

temp_path <- paste0(tempdir(), "/MiddleFork04.ssn")
mf04p <- ssn_import(path = temp_path, predpts = c("pred1km", "CapeHorn", "Knapp"), overwrite = TRUE)

obs_df <- ssn_get_data(mf04p, "obs") |>
  dplyr::mutate(DRAINAGEKM2 = CDRAINAG) |> # add an 11-character field/column
  dplyr::select(everything(), netgeom, DRAINAGEKM2, geometry) # organize for later comparison with the written/imported ssn object

mf04p_mod <- ssn_put_data(obs_df, mf04p, "obs")

ssn_write(mf04p_mod, path = paste0(getwd(), "/mf04p_out.ssn"), overwrite = TRUE)

mf04p_in <- ssn_import(path = paste0(getwd(), "/mf04p_out.ssn"), predpts = c("pred1km", "CapeHorn", "Knapp"))

# Compare column names (and their lengths) before and after writing/importing
data.frame(mf04p_mod = names(mf04p_mod$obs), mf04p_in = names(mf04p_in$obs)) |>
  dplyr::mutate(nchar_mod = nchar(mf04p_mod), nchar_in = nchar(mf04p_in))
```