cont-limno / LAGOSNE

Interface to the LAke multi-scaled GeOSpatial & temporal database :earth_americas:
https://cont-limno.github.io/LAGOSNE/

HUC Zoneids are loaded as factors not character strings #18

Closed: jsta closed this issue 6 years ago

jsta commented 7 years ago

This was done to be consistent with the legacy loading scripts but can be a real pain for analysis. Is there any reason not to load as character strings instead?

limnoliver commented 7 years ago

I usually use HUC zoneids as factors when working with the dataset. For example, it's easy to quickly find out how many observations there are per HUC unit when the ID is a factor. What types of analyses are you thinking of where a factor is a pain?

jsta commented 7 years ago

After:

library(LAGOS)
dt <- lagos_load(version = "1.054.2", format = "rds")

Try comparing the output of:

conn <- dt$hu12.conn[order(dt$hu12.conn$hu12_zoneid),][1:5, 1:3]
ifelse(conn$hu12_canalditchdensity_sum_lengthm > 0, "greater than 0", conn$hu12_zoneid)

[1] "1" "2" "3" "4" "greater than 0"

with:

conn <- dt$hu12.conn[order(dt$hu12.conn$hu12_zoneid),][1:5, 1:3]
conn$hu12_zoneid <- as.character(conn$hu12_zoneid)
ifelse(conn$hu12_canalditchdensity_sum_lengthm > 0, "greater than 0", conn$hu12_zoneid)

[1] "HU12_1" "HU12_10" "HU12_100" "HU12_1000" "greater than 0"
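The same effect can be reproduced without the LAGOS data. A minimal sketch with made-up zone IDs and densities (none of these values come from the dataset): when the `no` branch of `ifelse()` is a factor, its integer level codes end up in the result instead of its labels.

```r
# Toy stand-ins for hu12_zoneid and the canal/ditch density column.
zoneid  <- factor(c("HU12_1", "HU12_10", "HU12_100"),
                  levels = c("HU12_1", "HU12_10", "HU12_100"))
density <- c(0, 0, 5)

# Factor input: the "no" branch yields the level codes as strings.
from_factor <- ifelse(density > 0, "greater than 0", zoneid)

# Character input: the "no" branch yields the actual IDs.
from_char <- ifelse(density > 0, "greater than 0", as.character(zoneid))

print(from_factor)
print(from_char)
```

No warning is raised in the factor case, which is what makes it easy to miss.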

jsta commented 7 years ago

I think a typical workflow starts with data joins and rearrangements (which can break if the join field is a factor) and ends with stats and modelling, where factors are preferable. I would be worried about silent failures in the early join steps throwing the whole workflow off. Once people get to stats and modelling, they can convert to factor?
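One way a factor key can go wrong without any warning, sketched with a hypothetical lookup vector (`area` and its values are invented for illustration, not taken from LAGOS): indexing by a factor uses its integer codes, not its labels.

```r
# Hypothetical per-zone values, keyed by zone ID via names().
area <- c(HU12_2 = 30, HU12_1 = 10, HU12_10 = 20)

# IDs to look up, stored as a factor (levels given explicitly so the
# codes do not depend on locale sort order): HU12_2 -> 2, HU12_10 -> 1.
ids <- factor(c("HU12_2", "HU12_10"), levels = c("HU12_10", "HU12_2"))

# Indexing by a factor is treated as indexing by its integer codes,
# so this picks positions 2 and 1 of `area`, not the named elements.
by_factor <- area[ids]

# Converting to character first looks the values up by name.
by_char <- area[as.character(ids)]

print(by_factor)
print(by_char)
```

Both calls return a vector of the expected length, so nothing signals that the first lookup matched the wrong zones.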

jsta commented 7 years ago

You can use the table() function to count observations per unique value of a character vector, so counting per HUC unit does not require a factor.
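A small sketch of that, using made-up zone IDs rather than real LAGOS rows:

```r
# Zone IDs stored as plain character strings (illustrative values).
zoneid <- c("HU12_1", "HU12_2", "HU12_1", "HU12_3", "HU12_1")

# table() tabulates character vectors directly; no factor conversion needed.
counts <- table(zoneid)
print(counts)

# Individual counts can be pulled out by ID.
n_hu12_1 <- counts[["HU12_1"]]
```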

jsta commented 6 years ago

HUC IDs were converted to character strings by b91a475 in the process of padding IDs that were missing leading zeros.
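For context, a rough sketch of what padding leading zeros on character IDs can look like. This is not the code from that commit; `pad_huc` is a hypothetical helper, and the default width of 12 assumes 12-digit HUC12 codes.

```r
# Hypothetical helper: left-pad IDs with zeros up to a fixed width.
pad_huc <- function(id, width = 12) {
  id <- as.character(id)
  # Number of zeros each ID is short by (never negative).
  n_missing <- pmax(0, width - nchar(id))
  paste0(strrep("0", n_missing), id)
}

# The first ID lost its leading zero (e.g. after being read as a number);
# the second is already full width and is left unchanged.
padded <- pad_huc(c("40500012345", "040500012345"))
print(padded)
```

Keeping the result as character is what preserves the zeros; converting back to a numeric type would drop them again.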

jsta commented 6 years ago

The change described in the previous closing message had not been applied outside of the base HUC tables (i.e. not to lakes.geo or hu4.lulc).