cont-limno / LAGOSNE

Interface to the LAke multi-scaled GeOSpatial & temporal database
https://cont-limno.github.io/LAGOSNE/

Allow user to select groups of columns #13

Closed: limnoliver closed this issue 7 years ago

limnoliver commented 8 years ago

In addition to the ability to select columns by name, allow the user to select pre-defined groups of columns within each table. For example, an "atmospheric deposition" group could include multiple variables.

jsta commented 8 years ago

I am having trouble coming up with an easy way to do this. From what I can tell, all the deposition columns have "dep" in their name, so that particular example wouldn't be too difficult. Are there categories without a keyword in the column names? Those would be much harder. The brute-force way would be to create a lookup table to check against in lagos_select:

name                     category
hu4_baseflowindex_std    baseflow
hu4_dep_no3_1985_min     deposition
hu4_dep_no3_1985_max     deposition
...                      ...

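The brute-force lookup-table idea could look something like this minimal base-R sketch. It is an illustration, not code from the package: `column_lookup` and `columns_in_category()` are hypothetical names, and only the rows from the table above are filled in.

```r
# Hypothetical lookup table mapping each column name to a category.
# Only the example rows from the comment are included; the rest are assumed.
column_lookup <- data.frame(
  name = c("hu4_baseflowindex_std",
           "hu4_dep_no3_1985_min",
           "hu4_dep_no3_1985_max"),
  category = c("baseflow", "deposition", "deposition"),
  stringsAsFactors = FALSE
)

# Return the column names that belong to a given category
# (character(0) if the category is not in the table).
columns_in_category <- function(category, lookup = column_lookup) {
  lookup$name[lookup$category == category]
}

columns_in_category("deposition")
# [1] "hu4_dep_no3_1985_min" "hu4_dep_no3_1985_max"
```

The cost of this approach is upkeep: every new column has to be added to the table by hand, which is why keyword matching is attractive when names follow a convention.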
limnoliver commented 8 years ago

@jsta I've been working on a way to implement this. Ideally it would work within lagos_select, but I started by writing a separate function, lagos_select_group, that matches column names with grep. Each group I came up with is defined either by a single keyword (atmospheric deposition = "dep") or by several keywords (hydrology = baseflow|groundwater|runoff|saturation). Would it help if I defined the groups by keyword? You may have a better idea of how to implement the matching itself.
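Keyword groups like these map naturally onto regular expressions passed to `grep()`. A minimal sketch, assuming groups are stored as a named list of patterns; `column_groups`, `match_group()`, and the column names `hu4_runoff_mean` and `hu4_area_ha` are hypothetical and not taken from LAGOS/R/select_group.R.

```r
# Hypothetical keyword groups: a single keyword, or several joined with "|".
column_groups <- list(
  deposition = "dep",
  hydrology  = "baseflow|groundwater|runoff|saturation"
)

# Return the entries of `column_names` whose names match the group's pattern.
match_group <- function(column_names, group, groups = column_groups) {
  pattern <- groups[[group]]
  if (is.null(pattern)) stop("Unknown group: ", group)
  grep(pattern, column_names, value = TRUE)
}

# Mix of a real example column and hypothetical ones.
cols <- c("hu4_baseflowindex_std", "hu4_dep_no3_1985_min",
          "hu4_runoff_mean", "hu4_area_ha")
match_group(cols, "hydrology")
# [1] "hu4_baseflowindex_std" "hu4_runoff_mean"
```

One caveat with bare keywords: "dep" matches any name containing those letters (a hypothetical "maxdepth" column would be picked up too), so a more anchored pattern such as "_dep_" may be safer.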

limnoliver commented 8 years ago

I added my function, though right now it is really just the group definitions. I'm not sure how you want to implement this, but the groups are there (LAGOS/R/select_group.R).

jsta commented 8 years ago

This is great information. What do you think about building these rules into an internal function that checks the table_column_nested argument of lagos_select and expands any group keywords into the matching columns, rather than creating a new user-facing function (lagos_select_group)?
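One way such an internal expansion step could work, as a minimal base-R sketch rather than the actual LAGOS implementation: `expand_column_groups()`, the `column_groups` patterns, and the toy tables are all hypothetical. Each entry in the nested list is checked against the known group names; keywords are replaced by the matching columns of the table they are listed under, and anything else is kept as a literal column name.

```r
# Hypothetical keyword rules (patterns are assumptions for illustration).
column_groups <- list(
  deposition   = "dep",
  waterquality = "chla|secchi"
)

# Expand group keywords in a table_column_nested-style list
# (names = tables, values = column names and/or group keywords).
expand_column_groups <- function(table_column_nested, dt, groups = column_groups) {
  expanded <- lapply(names(table_column_nested), function(table) {
    table_cols <- names(dt[[table]])
    cols <- lapply(table_column_nested[[table]], function(entry) {
      if (entry %in% names(groups)) {
        # Keyword: grep only this table's own column names.
        grep(groups[[entry]], table_cols, value = TRUE)
      } else {
        # Not a keyword: treat it as a plain column name.
        entry
      }
    })
    unique(unlist(cols, use.names = FALSE))
  })
  names(expanded) <- names(table_column_nested)
  expanded
}

# Toy stand-in for a loaded database: a named list of data frames.
dt <- list(
  epi.nutr = data.frame(lagoslakeid = 1:2, chla = c(3.1, 4.2),
                        secchi = c(1.5, 2.0), tp = c(10, 12)),
  hu4.chag = data.frame(hu4_zoneid = 1:2,
                        hu4_dep_no3_1985_min = c(0.1, 0.2))
)

table_columns <- list("epi.nutr" = c("waterquality", "lagoslakeid"),
                      "hu4.chag" = c("deposition"))
str(expand_column_groups(table_columns, dt))
```

For the toy data, `epi.nutr` expands to `chla`, `secchi`, `lagoslakeid` and `hu4.chag` to `hu4_dep_no3_1985_min`, so keywords and plain names can be mixed freely. Grepping only within the listed table's columns keeps a keyword from picking up same-named columns in other tables.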

limnoliver commented 8 years ago

Yeah, I like the idea of having it all within the lagos_select function. We can move the documentation of the possible table/group combinations into the table_column_nested argument description or details.

jsta commented 8 years ago

Ok, I think I've come up with a good implementation of this idea (8f7c14b). Have a look at the second example for lagos_select. So far I have only loaded the keyword selection rules for deposition and waterquality. It still needs to be tested with multiple tables and with a mix of keywords and plain column names.

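library(LAGOS)  # package name at the time of this thread

dt <- lagos_load("1.054.1")
# group-select using keywords: "waterquality" and "deposition" are group
# keywords, "lagoslakeid" is a plain column name
table_columns <- list("epi.nutr" = c("waterquality", "lagoslakeid"),
                      "hu4.chag" = c("deposition"))
dt_reduced <- LAGOS::lagos_select(dt, table_columns)
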
jsta commented 7 years ago

I am running into difficulty because a query that uses the waterquality keyword sets off a grep for secchi, but secchi columns exist in both the epi.nutr table and the secchi table, so the keyword matches columns in both.
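The ambiguity, and one way around it, can be shown with a small base-R sketch. The toy tables and the helpers `tables_matching()` and `scoped_match()` are hypothetical, and this is not necessarily how the eventual fix works: the idea is simply to grep a keyword's pattern only against the columns of the table it was requested for.

```r
# Toy tables (hypothetical): both contain a secchi column.
dt <- list(
  epi.nutr = data.frame(lagoslakeid = 1:2, secchi = c(1.5, 2.0)),
  secchi   = data.frame(lagoslakeid = 1:2, secchi = c(1.4, 2.1))
)

# Unscoped: which tables have a column matching the pattern?
tables_matching <- function(pattern, dt) {
  hits <- vapply(dt, function(tbl) any(grepl(pattern, names(tbl))), logical(1))
  names(dt)[hits]
}
tables_matching("secchi", dt)
# [1] "epi.nutr" "secchi"

# Scoped: grep only the columns of the table the keyword was listed under.
scoped_match <- function(pattern, dt, table) {
  grep(pattern, names(dt[[table]]), value = TRUE)
}
scoped_match("secchi", dt, "epi.nutr")
# [1] "secchi"
```

Because the request is already nested by table, the scoped version needs no extra information from the user.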

jsta commented 7 years ago

This works now. See the lagos_select examples and commit a3820123737980d829d47b32f83b6c118efc8cb1.