USGS-R / wateRuse_swuds

Moved to: https://code.usgs.gov/water/water-use/wateRuse_swuds
https://code.usgs.gov/water/water-use/wateRuse_swuds
0 stars 5 forks source link

filter.R: Generic filter on multiple attributes #27

Open grrmartin-USGS opened 5 years ago

grrmartin-USGS commented 5 years ago

Should be an elegant way to implement such a filter. I tried several methods including use of 'which' and dplyr 'filter' (https://dplyr.tidyverse.org/reference/filter.html). Ideally it will work using a loop with a counter keyed to the number and sequence of attributes user wishes to filter on.

I have set up two values supplying the attributes and conditions: attributes <-c("FROM_NAT_WATER_USE_CD","FROM_COUNTY_NM", "YEAR") conditions <- c("WS","Lawrence County", "1990").

filter command works fine as set up now within if-clauses:

if (length(attributes)==3){s.wuds %>% filter( .data[[attributes[[1]]]] == conditions[[1]], .data[[attributes[[2]]]] == conditions[[2]], .data[[attributes[[3]]]] == conditions[[3]] ) }

However, when entire function is called, it gives error: Error in UseMethod("filter") : no applicable method for 'filter' applied to an object of class "character". dplyr syntax and use of dots is beyond my experience.

This is the form: image

I tried passing this thru a loop, which seemed to work initially but stopped working: for (i in 1:length(attributes)) into filter( .data[[attributes[[i]]]] == conditions[[i]],

The .data format seems closed to any loop index.

ldecicco-USGS commented 5 years ago

I mentioned this in the pull request, but I strongly think that the dplyr filter is quite elegant enough and that it would be more complicated to maintain this function over time. The other problem with this solution that I didn't mention in the pull request is that if you've got different types (numbers/characters/factors/etc), it will be harder...also using dplyr's filter allows more flexibility with conditions.

For example, I find this less confusing:

s_wuds_filtered <- s.wuds %>%
               filter(FROM_NAT_WATER_USE_CD == "WS",
                      FROM_COUNTY_NM == "Lawrence County",
                      YEAR == "1990")

Than what this swuds_filter would look like:

s_wuds_filtered <- s.wuds %>%
                   filtered(c("FROM_NAT_WATER_USE_CD", "FROM_COUNTY_NM", "YEAR"),
                            c("WS", "Lawrence County", "1990"))
grrmartin-USGS commented 5 years ago

Looking at various filtering scenarios, it appears to me that it does not matter the sequence/order attributes are selected. (The sequence will matter with aggregating in combination with filtering.) The '&' filtering is just intersecting subsets of the data. Also, any number and combination of attributes can be filtered--but there may not necessarily be a 'match' for a given combination of attributes--for example, filtering on FROM_COUNTY_NM=="x" and FROM_SITE_TP_CD=="FA-DV" would be null if no intakes were in that county.

So a generic filter function might be needed for users to pick values (conditions) for up to any 5? attributes of the 152 columns. Example: FROM_NAT_WATER_USE_CD=="WS", FROM_SITE_TP_CD=="FA-DV", FROM_COUNTY_NM=="Hancock County", YEAR=="2005", Month=="Feb".

Also, in Shiny, some 'guided' filtering pick lists may be useful: Sector: FROM_NAT_WATER_USE_CD (pick from any present in data); Source: FROM_SITE_TP_CD (pick from any present in data); Period: range of years; Location: FROM_STATE_NM, FROM_COUNTY_NM, FROM_HUC_CD, FROM_COUNTYAQUIFER_NAME; Owner: From_Owner_Name; From-site permit: several (identifiable from nine 4-character codes); To-site permit: several (identifiable from nine 4-character codes)

Scenarios: image

UI_Content_Layout_Retrieval.xlsx

RE: Is this mode of operation---filtering and aggregating in various sequences, and revising the sequence later in the session---something we can do with R/shiny?

@ldecicco-USGS

Member ldecicco-USGS commented 4 days ago Yes, absolutely.

@ldecicco-USGS

Member ldecicco-USGS commented 4 days ago If you make specific examples of what you want to see from the OH data set, I can write you the code you need.

grrmartin-USGS commented 5 years ago

The sequence of guided filter pick-lists in Shiny might split near the beginning with the user selecting to proceed to select from either from-site attributes, or to-site attributes.

grrmartin-USGS commented 5 years ago

Actually a user might wish to filter on both from-site attributes and to-site attributes: for example hypothetically, both FROM_STATE_NM and To_State_Name. But the menu/pick-lists could be arranged as from-site attributes and to-site attributes.

grrmartin-USGS commented 5 years ago

Example for sequence/combination of filtering and aggregating to review outliers:

Filter: FROM_NAT_WATER_USE_CD=="WS", FROM_SITE_TP_CD=="FA-DV", To_Site_Type_Code=="FA-WDM"; Aggregate monthly Volume_mgd: By permit TO_EPWS_USEPA; Filter: Top 10-percent of maximum month/minimum month values. Then graph monthly values, overlaying 10 largest values, next 10 largest values, etc.