USEPA / EPATADA

This R package can be used to compile and evaluate Water Quality Portal (WQP) data for samples collected from surface water monitoring sites on streams and lakes. It can be used to create applications that support water quality programs and help states, tribes, and other stakeholders efficiently analyze the data.
https://usepa.github.io/EPATADA/
Creative Commons Zero v1.0 Universal

TADA.ComparableDataIdentifier is used for legend labels in TADA_TwoCharacteristicScatterplot even when a different column is used for id_cols #485

Open hillarymarler opened 1 week ago

hillarymarler commented 1 week ago

Describe the bug

TADA.ComparableDataIdentifier is used for the legend labels in TADA_TwoCharacteristicScatterplot even when a different column has been selected for id_cols. This is confusing when comparing the same characteristic between two monitoring locations, for example:

image

To Reproduce

df <- dplyr::filter(Data_6Tribes_5y_Harmonized, TADA.ComparableDataIdentifier == "TOTAL PHOSPHORUS, MIXED FORMS_UNFILTERED_AS P_UG/L")
# Creates a scatterplot including the two specified sites in the same plot:
TADA_TwoCharacteristicScatterplot(df, id_cols = "MonitoringLocationName", groups = c("Upper Red Lake: West", "Upper Red Lake: West-Central"))

Expected behavior

Labels for the legend should reflect the column selected in id_cols.
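
As a rough illustration of the expected behavior, the legend labels could be built from whichever columns are passed to id_cols instead of always reading TADA.ComparableDataIdentifier. This is only a sketch in base R: `make_legend_labels` and the toy data frame are hypothetical and are not part of EPATADA.

```r
# Hypothetical helper (not part of EPATADA): build one legend label per row
# from whichever columns were passed as id_cols.
make_legend_labels <- function(df, id_cols) {
  missing_cols <- setdiff(id_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("id_cols not found in data: ", paste(missing_cols, collapse = ", "))
  }
  # Drop the column names so they are not mistaken for arguments of paste()
  cols <- unname(as.list(df[id_cols]))
  # Paste the id_cols values together so multi-column ids still give one label
  do.call(paste, c(cols, list(sep = "_")))
}

# Toy data standing in for a TADA data frame (assumption, for illustration only)
toy <- data.frame(
  MonitoringLocationName = c("Upper Red Lake: West", "Upper Red Lake: West-Central"),
  TADA.ComparableDataIdentifier = rep(
    "TOTAL PHOSPHORUS, MIXED FORMS_UNFILTERED_AS P_UG/L", 2
  ),
  stringsAsFactors = FALSE
)

labels <- make_legend_labels(toy, id_cols = "MonitoringLocationName")
print(labels)
```

With id_cols = "MonitoringLocationName", the labels are the two site names, which is what a reader comparing locations would expect to see in the legend.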

Bug fixes should include all the following work:

hillarymarler commented 4 days ago

Should we limit the columns that can be used for id_cols?

wokenny13 commented 4 days ago

Limiting which columns can be passed to id_cols could be worth considering.
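
One way such a limit might look is an allow-list check at the top of the function. The sketch below is base R only; `check_id_cols` is a hypothetical helper, and the allow-list simply reuses the column names mentioned in this thread. The real set of permitted columns would be a design decision.

```r
# Hypothetical allow-list (assumption): columns mentioned in this issue thread
allowed_id_cols <- c(
  "TADA.ComparableDataIdentifier",
  "MonitoringLocationName",
  "ATTAINS.assessmentunitname"
)

# Hypothetical helper (not part of EPATADA): stop early on unsupported id_cols
check_id_cols <- function(id_cols, allowed = allowed_id_cols) {
  bad <- setdiff(id_cols, allowed)
  if (length(bad) > 0) {
    stop(
      "Unsupported id_cols: ", paste(bad, collapse = ", "),
      ". Choose from: ", paste(allowed, collapse = ", ")
    )
  }
  invisible(id_cols)
}

# Passes silently for a supported column
check_id_cols("MonitoringLocationName")

# Fails with an informative message for anything else
result <- tryCatch(
  check_id_cols("ActivityStartDate"),
  error = function(e) conditionMessage(e)
)
print(result)
```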

Just a thought on this function: if a different column, such as the monitoring location name, is being compared rather than two characteristics (as in this example, where only TOTAL PHOSPHORUS, MIXED FORMS_UNFILTERED_AS P_UG/L is compared between two monitoring locations), would the name TADA_TwoCharacteristicScatterplot be a bit misleading, since only a single characteristic is plotted? Since both y-axes are based on the same characteristic, wouldn't a single y-axis be sufficient? Or should we consider keeping the same scale on both axes when the characteristic is the same (e.g., 0 to 100 MG/L for both y-axis 1 and y-axis 2)?

newplot

The plot above shows the legend labels when id_cols is set to the monitoring location name, as an attempt to address this issue.
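
If both y-axes were kept for a single characteristic, one option for keeping them on the same scale is to compute a single range from the values of both traces and apply it to each axis. A minimal base R sketch, with a hypothetical `shared_y_range` helper and made-up values:

```r
# Hypothetical helper (not part of EPATADA): one y-axis range that covers
# both traces, with optional padding as a fraction of the data span.
shared_y_range <- function(y1, y2, pad = 0.05) {
  rng <- range(c(y1, y2), na.rm = TRUE)
  span <- diff(rng)
  # Avoid a zero-width axis when all values are identical
  if (span == 0) span <- max(abs(rng[1]), 1)
  c(rng[1] - pad * span, rng[2] + pad * span)
}

# Made-up result values for two monitoring locations (illustration only)
site_a <- c(12, 30, 45)
site_b <- c(20, 80, NA, 100)

y_range <- shared_y_range(site_a, site_b, pad = 0)
print(y_range)
```

The resulting vector could then be supplied as the `range` of both `yaxis` and `yaxis2` in `plotly::layout()` so the two axes line up.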

hillarymarler commented 4 days ago

Below is an example of a scatterplot I modified for a demo. See (https://usepa.github.io/EPATADA/articles/TADAWaterSciConWorkshopDemo.html) if you want to see how to create the data set used in the example below.

Maybe it would be possible to conditionally remove the 2nd y-axis if the same characteristic is plotted in both traces, while still basing the scale on both traces?
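
A rough sketch of what that condition might look like, in base R. `same_characteristic` is a hypothetical helper that assumes a single id_cols column, and the toy data frame only stands in for real TADA data. The hidden-axis settings are the same ones used in the workaround in this comment.

```r
# Hypothetical helper (not part of EPATADA): TRUE when the selected groups
# all share one characteristic, so a second y-axis adds nothing.
same_characteristic <- function(df, id_col, groups,
                                char_col = "TADA.ComparableDataIdentifier") {
  in_groups <- df[[id_col]] %in% groups
  length(unique(df[[char_col]][in_groups])) == 1
}

# Toy data (assumption): one characteristic measured at two locations
toy <- data.frame(
  MonitoringLocationName = c("Upper Red Lake: West", "Upper Red Lake: West-Central"),
  TADA.ComparableDataIdentifier = rep(
    "TOTAL PHOSPHORUS, MIXED FORMS_UNFILTERED_AS P_UG/L", 2
  ),
  stringsAsFactors = FALSE
)
groups <- c("Upper Red Lake: West", "Upper Red Lake: West-Central")

# Hide the second axis only when both traces show the same characteristic
yaxis2_settings <- if (same_characteristic(toy, "MonitoringLocationName", groups)) {
  list(overlaying = "y", side = "right", title = "", visible = FALSE)
} else {
  list(overlaying = "y", side = "right")
}
str(yaxis2_settings)
```

The resulting list could be passed as `yaxis2` to `plotly::layout()`. The axis scale would still need to be set from the combined values of both traces.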

What do you think of changing the name to TADA_TwoGroupScatterplot? That might be more descriptive if we are making it flexible enough to accommodate different id_cols inputs.

Or is this starting to get so convoluted that it may make sense to create a separate function for comparing different locations rather than different characteristics? We could discuss tomorrow.

```r
# assumes EPATADA and dplyr (for %>%) are loaded, and that `data` was
# created as described in the demo linked above

# create two characteristic scatterplot using TADA_TwoCharacteristicScatterplot
twochar_scatter <- TADA_TwoCharacteristicScatterplot(
  data %>%
    dplyr::filter(
      ActivityStartDate > "2014-12-31",
      TADA.ComparableDataIdentifier == "SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C"
    ),
  id_cols = "ATTAINS.assessmentunitname",
  groups = c(
    "San Juan River (Navajo bnd at Hogback to Animas River)",
    "Animas River (San Juan River to Estes Arroyo)"
  )
) %>%
  # remove default plot features that are not applicable for a location comparison
  plotly::layout(
    yaxis2 = list(overlaying = "y", side = "right", title = "", visible = FALSE),
    title = TADA_InsertBreaks("SPECIFIC CONDUCTANCE for the San Juan and Animas Rivers Over Time")
  )
```

image