hillarymarler commented 1 week ago

Describe the bug

TADA.ComparableDataIdentifier is used for the legend labels in TADA_TwoCharacteristicScatterplot even when a different column has been selected for id_cols. This is confusing when comparing the same characteristic between two monitoring locations, for example:

To Reproduce

df <- dplyr::filter(Data_6Tribes_5y_Harmonized, TADA.ComparableDataIdentifier == "TOTAL PHOSPHORUS, MIXED FORMS_UNFILTERED_AS P_UG/L")
#' # Creates a scatterplot including the two specified sites in the same plot:
TADA_TwoCharacteristicScatterplot(df, id_cols = "MonitoringLocationName", groups = c("Upper Red Lake: West", "Upper Red Lake: West-Central"))

Expected behavior

Labels for the legend should reflect the column selected in id_cols.

Bug fixes should include all the following work:

[ ] Create or edit the function/code.
[ ] Document all code using line/inline and/or multi-line/block comments to describe what is does.
[ ] Create or edit tests in tests/testthat folder to help prevent and/or troubleshoot potential future issues.
[ ] Create or edit the function documentation. Include working examples.
[ ] Update or add the new functionality to the appropriate vignette (or create new one).
[ ] If function/code edits made as part of this issue impact other functions in the package or functionality in the shiny app, ensure those are updated as well.
[ ] Run TADA_UpdateAllRefs(), TADA_UpdateExampleData(), styler::style_pkg(), devtools::document(), and devtools::check() and address any new notes or issues before creating a pull request.
[ ] Run more robust check for releases: devtools::check(manual = TRUE, remote = TRUE, incoming = TRUE)

hillarymarler commented 4 days ago

Should we limit the columns that can be used for id_cols?

wokenny13 commented 4 days ago

A limitation on column arguments for id_cols could be good to consider.

Just a thought process for this function: If a different column, like the monitoring location name, is being compared, and not two characteristics (like for this example when just TOTAL PHOSPHORUS, MIXED FORMS_UNFILTERED_AS P_UG/L is being compared for two monitoring location, would the function TADA_TwoCharacteristicScatterplot be a bit 'misleading' as it's only a single characteristic? Since both y axis are based on the same characteristic, wouldn't a single y column be sufficient if there's only one characteristic, or should there be considerations on keeping the same scale when it is the same characteristic (ex. 0 to 100 MG/L for both y-axis 1 and y-axis 2?). newplot

Above is the view on the legend labels when id_cols are for monitoring location names to try to address this issue.

hillarymarler commented 4 days ago

Below is an example of a scatterplot I modified for a demo. See (https://usepa.github.io/EPATADA/articles/TADAWaterSciConWorkshopDemo.html) if you want to see how to create the data set used in the example below.

Maybe it would be possible to conditionally remove the 2nd y-axis if the same characteristic is plotted in both traces? But base the scale on both traces?

What do you think of changing the name to TADA_TwoGroupScatterplot? That might be more descriptive if we are making it flexible enough to accommodate different id_cols inputs.

Or is this starting to get so convoluted that it may make sense to create a separate function for comparing different locations rather than different characteristics? We could discuss tomorrow.

`# create two characteristic scatterplot using TADA_TWoCharacteristicScatterplot twochar_scatter <- TADA_TwoCharacteristicScatterplot(data %>% dplyr::filter(ActivityStartDate > "2014-12-31", TADA.ComparableDataIdentifier == "SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C"), id_cols = "ATTAINS.assessmentunitname", groups = c("San Juan River (Navajo bnd at Hogback to Animas River)", "Animas River (San Juan River to Estes Arroyo)")) %>%

remove default plot features that are not applicable for a location comparison

plotly::layout(yaxis2 = list(overlaying = "y", side = "right", title = "", visible = FALSE), title = TADA_InsertBreaks("SPECIFIC CONDUCTANCE for the San Juan and Animas Rivers Over Time"))

create two characteristic scatterplot using TADA_TWoCharacteristicScatterplot

twochar_scatter <- TADA_TwoCharacteristicScatterplot(data %>% dplyr::filter(ActivityStartDate > "2014-12-31", TADA.ComparableDataIdentifier == "SPECIFIC CONDUCTANCE_TOTAL_NA_US/CM @25C"), id_cols = "ATTAINS.assessmentunitname", groups = c("San Juan River (Navajo bnd at Hogback to Animas River)", "Animas River (San Juan River to Estes Arroyo)")) %>%

remove default plot features that are not applicable for a location comparison

plotly::layout(yaxis2 = list(overlaying = "y", side = "right", title = "", visible = FALSE), title = TADA_InsertBreaks("SPECIFIC CONDUCTANCE for the San Juan and Animas Rivers Over Time"))`

USEPA / EPATADA

TADA.ComparableDataIdentifier is used for legend labels in TADA_TwoCharacteristicScatterplot uses a different column for id_cols #485

remove default plot features that are not applicable for a location comparison

create two characteristic scatterplot using TADA_TWoCharacteristicScatterplot

remove default plot features that are not applicable for a location comparison