USEPA / EPATADA

This R package can be used to compile and evaluate Water Quality Portal (WQP) data for samples collected from surface water monitoring sites on streams and lakes. It can be used to create applications that support water quality programs and help states, tribes, and other stakeholders efficiently analyze the data.
https://usepa.github.io/EPATADA/
Creative Commons Zero v1.0 Universal

TADA_CalculateTotalNP should use TADA.MonitoringLocationIdentifier if it exists in data frame #475

Open cristinamullin opened 2 months ago

cristinamullin commented 2 months ago

Is your feature request related to a problem? Please describe.

Currently, TADA_FindNearbySites identifies sites within some buffer distance of each other and assigns them a TADA.MonitoringLocationIdentifier that concatenates the original MonitoringLocationIdentifier values, but it does not automatically harmonize them to one site's metadata (name, latitude, longitude, organization, and many more... ??). This causes problems when subsequent functions (like total NP summation) depend on the monitoring location identifier as a grouping variable to define total N or total P summations.

Which site's metadata (name, latitude, longitude, and...??) should TADA_CalculateTotalNP assign these totals to when they come from multiple nearby sites?

Describe the solution you'd like

TADA_CalculateTotalNP in Transformations.R already mentions the TADA_FindNearbySites function (lines 233 and 319), but it does not yet use it directly.
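A minimal sketch of the behavior requested in the issue title: group by TADA.MonitoringLocationIdentifier when that column is present, and fall back to MonitoringLocationIdentifier otherwise. The helper `pick_site_id_col`, the toy data, the `ResultValue` column, and the format of the concatenated identifier are all assumptions for illustration. The plain sum stands in for the real total N / total P logic, which is more involved.

```r
# Hypothetical helper (not part of EPATADA): choose the grouping column.
pick_site_id_col <- function(.data) {
  if ("TADA.MonitoringLocationIdentifier" %in% names(.data)) {
    "TADA.MonitoringLocationIdentifier"
  } else {
    "MonitoringLocationIdentifier"
  }
}

# Toy data: two nearby sites share one made-up TADA identifier.
toy <- data.frame(
  MonitoringLocationIdentifier = c("ORG1-A", "ORG2-B", "ORG1-C"),
  TADA.MonitoringLocationIdentifier = c("ORG1-A, ORG2-B", "ORG1-A, ORG2-B", "ORG1-C"),
  ResultValue = c(1.0, 0.5, 2.0),
  stringsAsFactors = FALSE
)

id_col <- pick_site_id_col(toy)

# Sum results per site group; a stand-in for the actual N/P summation.
totals <- aggregate(toy["ResultValue"], by = toy[id_col], FUN = sum)
print(totals)
```

With this fallback, data frames that have not been through TADA_FindNearbySites would keep grouping on the original identifier.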

To start/for now, TADA_CalculateTotalNP can be updated as follows:

Describe alternatives you've considered

These options are more complicated and, if needed, could be split into a separate issue to address later, after further discussion with the TADA Working Group about requirements.

Option A: The function could have an organization hierarchy like the duplicate functions and could preferentially pick the best-ranked organization to provide the key metadata (both for organization and site metadata elements) needed to merge sites to one. Result information should remain distinct.
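One way Option A could look, as a rough sketch only: rank organizations with a user-supplied hierarchy and keep the metadata row from the best-ranked organization in each site group. The function `harmonize_by_org_rank`, the `org_hierarchy` vector, and the toy group identifiers are hypothetical, and the metadata columns are reduced to a few for brevity.

```r
# Toy site metadata: group "G1" contains two nearby sites from different orgs.
sites <- data.frame(
  TADA.MonitoringLocationIdentifier = c("G1", "G1", "G2"),
  OrganizationIdentifier = c("ORG2", "ORG1", "ORG2"),
  MonitoringLocationName = c("Creek at bridge", "Creek near bridge", "Lake outlet"),
  LatitudeMeasure = c(40.001, 40.002, 41.500),
  LongitudeMeasure = c(-105.001, -105.002, -106.500),
  stringsAsFactors = FALSE
)

# Hypothetical hierarchy: earlier entries are preferred.
org_hierarchy <- c("ORG1", "ORG2")

harmonize_by_org_rank <- function(sites, org_hierarchy,
                                  group_col = "TADA.MonitoringLocationIdentifier") {
  org_rank <- match(sites$OrganizationIdentifier, org_hierarchy)
  # Organizations missing from the hierarchy rank last.
  org_rank[is.na(org_rank)] <- length(org_hierarchy) + 1L
  # Sort by group, then rank, and keep the first row of each group.
  ordered <- sites[order(sites[[group_col]], org_rank), , drop = FALSE]
  best <- ordered[!duplicated(ordered[[group_col]]), , drop = FALSE]
  rownames(best) <- NULL
  best
}

harmonized <- harmonize_by_org_rank(sites, org_hierarchy)
print(harmonized)
```

The result rows themselves are not touched here, in line with the note that result information should remain distinct. Only the site and organization metadata are reduced to one row per group.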

Option B: The function could concatenate site and organization metadata to carry both through to the new row.
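Option B could be sketched along these lines, assuming the unique metadata values within each site group are simply pasted together. The function `concatenate_site_metadata`, the "; " separator, and the toy data are illustrative assumptions, not existing EPATADA behavior.

```r
# Toy site metadata: group "G1" contains two nearby sites from different orgs.
sites <- data.frame(
  TADA.MonitoringLocationIdentifier = c("G1", "G1", "G2"),
  OrganizationIdentifier = c("ORG2", "ORG1", "ORG2"),
  MonitoringLocationName = c("Creek at bridge", "Creek near bridge", "Lake outlet"),
  stringsAsFactors = FALSE
)

# Collapse the distinct values of one column into a single string.
concat_unique <- function(x) paste(unique(x), collapse = "; ")

concatenate_site_metadata <- function(sites,
                                      group_col = "TADA.MonitoringLocationIdentifier") {
  meta_cols <- setdiff(names(sites), group_col)
  # One output row per site group, each metadata column concatenated.
  aggregate(sites[meta_cols], by = sites[group_col], FUN = concat_unique)
}

combined <- concatenate_site_metadata(sites)
print(combined)
```

Numeric metadata such as latitude and longitude would become character strings under this approach, which is one reason the choice between the two options may need discussion.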


cristinamullin commented 2 months ago

@hillarymarler this is related to the discussion we had about creating TADA.MonitoringLocationIdentifier at the start, as part of autoclean. Once that is implemented, all TADA functions, including this one, can reference it instead of the original MonitoringLocationIdentifier.

hillarymarler commented 1 month ago

I will work on this as part of the pull request that also creates TADA.MonitoringLocationIdentifier